GSoC 2015: Raspberry Pi 2 Support

Rohini Kulkarni krohini1593 at gmail.com
Sun Jun 21 18:59:57 UTC 2015


Hi all,

I have managed to get a significant performance improvement with some
configuration changes.

The measured time for dhrystones was reduced from 12 to "too small to be
measured".

For dhrystones the time was 0.4.

The number of dhrystones per second increased from approximately 83333 to
2500000 :)
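
The two rates above are consistent with reading the reported times as
microseconds per Dhrystone run. That unit is an assumption, since the
figures are quoted without units. A minimal sketch of the arithmetic (the
function name is hypothetical):

```python
def dhrystones_per_second(microseconds_per_run: float) -> float:
    """Convert a per-run time in microseconds into runs per second."""
    if microseconds_per_run <= 0:
        raise ValueError("time per run must be positive")
    return 1_000_000 / microseconds_per_run


# Times quoted in this message, assumed to be microseconds per run.
before = dhrystones_per_second(12)   # roughly 83333 runs per second
after = dhrystones_per_second(0.4)   # roughly 2500000 runs per second
speedup = after / before             # roughly a factor of 30
```

Under that assumption, the change corresponds to about a 30x improvement.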

Thanks!

On Sun, Jun 21, 2015 at 1:32 AM, Rohini Kulkarni <krohini1593 at gmail.com>
wrote:

> Hi,
>
> I have added an SMP related post to my blog to define where exactly in the
> code I need to work. Some feedback to indicate if I am identifying the work
> area correctly would be very helpful!
>
> Thanks!
>  On 18 Jun 2015 03:37, "Rohini Kulkarni" <krohini1593 at gmail.com> wrote:
>
>> Hi all,
>>
>> I have updated my blog to reflect my understanding of the cache
>> performance issue and my attempts to fix it.
>>
>> Lately I have been experimenting with the memory attributes in the
>> mm_config_table. One set of configurations for cacheable memory (inner and
>> outer levels) ended up reducing performance further (I really thought it
>> would improve). So this table setup certainly controls performance.
>>
>> The results are not improving after turning on cache. So memory sections
>> are perhaps not even getting cached.
>> I get a feeling it has got to do with this mm_config_table.
>>
>> Updates from the github code and blog might help in further discussion.
>>
>> Link to github code: https://github.com/krohini1593/rtems/tree/rohini
>>
>> Link to Blog <http://rohiniwithrpi2.blogspot.in/p/blog-page_3.html>
>>
>> Thanks!
>>
>> On Mon, Jun 15, 2015 at 8:29 PM, Alan Cudmore <alan.cudmore at gmail.com>
>> wrote:
>>
>>> Hi,
>>> Some of the code examples may give you some clues. Like this one:
>>> https://github.com/mrvn/test/blob/master/smp.cc
>>>
>>> Or this:
>>> https://github.com/PeterLemon/RaspberryPi/tree/master/SMP/SMPINIT
>>>
>>> If you still can't figure it out, you can always join the
>>> raspberrypi.org forums and ask on this thread:
>>> https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
>>>
>>> When it comes to the Pi 2 and SMP, you are our RTEMS expert :)
>>>
>>> Thanks,
>>> Alan
>>>
>>>
>>> On Sat, Jun 13, 2015 at 2:29 PM, Rohini Kulkarni <krohini1593 at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> This is regarding Pi 2 SMP support. After powering on, the secondary
>>>> cores each read one of their four mailbox registers and wait for a
>>>> non-zero value to be written. This value is the physical address from
>>>> which the core is expected to start execution.
>>>>
>>>> I am stuck at figuring out this address. How should I go about
>>>> understanding this?
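
One way to read the mechanism described above: the value written is the
physical address of whatever entry routine the secondary cores should run,
so the remaining question is mostly which register to write it to. The
sketch below computes per-core mailbox register addresses and releases
cores 1-3. The base address, the offsets, the choice of mailbox 3, and the
0x8000 entry address are assumptions based on the BCM2836 ARM-local
peripherals documentation and the examples linked in this thread; none of
them is established here. `write32` is a hypothetical callback standing in
for a 32-bit memory-mapped store.

```python
# Assumed layout of the BCM2836 ARM-local register block (not from this thread).
ARM_LOCAL_BASE = 0x4000_0000
MAILBOX_SET_BASE = ARM_LOCAL_BASE + 0x80     # write-set registers
MAILBOX_RDCLR_BASE = ARM_LOCAL_BASE + 0xC0   # read / write-high-to-clear registers
CORE_STRIDE = 0x10     # four 32-bit mailboxes per core
MAILBOX_STRIDE = 0x4


def _offset(core: int, mailbox: int) -> int:
    """Byte offset of one mailbox within the set or read/clear bank."""
    if not 0 <= core <= 3 or not 0 <= mailbox <= 3:
        raise ValueError("core and mailbox must both be in 0..3")
    return core * CORE_STRIDE + mailbox * MAILBOX_STRIDE


def mailbox_set_address(core: int, mailbox: int) -> int:
    """Address a sender writes to in order to post a value."""
    return MAILBOX_SET_BASE + _offset(core, mailbox)


def mailbox_read_clear_address(core: int, mailbox: int) -> int:
    """Address the waiting core polls (and writes to clear)."""
    return MAILBOX_RDCLR_BASE + _offset(core, mailbox)


def release_secondary_cores(write32, entry_point: int, mailbox: int = 3) -> None:
    """Post the entry address to the chosen mailbox of cores 1, 2 and 3."""
    if entry_point == 0:
        raise ValueError("a zero value would leave the cores waiting")
    for core in (1, 2, 3):
        write32(mailbox_set_address(core, mailbox), entry_point)


# Host-side dry run: record the stores in a dict instead of touching hardware.
writes = {}
release_secondary_cores(writes.__setitem__, 0x8000)  # 0x8000 is a placeholder
```

On real hardware the entry address would be the physical address of the
secondary-core start routine in the BSP, and the store would be a volatile
32-bit write rather than a dictionary update.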
>>>>
>>>> Thanks!
>>>> On 3 Jun 2015 19:44, "Gedare Bloom" <gedare at gwu.edu> wrote:
>>>>
>>>>> On Wed, Jun 3, 2015 at 2:39 AM, Rohini Kulkarni <krohini1593 at gmail.com>
>>>>> wrote:
>>>>> > But, I can't say cache configurations have a role here.
>>>>> >
>>>>> > I'll push my code to my github project soon.
>>>>> >
>>>>> > P.S. The Pi2 board I possess seems to have broken down. It just isn't
>>>>> > turning on. Unable to test further. Will order one immediately.
>>>>> >
>>>>> Ouch. Make sure you put it in a safe space for development, clear of
>>>>> threats like moisture, static shock, and cats.
>>>>>
>>>>> > On 3 Jun 2015 09:03, "Rohini Kulkarni" <krohini1593 at gmail.com> wrote:
>>>>> >>
>>>>> >> Hi,
>>>>> >>
>>>>> >> Alan, your suggestion has resulted in much improvement
>>>>> >>
>>>>> >> arm_control=0x1000
>>>>> >>
>>>>> >> This has simply worked! Looks like the other cores were taking up
>>>>> >> plenty of time.
>>>>> >> I was aware from references that the other cores run a WFI, but I
>>>>> >> did not appreciate its impact.
>>>>> >> The time for each dhrystone has dropped from 13 to 7, and the number
>>>>> >> of dhrystones per second has also increased.
>>>>> >>
>>>>> >> But this is a change only in config.txt, not actually in the boot
>>>>> >> code.
>>>>> >>
>>>>> >> Thanks
>>>>> >>
>>>>> >> Rohini
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Wed, Jun 3, 2015 at 7:12 AM, Alan Cudmore <alan.cudmore at gmail.com>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> The caches are being enabled on the RPI 1 BSP. The same code is being
>>>>> >>> executed by the RPI 2 BSP, but obviously it’s not sufficient for the
>>>>> >>> cache setup.
>>>>> >>> I have been reading through this long thread, and it is very
>>>>> >>> informative:
>>>>> >>> https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
>>>>> >>>
>>>>> >>> I am starting to understand the setup that is required to enable
>>>>> >>> caches on the RPI 2. For example this message near the bottom of page
>>>>> >>> 3 gives a good indication of the speedup available by configuring the
>>>>> >>> MMU and caches correctly:
>>>>> >>> Quote from above thread
>>>>> >>> ------------------------------
>>>>> >>> Enabling I/D caches and branch prediction, just like the julia demo
>>>>> >>> uses, it takes ~12 seconds, or ~21 fps. It's just one core but also a
>>>>> >>> much smaller loop than the julia demo has.
>>>>> >>>
>>>>> >>> Enabling the MMU and mapping memory inner/outer write-back, write
>>>>> >>> allocate and the framebuffer inner write-through, no write allocate +
>>>>> >>> outer write-back, write-allocate it takes ~8 seconds, or 32 fps.
>>>>> >>>
>>>>> >>> PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2 cache
>>>>> >>> effect.
>>>>> >>> -------------------------
>>>>> >>> End of quote
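
The inner/outer cache policies named in the quote correspond to a few bits
in each ARMv7-A translation table entry. The sketch below shows how those
bits could be computed for a first-level section descriptor in the
short-descriptor format. It assumes TEX remap is disabled (SCTLR.TRE = 0)
and covers only the TEX, C and B fields; the function and constant names
are hypothetical and this is not the layout of the RTEMS mm_config_table.

```python
# Two-bit cache policy codes used by the ARMv7-A short-descriptor format.
NON_CACHEABLE = 0b00
WB_WA = 0b01       # write-back, write-allocate
WT_NO_WA = 0b10    # write-through, no write-allocate
WB_NO_WA = 0b11    # write-back, no write-allocate

TEX_SHIFT = 12     # TEX[2:0] occupies bits 14:12 of a section descriptor
C_SHIFT = 3        # C is bit 3
B_SHIFT = 2        # B is bit 2


def section_cache_bits(inner: int, outer: int) -> int:
    """Return the TEX, C and B bits for Normal memory with the given policies."""
    for policy in (inner, outer):
        if not 0 <= policy <= 3:
            raise ValueError("policy must be a 2-bit code")
    tex = 0b100 | outer        # TEX[2] = 1: TEX[1:0] holds the outer policy
    c = (inner >> 1) & 1       # C and B together hold the inner policy
    b = inner & 1
    return (tex << TEX_SHIFT) | (c << C_SHIFT) | (b << B_SHIFT)


# The two mappings described in the quoted forum post.
main_memory = section_cache_bits(inner=WB_WA, outer=WB_WA)
framebuffer = section_cache_bits(inner=WT_NO_WA, outer=WB_WA)
```

A complete descriptor would also need the section type bits, access
permissions, domain and (for SMP) the shareable bit, which are left out
here.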
>>>>> >>>
>>>>> >>> The person who posted the above comment (mrvn) posted the code here:
>>>>> >>> https://github.com/mrvn/test/blob/master/mmu.cc
>>>>> >>>
>>>>> >>>
>>>>> >>> Also, it seems that when the Pi 2 starts up, cores 1-3 are put in a
>>>>> >>> wait loop that constantly accesses the bus. By putting this option in
>>>>> >>> the config.txt file you can put the other cores to sleep, speeding up
>>>>> >>> the code on the primary core:
>>>>> >>>  arm_control=0x1000
>>>>> >>> It would be worth trying that option to see if the benchmark speeds
>>>>> >>> up.
>>>>> >>>
>>>>> >>>
>>>>> >>> Alan
>>>>> >>>
>>>>> >>> On Jun 2, 2015, at 8:05 AM, Hesham ALMatary <heshamelmatary at gmail.com>
>>>>> >>> wrote:
>>>>> >>>
>>>>> >>> On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni <krohini1593 at gmail.com>
>>>>> >>> wrote:
>>>>> >>>
>>>>> >>> From what I saw, they have to be enabled separately. Cache/mmu are
>>>>> >>> disabled upon reset.
>>>>> >>>
>>>>> >>> For the existing Raspberry BSP [1] there is code for MMU/Cache init,
>>>>> >>> however I don't know about Pi2 and where its code is.
>>>>> >>>
>>>>> >>> [1]
>>>>> >>> https://github.com/RTEMS/rtems/tree/master/c/src/lib/libbsp/arm/raspberrypi
>>>>> >>>
>>>>> >>> On 2 Jun 2015 16:59, "Hesham ALMatary" <heshamelmatary at gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>>
>>>>> >>> Hi,
>>>>> >>>
>>>>> >>> Aren't the MMU/Caches enabled by default for RPi [1]?
>>>>> >>>
>>>>> >>> [1]
>>>>> >>> https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c
>>>>> >>>
>>>>> >>> On Tue, Jun 2, 2015 at 12:18 PM, Joel Sherrill
>>>>> >>> <joel.sherrill at oarcorp.com> wrote:
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On June 2, 2015 7:01:21 AM EDT, Rohini Kulkarni <
>>>>> krohini1593 at gmail.com>
>>>>> >>> wrote:
>>>>> >>>
>>>>> >>> Dr. Joel,
>>>>> >>>
>>>>> >>> So we can't say something solely on the basis of this result?
>>>>> >>>
>>>>> >>>
>>>>> >>> I don't think so. If Linux performs the same, then what you did is as
>>>>> >>> good as it gets.
>>>>> >>>
>>>>> >>> However, if Linux is faster then some setting still isn't right.
>>>>> >>>
>>>>> >>> You need a reference measurement to have any confidence. It is
>>>>> >>> possible you did something but didn't actually turn the cache (or all
>>>>> >>> the cache) on.
>>>>> >>>
>>>>> >>> On 2 Jun 2015 16:28, "Rohini Kulkarni" <krohini1593 at gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>> I have not run it under Linux on the Pi2 yet. I will have to run it
>>>>> >>> and check the result.
>>>>> >>>
>>>>> >>> On 2 Jun 2015 16:16, "Joel Sherrill" <joel.sherrill at oarcorp.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On June 2, 2015 5:58:33 AM EDT, Rohini Kulkarni <
>>>>> krohini1593 at gmail.com>
>>>>> >>> wrote:
>>>>> >>>
>>>>> >>> Hi,
>>>>> >>>
>>>>> >>> I tried running the dhrystone benchmark with some changes for
>>>>> >>> cache/mmu set up.
>>>>> >>>
>>>>> >>> However, the output shows a reduction in performance.
>>>>> >>> The time to run through the dhrystone has increased from 12 to 13 and
>>>>> >>> dhrystones run per second decreased.
>>>>> >>>
>>>>> >>> According to this result, things were better with caches disabled.
>>>>> >>>
>>>>> >>>
>>>>> >>> I have been working on this for two days and could not figure out an
>>>>> >>> improvement. Any pointers?
>>>>> >>>
>>>>> >>>
>>>>> >>> How did it do under Linux on the Pi2?
>>>>> >>>
>>>>> >>>
>>>>> >>> Thanks.
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Thu, May 28, 2015 at 8:41 PM, Rohini Kulkarni
>>>>> >>> <krohini1593 at gmail.com> wrote:
>>>>> >>>
>>>>> >>> Hi All,
>>>>> >>>
>>>>> >>> I have to implement the cache coherency support for Cortex A7. But for
>>>>> >>> A7 MPCore, unlike for A9, I am not able to find any register
>>>>> >>> description for the Snoop Control Unit in the TRM.
>>>>> >>>
>>>>> >>> I need help here on how to proceed.
>>>>> >>>
>>>>> >>> Additionally, for A9 there is a single bit in the Auxiliary Control
>>>>> >>> Register which enables cache broadcast operations. The register
>>>>> >>> format is different for A7, and again I am unable to find how to
>>>>> >>> achieve the same for A7.
>>>>> >>>
>>>>> >>> Thanks!
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Tue, May 5, 2015 at 10:42 PM, Joel Sherrill
>>>>> >>> <joel.sherrill at oarcorp.com> wrote:
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On 5/5/2015 11:11 AM, Rohini Kulkarni wrote:
>>>>> >>>
>>>>> >>> Hi,
>>>>> >>>
>>>>> >>> I am working with the code for bsp hooks. I am referring to existing
>>>>> >>> ARM multicore BSP code, mainly Zynq.
>>>>> >>>
>>>>> >>> 1. There are existing hooks for the raspberry pi. Where should the
>>>>> >>> code for the Pi2 hooks be added?
>>>>> >>>
>>>>> >>> The Pi and Pi2 are remarkably similar so Pi2 should be placed inside
>>>>> >>> the Pi BSP directory.
>>>>> >>> There is already a Pi2 variant of that code built. But we know
>>>>> >>> specific places where there are variances. Depending on the scope of
>>>>> >>> what is different, it can be as simple as a cpp conditional in a .h to
>>>>> >>> select a value or two implementations of a single method and the
>>>>> >>> Makefile.am picking the right file to build based on the board
>>>>> >>> variant.
>>>>> >>>
>>>>> >>> The big question to always ask is: Is this specific to the Pi2 and
>>>>> >>> incompatible with the Pi?
>>>>> >>>
>>>>> >>> Since the Pi BSP is still missing capabilities, it is likely code
>>>>> >>> common to both will be added this summer. For example, did the mailbox
>>>>> >>> interface change? I don't know but would guess that it didn't. Each
>>>>> >>> new capability added needs that question asked.
>>>>> >>>
>>>>> >>> And any differences need to be analyzed to pick the least intrusive
>>>>> >>> way to provide alternate implementations. Or enable special code like
>>>>> >>> the Pi2 SMP support which is dependent on --enable-smp and being a
>>>>> >>> Pi2.
>>>>> >>>
>>>>> >>> 2. Am I right in understanding that I will have to implement A7
>>>>> >>> specific functions as has been done for A9? I am referring
>>>>> >>> specifically to the arm-a9mpcore-start.h
>>>>> >>>
>>>>> >>> Yes.
>>>>> >>>
>>>>> >>> If the code is very similar between the a7 and a9, then a discussion
>>>>> >>> on devel@ should occur to decide the best way to minimize duplication.
>>>>> >>>
>>>>> >>> If you end up with a7 specific code, you should follow the location
>>>>> >>> and naming patterns already established. That places it in
>>>>> >>> libbsp/arm/shared/...
>>>>> >>> so it can be used by any BSP with the right SMP core.
>>>>> >>>
>>>>> >>>
>>>>> >>> I am referring to existing code to locate and understand what needs
>>>>> >>> to be done in the hooks. However, being new to such implementations, I
>>>>> >>> am taking longer to understand the details. Any suggestions that might
>>>>> >>> help here are welcome.
>>>>> >>>
>>>>> >>> The answer will depend on the factors listed above. When code can
>>>>> >>> be shared, we want to share it across as many BSPs as makes sense.
>>>>> >>> When it is unique to a specific BSP **variant** (e.g. Pi vs Pi2),
>>>>> then
>>>>> >>> you want to find the way to account for the variation in the least
>>>>> >>> intrusive code way possible.
>>>>> >>>
>>>>> >>> Thanks!
>>>>> >>>
>>>>> >>> On 1 May 2015 12:45, "Rohini Kulkarni" <krohini1593 at gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>>
>>>>> >>> Hi,
>>>>> >>>
>>>>> >>> Excited to be a part of this edition of GSoC! Thanks to everybody for
>>>>> >>> helping me get here and congratulations to all the participating
>>>>> >>> students!
>>>>> >>>
>>>>> >>> So, now getting to work, firstly I wish to know, specifically from my
>>>>> >>> mentors, any changes that must be made to my proposed project or
>>>>> >>> schedule.
>>>>> >>>
>>>>> >>> Secondly, are there any specifics for the development blog that we
>>>>> >>> need to create for the project? Over time, what is the blog expected
>>>>> >>> to convey?
>>>>> >>>
>>>>> >>> Also, I have to create a new wiki page for my project as none exists.
>>>>> >>> I want to know how to add one.
>>>>> >>>
>>>>> >>> --
>>>>> >>>
>>>>> >>> Rohini Kulkarni
>>>>> >>>
>>>>> >>>
>>>>> >>> -- Joel Sherrill, Ph.D. Director of Research & Development
>>>>> >>> joel.sherrill at OARcorp.com On-Line Applications Research Ask me
>>>>> about
>>>>> >>> RTEMS: a free RTOS Huntsville AL 35805 Support Available (256)
>>>>> >>>
>>>>> >>> 722-9985
>>>>> >>>
>>>>> >>>
>>>>> >>> --joel
>>>>> >>> _______________________________________________
>>>>> >>> devel mailing list
>>>>> >>> devel at rtems.org
>>>>> >>> http://lists.rtems.org/mailman/listinfo/devel
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> --
>>>>> >>> Hesham
>>>>> >>>
>>>>> >>>
>>>>>


-- 
Rohini Kulkarni