GSoC 2015: Raspberry Pi 2 Support

Rohini Kulkarni krohini1593 at gmail.com
Wed Jun 17 22:07:35 UTC 2015


Hi all,

I have updated my blog to reflect my understanding of the cache
performance issue and my attempts to fix it.

Lately I have been experimenting with the memory attributes in the
mm_config_table. One set of configurations for cacheable memory (inner and
outer levels) ended up reducing performance further (I really thought it
would improve it). So this table setup certainly affects performance.
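
For reference while comparing attribute sets, here is a minimal sketch of how
the cacheability bits of an ARMv7-A short-descriptor section entry combine,
assuming TEX remap is disabled and domain 0 is used. The macro and function
names are made up for illustration and are not the RTEMS mm_config_table
flags; only the bit positions come from the ARM architecture.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A short-descriptor "section" (1 MiB) entry fields, TEX remap off. */
#define SECTION_TYPE    (0x2u)                  /* bits [1:0] = 0b10 */
#define SECTION_B       (1u << 2)               /* B bit */
#define SECTION_C       (1u << 3)               /* C bit */
#define SECTION_AP_RW   (0x3u << 10)            /* AP[1:0] = 0b11, read/write */
#define SECTION_TEX(x)  ((uint32_t)(x) << 12)   /* TEX[2:0] */
#define SECTION_S       (1u << 16)              /* shareable */

/* Normal memory, inner and outer write-back, write-allocate. */
#define ATTR_NORMAL_WBWA  (SECTION_TEX(1) | SECTION_C | SECTION_B)
/* Shareable device memory, e.g. the peripheral window. */
#define ATTR_DEVICE       (SECTION_B)

/* Hypothetical helper: build one first-level entry for a 1 MiB section. */
static uint32_t make_section_entry(uint32_t phys_base, uint32_t attrs)
{
    /* A section base must be 1 MiB aligned. */
    assert((phys_base & 0x000FFFFFu) == 0);
    return phys_base | attrs | SECTION_AP_RW | SECTION_TYPE;
}
```

With these definitions a cacheable RAM section at 0x00100000 encodes as
0x00101C0E; SECTION_S would be ORed in for memory shared between cores.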

The results are not improving after turning on the cache, so the memory
sections are perhaps not even getting cached.
I have a feeling this has to do with the mm_config_table.
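
One way to test that suspicion is to read the control registers back after
the start hooks have run. Below is a minimal sketch, assuming the SCTLR and
ACTLR values have already been read with MRC (the reads appear only in
comments, so the decoding can be tried on a host). The function names are
hypothetical. The ACTLR.SMP check follows the Cortex-A7 TRM, which asks for
that bit to be set before the caches and MMU are enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SCTLR: mrc p15, 0, <Rt>, c1, c0, 0 */
#define SCTLR_M    (1u << 0)    /* MMU enable */
#define SCTLR_C    (1u << 2)    /* data/unified cache enable */
#define SCTLR_Z    (1u << 11)   /* branch prediction enable */
#define SCTLR_I    (1u << 12)   /* instruction cache enable */

/* ACTLR: mrc p15, 0, <Rt>, c1, c0, 1 */
#define ACTLR_SMP  (1u << 6)    /* Cortex-A7: core takes part in coherency */

/* True only if MMU, D-cache and I-cache are on and the SMP bit is set. */
static bool caches_effectively_on(uint32_t sctlr, uint32_t actlr)
{
    const uint32_t need = SCTLR_M | SCTLR_C | SCTLR_I;
    return (sctlr & need) == need && (actlr & ACTLR_SMP) != 0;
}

/* Usage with made-up register values (not measured on hardware). */
static void example(void)
{
    assert(caches_effectively_on(SCTLR_M | SCTLR_C | SCTLR_I, ACTLR_SMP));
    assert(!caches_effectively_on(SCTLR_M | SCTLR_I, ACTLR_SMP)); /* D-cache off */
}
```

Even with all of these bits set, a section is only cached if its table entry
marks it as Normal cacheable memory, so both checks are needed.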

The updated GitHub code and blog might help with further discussion.

Link to GitHub code: https://github.com/krohini1593/rtems/tree/rohini

Link to Blog <http://rohiniwithrpi2.blogspot.in/p/blog-page_3.html>

Thanks!

On Mon, Jun 15, 2015 at 8:29 PM, Alan Cudmore <alan.cudmore at gmail.com>
wrote:

> Hi,
> Some of the code examples may give you some clues. Like this one:
> https://github.com/mrvn/test/blob/master/smp.cc
>
> Or this:
> https://github.com/PeterLemon/RaspberryPi/tree/master/SMP/SMPINIT
>
> If you still can't figure it out, you can always join the raspberrypi.org
> forums and ask on this thread:
> https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
>
> When it comes to the Pi 2 and SMP, you are our RTEMS expert :)
>
> Thanks,
> Alan
>
>
> On Sat, Jun 13, 2015 at 2:29 PM, Rohini Kulkarni <krohini1593 at gmail.com>
> wrote:
>
>> Hi,
>>
>> This is regarding Pi 2 SMP support. After powering on, the secondary
>> cores read one of their four mailbox registers and wait for non-zero
>> content to be written. This content is the physical address of the
>> location from which the cores are expected to start execution.
>>
>> I am stuck on figuring out this address. How should I go about
>> understanding this?
>>
>> Thanks!
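
A minimal sketch of that handshake, under assumptions not stated in this
thread: that the secondary cores poll mailbox 3, and that the per-core
"write-set" registers sit at 0x4000008C + 0x10 * core in the BCM2836
ARM-local register block. The value written is simply the physical address
of whatever routine the secondary cores should start in (for example a start
symbol from the linker script). All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed BCM2836 ARM-local layout (not taken from this thread). */
#define BCM2836_LOCAL_BASE     0x40000000u
#define CORE_MBOX3_SET_OFFSET  0x8Cu   /* core 0, mailbox 3, write-set */
#define CORE_MBOX_STRIDE       0x10u   /* one block of four mailboxes per core */

/* Physical address of the mailbox 3 write-set register for a core. */
static uint32_t core_mailbox3_set_addr(unsigned core)
{
    assert(core < 4);
    return BCM2836_LOCAL_BASE + CORE_MBOX3_SET_OFFSET + CORE_MBOX_STRIDE * core;
}

/*
 * Hand a non-zero entry address to a waiting core. On hardware, mbox3_set
 * would point at core_mailbox3_set_addr(core); a data barrier before the
 * write and an SEV after it are probably needed as well.
 */
static void release_secondary_core(volatile uint32_t *mbox3_set, uint32_t entry)
{
    assert(entry != 0);   /* zero means "keep waiting" */
    *mbox3_set = entry;
}
```

Passing the register as a pointer keeps the sketch testable on a host, where
an ordinary variable can stand in for the mailbox.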
>> On 3 Jun 2015 19:44, "Gedare Bloom" <gedare at gwu.edu> wrote:
>>
>>> On Wed, Jun 3, 2015 at 2:39 AM, Rohini Kulkarni <krohini1593 at gmail.com>
>>> wrote:
>>> > But, I can't say cache configurations have a role here.
>>> >
>>> > I'll push my code to my github project soon.
>>> >
>>> > P.S. The Pi2 board I possess seems to have broken down. It just isn't
>>> > turning on. Unable to test further. Will order one immediately.
>>> >
>>> Ouch. Make sure you put it in a safe space for development, clear of
>>> threats like moisture, static shock, and cats.
>>>
>>> > On 3 Jun 2015 09:03, "Rohini Kulkarni" <krohini1593 at gmail.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> Alan, your suggestion has resulted in much improvement
>>> >>
>>> >> arm_control=0x1000
>>> >>
>>> >> This has simply worked! Looks like the other cores were taking up plenty
>>> >> of time.
>>> >> I was aware from references that the other cores run a WFI, but did
>>> >> not understand its impact.
>>> >> The time for each dhrystone has reduced from 13 to 7, and the number of
>>> >> dhrystones per second has also increased.
>>> >>
>>> >> But this is a change only in config.txt, not actually in the boot code.
>>> >>
>>> >> Thanks
>>> >>
>>> >> Rohini
>>> >>
>>> >>
>>> >>
>>> >> On Wed, Jun 3, 2015 at 7:12 AM, Alan Cudmore <alan.cudmore at gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> The caches are being enabled on the RPI 1 BSP. The same code is being
>>> >>> executed by the RPI 2 BSP, but obviously it’s not sufficient for the
>>> >>> cache setup.
>>> >>> I have been reading through this long thread, and it is very
>>> >>> informative:
>>> >>> https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
>>> >>>
>>> >>> I am starting to understand the setup that is required to enable caches
>>> >>> on the RPI 2. For example this message near the bottom of page 3 gives a
>>> >>> good indication of the speedup available by configuring the MMU and
>>> >>> caches correctly:
>>> >>> Quote from above thread
>>> >>> ------------------------------
>>> >>> Enabling I/D caches and branch prediction, just like the julia demo uses,
>>> >>> it takes ~12 seconds, or ~21 fps. It's just one core but also a much
>>> >>> smaller loop than the julia demo has.
>>> >>>
>>> >>> Enabling the MMU and mapping memory inner/outer write-back, write
>>> >>> allocate and the framebuffer inner write-through, no write allocate +
>>> >>> outer write-back, write-allocate it takes ~8 seconds, or 32 fps.
>>> >>>
>>> >>> PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2 cache
>>> >>> effect.
>>> >>> -------------------------
>>> >>> End of quote
>>> >>>
>>> >>> The person who posted the above comment (mrvn) posted the code here:
>>> >>> https://github.com/mrvn/test/blob/master/mmu.cc
>>> >>>
>>> >>>
>>> >>> Also, it seems that when the Pi 2 starts up, cores 1-3 are put in a wait
>>> >>> loop always accessing the bus. By putting this option in the config.txt
>>> >>> file you can put the other cores to sleep, speeding up the code on core 0.
>>> >>>  arm_control=0x1000
>>> >>> It would be worth trying that option to see if the benchmark speeds up.
>>> >>>
>>> >>>
>>> >>> Alan
>>> >>>
>>> >>> On Jun 2, 2015, at 8:05 AM, Hesham ALMatary <
>>> heshamelmatary at gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni <
>>> krohini1593 at gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> From what I saw, they have to be enabled separately. Cache/MMU are
>>> >>> disabled upon reset.
>>> >>>
>>> >>> For the existing Raspberry Pi BSP [1] there is code for MMU/cache init,
>>> >>> however I don't know about Pi2 and where its code is.
>>> >>>
>>> >>> [1]
>>> >>>
>>> https://github.com/RTEMS/rtems/tree/master/c/src/lib/libbsp/arm/raspberrypi
>>> >>>
>>> >>> On 2 Jun 2015 16:59, "Hesham ALMatary" <heshamelmatary at gmail.com>
>>> wrote:
>>> >>>
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> Aren't the MMU/Caches enabled by default for RPi [1]?
>>> >>>
>>> >>> [1]
>>> >>>
>>> >>>
>>> https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c
>>> >>>
>>> >>> On Tue, Jun 2, 2015 at 12:18 PM, Joel Sherrill
>>> >>> <joel.sherrill at oarcorp.com> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On June 2, 2015 7:01:21 AM EDT, Rohini Kulkarni <
>>> krohini1593 at gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> Dr. Joel,
>>> >>>
>>> >>> So we can't say something solely on the basis of this result?
>>> >>>
>>> >>>
>>> >>> I don't think so. If Linux performs the same, then what you did is as
>>> >>> good as it gets.
>>> >>>
>>> >>> However, if Linux is faster then some setting still isn't right.
>>> >>>
>>> >>> You need a reference measurement to have any confidence. It is possible
>>> >>> you did something but didn't actually turn the cache (or all the cache)
>>> >>> on.
>>> >>>
>>> >>> On 2 Jun 2015 16:28, "Rohini Kulkarni" <krohini1593 at gmail.com>
>>> wrote:
>>> >>>
>>> >>> I have not run it under linux on pi2 yet. Will have to run and check
>>> >>> the result.
>>> >>>
>>> >>> On 2 Jun 2015 16:16, "Joel Sherrill" <joel.sherrill at oarcorp.com>
>>> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On June 2, 2015 5:58:33 AM EDT, Rohini Kulkarni <
>>> krohini1593 at gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I tried running the dhrystone benchmark with some changes for the
>>> >>> cache/MMU setup.
>>> >>>
>>> >>> However, the output shows a reduction in performance.
>>> >>> The time to run through the dhrystone has increased from 12 to 13 and
>>> >>> dhrystones run per second decreased.
>>> >>>
>>> >>> According to this result, things were better with caches disabled.
>>> >>>
>>> >>>
>>> >>> I have been working on this for two days and could not figure out an
>>> >>> improvement. Any pointers?
>>> >>>
>>> >>>
>>> >>> How did it do under Linux on the Pi2?
>>> >>>
>>> >>>
>>> >>> Thanks.
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Thu, May 28, 2015 at 8:41 PM, Rohini Kulkarni
>>> >>> <krohini1593 at gmail.com> wrote:
>>> >>>
>>> >>> Hi All,
>>> >>>
>>> >>> I have to implement the cache coherency support for Cortex A7. But for
>>> >>> A7 MPCore, unlike for A9, I am not able to find any register
>>> >>> description for the Snoop Control Unit in the TRM.
>>> >>>
>>> >>> I need help here on how to proceed.
>>> >>>
>>> >>> Additionally, for A9 there is a single bit in the Auxiliary
>>> >>> Control Register which enables cache broadcast operations. The register
>>> >>> format is different for A7, and again I am unable to find how to achieve
>>> >>> the same for A7.
>>> >>>
>>> >>> Thanks!
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, May 5, 2015 at 10:42 PM, Joel Sherrill
>>> >>> <joel.sherrill at oarcorp.com> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On 5/5/2015 11:11 AM, Rohini Kulkarni wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I am working with the code for bsp hooks. I am referring to existing
>>> >>> ARM multicore bsp codes, Zynq mainly.
>>> >>>
>>> >>> 1. There are existing hooks for the raspberry pi. Where should the code
>>> >>> for the Pi2 hooks be added?
>>> >>>
>>> >>> The Pi and Pi2 are remarkably similar so Pi2 should be placed inside
>>> >>> the Pi BSP directory.
>>> >>> There is already a Pi2 variant of that code built. But we know specific
>>> >>> places where there are variances. Depending on the scope of what is
>>> >>> different, it can be as simple as a cpp conditional in a .h to select a
>>> >>> value or two implementations of a single method and the Makefile.am
>>> >>> picking the right file to build based on the board variant.
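
The "cpp conditional in a .h" option could look like the sketch below.
BSP_VARIANT_IS_RPI2 is a hypothetical macro name standing in for whatever
the BSP's configure step actually defines, and rpi_peripheral_addr() is an
illustrative helper. The two base addresses are the usual peripheral bases
for the Pi (0x20000000) and the Pi 2 (0x3F000000).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant switch; a real BSP would get this from its build. */
#ifndef BSP_VARIANT_IS_RPI2
#define BSP_VARIANT_IS_RPI2 0
#endif

/* Only the base differs between the boards; register offsets are shared. */
#if BSP_VARIANT_IS_RPI2
#define RPI_PERIPHERAL_BASE 0x3F000000u
#else
#define RPI_PERIPHERAL_BASE 0x20000000u
#endif

/* Translate a peripheral offset into a physical address for this variant. */
static inline uint32_t rpi_peripheral_addr(uint32_t offset)
{
    assert(offset < 0x01000000u);   /* stay inside the 16 MiB window */
    return RPI_PERIPHERAL_BASE + offset;
}
```

Code written against rpi_peripheral_addr() stays common to both boards, which
keeps the variant-specific part down to one conditional.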
>>> >>>
>>> >>> The big question to always ask is: Is this specific to the Pi2 and
>>> >>> incompatible with the Pi?
>>> >>>
>>> >>> Since the Pi BSP is still missing capabilities, it is likely code
>>> >>> common to both will be added this summer. For example, did the mailbox
>>> >>> interface change? I don't know but would guess that it didn't.  Each new
>>> >>> capability added needs that added.
>>> >>>
>>> >>> And any differences need to be analyzed to pick the least intrusive way
>>> >>> to provide alternate implementations. Or enable special code like the
>>> >>> Pi2 SMP support which is dependent on --enable-smp and being a Pi2.
>>> >>>
>>> >>> 2. Am I right in understanding that I will have to implement A7
>>> >>> specific functions as have been for A9? I am referring specifically to
>>> >>> the arm-a9mpcore-start.h
>>> >>>
>>> >>> Yes.
>>> >>>
>>> >>> If the code is very similar between the a7 and a9, then a discussion
>>> >>> on devel@ should occur to decide the best way to minimize duplication.
>>> >>>
>>> >>> If you end up with a7 specific code, you should follow the location and
>>> >>> naming patterns already established. That places it in
>>> >>> libbsp/arm/shared/...
>>> >>> so it can be used by any BSP with the right SMP core.
>>> >>>
>>> >>>
>>> >>> I am referring to existing code to locate and get hold of what needs
>>> >>> to be done in the hooks. However, being new to such implementations, I
>>> >>> am taking longer to understand the details. Any suggestions that might
>>> >>> help here are welcome.
>>> >>>
>>> >>> The answer will depend on the factors listed above. When code can
>>> >>> be shared, we want to share it across as many BSPs as makes sense.
>>> >>> When it is unique to a specific BSP **variant** (e.g. Pi vs Pi2), then
>>> >>> you want to find the way to account for the variation in the least
>>> >>> intrusive code way possible.
>>> >>>
>>> >>> Thanks!
>>> >>>
>>> >>> On 1 May 2015 12:45, "Rohini Kulkarni" <krohini1593 at gmail.com>
>>> wrote:
>>> >>>
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> Excited to be a part of this edition of GSoC! Thanks to everybody for
>>> >>> helping me get here and congratulations to all the participating
>>> >>> students!
>>> >>>
>>> >>> So, now getting to work, firstly I wish to know, specifically from my
>>> >>> mentors, any changes that must be made to my proposed project or
>>> >>> schedule.
>>> >>>
>>> >>> Secondly, are there any specifics for the development blog that we need
>>> >>> to create for the project? Over time, what is the blog expected to
>>> >>> convey?
>>> >>>
>>> >>> Also, I have to create a new wiki page for my project as none exists. I
>>> >>> want to know how to add one.
>>> >>>
>>> >>> --
>>> >>>
>>> >>> Rohini Kulkarni
>>> >>>
>>> >>>
>>> >>> -- Joel Sherrill, Ph.D. Director of Research & Development
>>> >>> joel.sherrill at OARcorp.com On-Line Applications Research Ask me about
>>> >>> RTEMS: a free RTOS Huntsville AL 35805 Support Available (256)
>>> >>>
>>> >>> 722-9985
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>>
>>> >>> Rohini Kulkarni
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>>
>>> >>> Rohini Kulkarni
>>> >>>
>>> >>>
>>> >>> --joel
>>> >>>
>>> >>>
>>> >>> --joel
>>> >>> _______________________________________________
>>> >>> devel mailing list
>>> >>> devel at rtems.org
>>> >>> http://lists.rtems.org/mailman/listinfo/devel
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Hesham
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Hesham
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Rohini Kulkarni
>>> >
>>> >
>>>
>>
>>
>
>


-- 
Rohini Kulkarni