GSoC 2015: Raspberry Pi 2 Support

Gedare Bloom gedare at gwu.edu
Sun Jun 21 19:22:33 UTC 2015


On Sun, Jun 21, 2015 at 3:04 PM, Rohini Kulkarni <krohini1593 at gmail.com> wrote:
> I missed mentioning the number of dhrystones in the previous mail.
>
> Originally it was 1 million.
> This time I executed 100 million dhrystones.
>
The next thing to do is to figure out what changes are contributing to
the performance improvement, and then prepare patches. :) Great work!

> On Mon, Jun 22, 2015 at 12:29 AM, Rohini Kulkarni <krohini1593 at gmail.com>
> wrote:
>>
>> Hi all,
>>
>> I have managed to get a significant performance improvement with some
>> changes in configurations.
>>
>> The measured time per dhrystone run dropped from 12 microseconds to "too
>> small to be measured".
>>
>> With 100 million dhrystones, the time per run was 0.4 microseconds.
>>
>> The number of dhrystones per second increased from approximately 83333 to
>> 2500000 :)
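The figures above are consistent if the reported time is microseconds per run through Dhrystone: 10^6 / 12 is about 83333 and 10^6 / 0.4 is 2500000. A minimal sketch of that arithmetic follows. The 2-second "too small" threshold is an assumption modelled on the `Too_Small_Time` check in common Dhrystone 2.1 sources, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed threshold, modelled on Dhrystone 2.1's Too_Small_Time (2 s). */
#define TOO_SMALL_SECONDS 2.0

/* Dhrystones per second from the "microseconds per run" figure. */
static double dhrystones_per_second(double us_per_run)
{
  assert(us_per_run > 0.0);
  return 1.0e6 / us_per_run;
}

/* Total benchmark time in seconds for a given number of runs. */
static double total_seconds(double runs, double us_per_run)
{
  return runs * us_per_run / 1.0e6;
}

/* True when the whole run is too short for the benchmark to report. */
static bool too_small_to_measure(double runs, double us_per_run)
{
  return total_seconds(runs, us_per_run) < TOO_SMALL_SECONDS;
}
```

For example, 1 million runs at 0.4 microseconds take about 0.4 s and fall under the threshold, while 100 million runs take about 40 s, which explains why raising the run count made the result measurable.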
>>
>> Thanks!
>>
>> On Sun, Jun 21, 2015 at 1:32 AM, Rohini Kulkarni <krohini1593 at gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> I have added an SMP-related post to my blog to define where exactly in
>>> the code I need to work. Feedback on whether I am identifying the work
>>> area correctly would be very helpful!
>>>
>>> Thanks!
>>>
>>> On 18 Jun 2015 03:37, "Rohini Kulkarni" <krohini1593 at gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I have updated my blog to reflect my understanding of, and attempts at,
>>>> the cache performance issue.
>>>>
>>>> Lately I have been experimenting with memory attributes in the
>>>> mm_config_table. One configuration for cacheable memory (inner and outer
>>>> levels) ended up reducing performance further (I really expected it to
>>>> improve). So this table setup certainly affects performance.
>>>>
>>>> The results do not improve after turning on the cache, so perhaps the
>>>> memory sections are not getting cached at all.
>>>> I suspect it has to do with this mm_config_table.
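To illustrate why such a table controls caching: with the MMU on, the cacheability of each 1 MiB region comes from the attribute bits of its ARMv7 short-descriptor section entry, so a range whose entry lacks cacheable attributes is not cached regardless of the cache enable bits. The sketch below fills a hypothetical 4096-entry first-level table from a region list in the spirit of mm_config_table; `struct region`, the `ATTR_*` values and `fill_l1_table` are illustrative assumptions, not the RTEMS API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_SIZE 0x100000u /* 1 MiB per first-level entry */
#define L1_ENTRIES   4096u     /* 4 GiB / 1 MiB */

/* Hypothetical region descriptor in the spirit of mm_config_table. */
struct region {
  uint32_t begin; /* inclusive, 1 MiB aligned */
  uint32_t end;   /* exclusive */
  uint32_t attrs; /* section descriptor bits [19:0] */
};

/* Normal memory, inner and outer write-back write-allocate
 * (TEX=0b101, C=0, B=1), shareable, full access, section type. */
#define ATTR_NORMAL_WBWA 0x00015C06u
/* Shareable Device memory (TEX=0b000, C=0, B=1), execute-never. */
#define ATTR_DEVICE      0x00000C16u

/* Fill a first-level table; unlisted addresses stay 0 (translation fault). */
static void fill_l1_table(uint32_t table[L1_ENTRIES],
                          const struct region *regions, size_t count)
{
  for (size_t i = 0; i < L1_ENTRIES; ++i)
    table[i] = 0;

  for (size_t r = 0; r < count; ++r) {
    assert(regions[r].begin % SECTION_SIZE == 0);
    for (uint64_t addr = regions[r].begin; addr < regions[r].end;
         addr += SECTION_SIZE)
      table[addr >> 20] = (uint32_t)addr | regions[r].attrs;
  }
}
```

A region mapped with Device or non-cacheable attributes here would behave exactly as if the cache were off, which is one way "turning on the cache" can fail to change benchmark results.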
>>>>
>>>> The updated code on GitHub and the blog may help further discussion.
>>>>
>>>> Link to GitHub code: https://github.com/krohini1593/rtems/tree/rohini
>>>>
>>>> Link to Blog
>>>>
>>>> Thanks!
>>>>
>>>> On Mon, Jun 15, 2015 at 8:29 PM, Alan Cudmore <alan.cudmore at gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>> Some of the code examples may give you some clues. Like this one:
>>>>> https://github.com/mrvn/test/blob/master/smp.cc
>>>>>
>>>>> Or this:
>>>>> https://github.com/PeterLemon/RaspberryPi/tree/master/SMP/SMPINIT
>>>>>
>>>>> If you still can't figure it out, you can always join the
>>>>> raspberrypi.org forums and ask on this thread:
>>>>> https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
>>>>>
>>>>> When it comes to the Pi 2 and SMP, you are our RTEMS expert :)
>>>>>
>>>>> Thanks,
>>>>> Alan
>>>>>
>>>>>
>>>>> On Sat, Jun 13, 2015 at 2:29 PM, Rohini Kulkarni
>>>>> <krohini1593 at gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This is regarding Pi 2 SMP support. After powering on, the secondary
>>>>>> cores each read one of their four mailbox registers and wait for a
>>>>>> non-zero value to be written to it. This value is to be the physical
>>>>>> address from which the core is expected to start execution.
>>>>>>
>>>>>> I am stuck on figuring out this address. How should I go about
>>>>>> understanding this?
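One way to read the mechanism described above: the value written is simply the address the secondary core should jump to, for example the physical address of an entry routine in your own image that sets up a stack and continues SMP initialization. The sketch below assumes the BCM2836 "ARM local" peripheral layout (base 0x40000000 on the Pi 2, with core n's mailbox 3 write-set register at offset 0x8C + 0x10 * n), that the firmware stub polls mailbox 3, and a flat mapping of that base; `start_secondary_core` and `entry_phys` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed BCM2836 "ARM local" peripheral layout (Pi 2). */
#define LOCAL_PERIPH_BASE       0x40000000u
#define MAILBOX3_SET_CORE0      0x8Cu /* core 0, mailbox 3, write-set */
#define MAILBOX_STRIDE_PER_CORE 0x10u

/* Byte offset of core n's mailbox 3 write-set register. */
static uint32_t mailbox3_set_offset(unsigned core)
{
  assert(core >= 1u && core <= 3u); /* only secondary cores */
  return MAILBOX3_SET_CORE0 + MAILBOX_STRIDE_PER_CORE * core;
}

/* Hypothetical helper: release a parked secondary core by writing the
 * physical address of its entry point into its mailbox 3. On hardware,
 * local_base would be (volatile uint32_t *)LOCAL_PERIPH_BASE. */
static void start_secondary_core(volatile uint32_t *local_base,
                                 unsigned core, uint32_t entry_phys)
{
  local_base[mailbox3_set_offset(core) / sizeof(uint32_t)] = entry_phys;
#if defined(__ARM_ARCH_7A__)
  /* Make the write visible, then wake cores waiting in WFE (assumed). */
  __asm__ volatile ("dsb\n\tsev" ::: "memory");
#endif
}
```

Passing the register base as a parameter keeps the address arithmetic testable off-target; on the board the same call would be made once per secondary core with the real base.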
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> On 3 Jun 2015 19:44, "Gedare Bloom" <gedare at gwu.edu> wrote:
>>>>>>>
>>>>>>> On Wed, Jun 3, 2015 at 2:39 AM, Rohini Kulkarni
>>>>>>> <krohini1593 at gmail.com> wrote:
>>>>>>> > But I can't say the cache configuration plays a role here.
>>>>>>> >
>>>>>>> > I'll push my code to my github project soon.
>>>>>>> >
>>>>>>> > P.S. The Pi 2 board I have seems to have broken down. It just isn't
>>>>>>> > turning on, so I am unable to test further. I will order a new one
>>>>>>> > immediately.
>>>>>>> >
>>>>>>> Ouch. Make sure you put it in a safe space for development, clear of
>>>>>>> threats like moisture, static shock, and cats.
>>>>>>>
>>>>>>> > On 3 Jun 2015 09:03, "Rohini Kulkarni" <krohini1593 at gmail.com>
>>>>>>> > wrote:
>>>>>>> >>
>>>>>>> >> Hi,
>>>>>>> >>
>>>>>>> >> Alan, your suggestion has resulted in a big improvement:
>>>>>>> >>
>>>>>>> >> arm_control=0x1000
>>>>>>> >>
>>>>>>> >> It simply worked! It looks like the other cores were taking up
>>>>>>> >> plenty of time. I was aware from references that the other cores
>>>>>>> >> run a WFI, but I did not appreciate its impact.
>>>>>>> >> The time per dhrystone run dropped from 13 to 7, and the number of
>>>>>>> >> dhrystones per second also increased.
>>>>>>> >>
>>>>>>> >> But this is only a change in config.txt, not in the boot code.
>>>>>>> >>
>>>>>>> >> Thanks
>>>>>>> >>
>>>>>>> >> Rohini
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> On Wed, Jun 3, 2015 at 7:12 AM, Alan Cudmore
>>>>>>> >> <alan.cudmore at gmail.com>
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>> The caches are enabled in the RPi 1 BSP. The same code is executed
>>>>>>> >>> by the RPi 2 BSP, but it is obviously not sufficient for the cache
>>>>>>> >>> setup there.
>>>>>>> >>> I have been reading through this long thread, and it is very
>>>>>>> >>> informative:
>>>>>>> >>> https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
>>>>>>> >>>
>>>>>>> >>> I am starting to understand the setup that is required to enable
>>>>>>> >>> caches on the RPi 2. For example, this message near the bottom of
>>>>>>> >>> page 3 gives a good indication of the speedup available by
>>>>>>> >>> configuring the MMU and caches correctly:
>>>>>>> >>> Quote from above thread
>>>>>>> >>> ------------------------------
>>>>>>> >>> Enabling I/D caches and branch prediction, just like the julia demo
>>>>>>> >>> uses, it takes ~12 seconds, or ~21 fps. It's just one core but also
>>>>>>> >>> a much smaller loop than the julia demo has.
>>>>>>> >>>
>>>>>>> >>> Enabling the MMU and mapping memory inner/outer write-back, write
>>>>>>> >>> allocate and the framebuffer inner write-through, no write-allocate +
>>>>>>> >>> outer write-back, write-allocate, it takes ~8 seconds, or 32 fps.
>>>>>>> >>>
>>>>>>> >>> PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2
>>>>>>> >>> cache effect.
>>>>>>> >>> -------------------------
>>>>>>> >>> End of quote
>>>>>>> >>>
>>>>>>> >>> The person who posted the above comment (mrvn) posted the code here:
>>>>>>> >>> https://github.com/mrvn/test/blob/master/mmu.cc
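To make the quoted memory types concrete: on ARMv7-A with TEX remap disabled, a first-level section descriptor with TEX[2] = 1 encodes the outer policy in TEX[1:0] and the inner policy in C:B (00 non-cacheable, 01 write-back write-allocate, 10 write-through no write-allocate, 11 write-back no write-allocate). The following is a minimal sketch of that encoding with full access in domain 0; `section_normal` is a hypothetical helper, not code taken from mmu.cc or RTEMS.

```c
#include <assert.h>
#include <stdint.h>

/* Policy values for TEX[1:0] (outer) and C:B (inner) when TEX[2] = 1. */
enum cache_policy {
  NONCACHEABLE = 0x0, /* non-cacheable */
  WB_WA        = 0x1, /* write-back, write-allocate */
  WT_NWA       = 0x2, /* write-through, no write-allocate */
  WB_NWA       = 0x3  /* write-back, no write-allocate */
};

/* Build a 1 MiB section descriptor for Normal memory, full access,
 * domain 0, executable (short-descriptor format, TEX remap disabled). */
static uint32_t section_normal(uint32_t phys, enum cache_policy inner,
                               enum cache_policy outer, int shareable)
{
  assert((unsigned)inner <= 3u && (unsigned)outer <= 3u);
  uint32_t d = phys & 0xFFF00000u;           /* section base [31:20] */
  d |= 0x2u;                                 /* [1:0] = 0b10: section */
  d |= ((uint32_t)inner & 0x1u) << 2;        /* B = inner bit 0 */
  d |= (((uint32_t)inner >> 1) & 0x1u) << 3; /* C = inner bit 1 */
  d |= 0x3u << 10;                           /* AP[1:0] = 0b11 */
  d |= (0x4u | (uint32_t)outer) << 12;       /* TEX = 0b1AA, AA = outer */
  if (shareable)
    d |= 1u << 16;                           /* S */
  return d;
}
```

With this encoding, shareable RAM at address 0 mapped inner/outer write-back write-allocate gives 0x00015C06, and the framebuffer attributes from the quote (inner write-through no write-allocate, outer write-back write-allocate) correspond to TEX=0b101, C=1, B=0.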
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Also, it seems that when the Pi 2 starts up, cores 1-3 are put in a
>>>>>>> >>> wait loop that constantly accesses the bus. By putting this option
>>>>>>> >>> in the config.txt file you can put the other cores to sleep,
>>>>>>> >>> speeding up the code on core 0:
>>>>>>> >>>  arm_control=0x1000
>>>>>>> >>> It would be worth trying that option to see if the benchmark speeds
>>>>>>> >>> up.
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Alan
>>>>>>> >>>
>>>>>>> >>> On Jun 2, 2015, at 8:05 AM, Hesham ALMatary
>>>>>>> >>> <heshamelmatary at gmail.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>> On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni
>>>>>>> >>> <krohini1593 at gmail.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>> From what I saw, they have to be enabled separately. The caches and
>>>>>>> >>> MMU are disabled upon reset.
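For reference, on ARMv7-A these enables sit in the System Control Register (SCTLR): M (bit 0) turns on the MMU, C (bit 2) the data/unified caches, Z (bit 11) branch prediction and I (bit 12) the instruction cache. A minimal sketch of computing the new value follows; the CP15 access (`mrc`/`mcr p15, 0, <Rt>, c1, c0, 0`) is only compiled for ARMv7-A targets, and the helper names are hypothetical. On real hardware the translation table must be set up, and the caches invalidated, before M and C are set.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)  /* MMU enable */
#define SCTLR_C (1u << 2)  /* data/unified cache enable */
#define SCTLR_Z (1u << 11) /* branch prediction enable */
#define SCTLR_I (1u << 12) /* instruction cache enable */

/* Return an SCTLR value with MMU, caches and branch prediction enabled. */
static uint32_t sctlr_with_caches(uint32_t sctlr)
{
  return sctlr | SCTLR_M | SCTLR_C | SCTLR_Z | SCTLR_I;
}

#if defined(__ARM_ARCH_7A__)
/* Hypothetical helper: apply the change on the running core. */
static void enable_mmu_and_caches(void)
{
  uint32_t v;
  __asm__ volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r" (v));
  v = sctlr_with_caches(v);
  __asm__ volatile ("mcr p15, 0, %0, c1, c0, 0\n\tisb" : : "r" (v) : "memory");
}
#endif
```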
>>>>>>> >>>
>>>>>>> >>> The existing Raspberry Pi BSP [1] has code for MMU/cache
>>>>>>> >>> initialization; however, I don't know about the Pi 2 or where its
>>>>>>> >>> code is.
>>>>>>> >>>
>>>>>>> >>> [1]
>>>>>>> >>>
>>>>>>> >>> https://github.com/RTEMS/rtems/tree/master/c/src/lib/libbsp/arm/raspberrypi
>>>>>>> >>>
>>>>>>> >>> On 2 Jun 2015 16:59, "Hesham ALMatary" <heshamelmatary at gmail.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Hi,
>>>>>>> >>>
>>>>>>> >>> Aren't the MMU/Caches enabled by default for RPi [1]?
>>>>>>> >>>
>>>>>>> >>> [1]
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c
>>>>>>> >>>
>>>>>>> >>> On Tue, Jun 2, 2015 at 12:18 PM, Joel Sherrill
>>>>>>> >>> <joel.sherrill at oarcorp.com> wrote:
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> On June 2, 2015 7:01:21 AM EDT, Rohini Kulkarni
>>>>>>> >>> <krohini1593 at gmail.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>> Dr. Joel,
>>>>>>> >>>
>>>>>>> >>> So we can't conclude anything solely on the basis of this result?
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> I don't think so. If Linux performs the same, then what you did is
>>>>>>> >>> as good as it gets.
>>>>>>> >>>
>>>>>>> >>> However, if Linux is faster, then some setting still isn't right.
>>>>>>> >>>
>>>>>>> >>> You need a reference measurement to have any confidence. It is
>>>>>>> >>> possible you did something but didn't actually turn the cache (or
>>>>>>> >>> all of the cache) on.
>>>>>>> >>>
>>>>>>> >>> On 2 Jun 2015 16:28, "Rohini Kulkarni" <krohini1593 at gmail.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>> I have not run it under Linux on the Pi 2 yet. I will have to run it
>>>>>>> >>> and check the result.
>>>>>>> >>>
>>>>>>> >>> On 2 Jun 2015 16:16, "Joel Sherrill" <joel.sherrill at oarcorp.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> On June 2, 2015 5:58:33 AM EDT, Rohini Kulkarni
>>>>>>> >>> <krohini1593 at gmail.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>> Hi,
>>>>>>> >>>
>>>>>>> >>> I tried running the dhrystone benchmark with some changes to the
>>>>>>> >>> cache/MMU setup.
>>>>>>> >>>
>>>>>>> >>> However, the output shows a reduction in performance.
>>>>>>> >>> The time per run through the dhrystone has increased from 12 to 13,
>>>>>>> >>> and the dhrystones run per second decreased.
>>>>>>> >>>
>>>>>>> >>> According to this result, things were better with the caches
>>>>>>> >>> disabled.
>>>>>>> >>>
>>>>>>> >>> I have been working on this for two days and could not find an
>>>>>>> >>> improvement. Any pointers?
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> How did it do under Linux on the Pi2?
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Thanks.
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> On Thu, May 28, 2015 at 8:41 PM, Rohini Kulkarni
>>>>>>> >>> <krohini1593 at gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>> Hi All,
>>>>>>> >>>
>>>>>>> >>> I have to implement cache coherency support for the Cortex-A7. But
>>>>>>> >>> for the A7 MPCore, unlike the A9, I am not able to find any register
>>>>>>> >>> description for the Snoop Control Unit in the TRM.
>>>>>>> >>>
>>>>>>> >>> I need help here on how to proceed.
>>>>>>> >>>
>>>>>>> >>> Additionally, on the A9 there is a single bit in the Auxiliary
>>>>>>> >>> Control Register which enables cache broadcast operations. The
>>>>>>> >>> register format is different for the A7, and again I am unable to
>>>>>>> >>> find how to achieve the same for the A7.
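A possible starting point, assuming the Cortex-A7 MPCore TRM's description of the Auxiliary Control Register (treat this as something to verify): the A7 does not expose memory-mapped SCU registers the way the A9 does, and there is no separate broadcast (FW) bit. Instead, setting the SMP bit (bit 6) of ACTLR on each core before enabling the caches and MMU makes the core take part in coherency, with cache and TLB maintenance broadcast by hardware. A minimal sketch follows; the helper names are hypothetical, and the write may have no effect if the core runs in Non-secure state without access to ACTLR.

```c
#include <assert.h>
#include <stdint.h>

#define ACTLR_SMP (1u << 6) /* Cortex-A7: take part in coherency */

/* Return an ACTLR value with the SMP bit set, other bits preserved. */
static uint32_t actlr_with_smp(uint32_t actlr)
{
  return actlr | ACTLR_SMP;
}

#if defined(__ARM_ARCH_7A__)
/* Hypothetical helper: run on each core before enabling caches and MMU. */
static void cortex_a7_join_coherency(void)
{
  uint32_t v;
  __asm__ volatile ("mrc p15, 0, %0, c1, c0, 1" : "=r" (v));
  v = actlr_with_smp(v);
  __asm__ volatile ("mcr p15, 0, %0, c1, c0, 1\n\tisb" : : "r" (v) : "memory");
}
#endif
```

Using a read-modify-write keeps any bits the firmware already set, which matters because ACTLR also holds implementation-defined settings.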
>>>>>>> >>>
>>>>>>> >>> Thanks!
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> On Tue, May 5, 2015 at 10:42 PM, Joel Sherrill
>>>>>>> >>> <joel.sherrill at oarcorp.com> wrote:
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> On 5/5/2015 11:11 AM, Rohini Kulkarni wrote:
>>>>>>> >>>
>>>>>>> >>> Hi,
>>>>>>> >>>
>>>>>>> >>> I am working with the code for the BSP hooks. I am referring to
>>>>>>> >>> existing ARM multicore BSPs, mainly the Zynq.
>>>>>>> >>>
>>>>>>> >>> 1. There are existing hooks for the Raspberry Pi. Where should the
>>>>>>> >>> code for the Pi 2 hooks be added?
>>>>>>> >>>
>>>>>>> >>> The Pi and Pi2 are remarkably similar, so Pi2 code should be placed
>>>>>>> >>> inside the Pi BSP directory.
>>>>>>> >>> There is already a Pi2 variant of that code built, but we know of
>>>>>>> >>> specific places where the boards differ. Depending on the scope of
>>>>>>> >>> what is different, it can be as simple as a cpp conditional in a .h
>>>>>>> >>> to select a value, or two implementations of a single method with
>>>>>>> >>> the Makefile.am picking the right file to build based on the board
>>>>>>> >>> variant.
>>>>>>> >>>
>>>>>>> >>> The big question to always ask is: Is this specific to the Pi2
>>>>>>> >>> and
>>>>>>> >>> incompatible with the Pi?
>>>>>>> >>>
>>>>>>> >>> Since the Pi BSP is still missing capabilities, it is likely that
>>>>>>> >>> code common to both will be added this summer. For example, did the
>>>>>>> >>> mailbox interface change? I don't know, but would guess that it
>>>>>>> >>> didn't. Each new capability added needs that same check.
>>>>>>> >>>
>>>>>>> >>> And any differences need to be analyzed to pick the least intrusive
>>>>>>> >>> way to provide alternate implementations, or to enable special code
>>>>>>> >>> like the Pi2 SMP support, which is dependent on --enable-smp and
>>>>>>> >>> being a Pi2.
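As a sketch of the "cpp conditional in a .h" approach, assuming a hypothetical `BSP_IS_RPI2` macro set by the build for the Pi 2 variant: the peripheral base differs between the BCM2835 (0x20000000) and the BCM2836 (0x3F000000), and Pi 2 SMP code can be gated on `RTEMS_SMP` (defined for --enable-smp builds) plus the variant macro. The helper and the other macro names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant switch; a Pi 2 build would define it to 1. */
#ifndef BSP_IS_RPI2
#define BSP_IS_RPI2 0
#endif

#if BSP_IS_RPI2
#define RPI_PERIPHERAL_BASE 0x3F000000u /* BCM2836 */
#else
#define RPI_PERIPHERAL_BASE 0x20000000u /* BCM2835 */
#endif
#define RPI_PERIPHERAL_SIZE 0x01000000u /* 16 MiB window on both */

/* Pi 2 SMP support only in --enable-smp builds of the Pi 2 variant. */
#if defined(RTEMS_SMP) && BSP_IS_RPI2
#define RPI_SMP_SUPPORT 1
#else
#define RPI_SMP_SUPPORT 0
#endif

/* Address of a peripheral register given its offset in the window. */
static uint32_t rpi_peripheral_addr(uint32_t offset)
{
  assert(offset < RPI_PERIPHERAL_SIZE);
  return RPI_PERIPHERAL_BASE + offset;
}
```

Building with `-DBSP_IS_RPI2=1` would switch the base to 0x3F000000 without touching any caller, which is the kind of low-intrusion variation described above.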
>>>>>>> >>>
>>>>>>> >>> 2. Am I right in understanding that I will have to implement
>>>>>>> >>> A7-specific functions, as has been done for the A9? I am referring
>>>>>>> >>> specifically to arm-a9mpcore-start.h.
>>>>>>> >>>
>>>>>>> >>> Yes.
>>>>>>> >>>
>>>>>>> >>> If the code is very similar between the A7 and A9, then a discussion
>>>>>>> >>> on devel@ should occur to decide the best way to minimize
>>>>>>> >>> duplication.
>>>>>>> >>>
>>>>>>> >>> If you end up with A7-specific code, you should follow the location
>>>>>>> >>> and naming patterns already established. That places it in
>>>>>>> >>> libbsp/arm/shared/... so it can be used by any BSP with the right
>>>>>>> >>> SMP core.
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> I am referring to existing code to locate and understand what needs
>>>>>>> >>> to be done in the hooks. However, being new to such implementations,
>>>>>>> >>> I am taking longer to understand the details. Any suggestions that
>>>>>>> >>> might help here are welcome.
>>>>>>> >>>
>>>>>>> >>> The answer will depend on the factors listed above. When code can be
>>>>>>> >>> shared, we want to share it across as many BSPs as makes sense.
>>>>>>> >>> When it is unique to a specific BSP **variant** (e.g. Pi vs Pi2),
>>>>>>> >>> then you want to find the least intrusive way in the code to account
>>>>>>> >>> for the variation.
>>>>>>> >>>
>>>>>>> >>> Thanks!
>>>>>>> >>>
>>>>>>> >>> On 1 May 2015 12:45, "Rohini Kulkarni" <krohini1593 at gmail.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> Hi,
>>>>>>> >>>
>>>>>>> >>> Excited to be a part of this edition of GSoC! Thanks to everybody
>>>>>>> >>> for helping me get here, and congratulations to all the
>>>>>>> >>> participating students!
>>>>>>> >>>
>>>>>>> >>> So, getting to work: firstly, I would like to know, specifically
>>>>>>> >>> from my mentors, whether any changes must be made to my proposed
>>>>>>> >>> project or schedule.
>>>>>>> >>>
>>>>>>> >>> Secondly, are there any specifics for the development blog that we
>>>>>>> >>> need to create for the project? What is the blog expected to convey
>>>>>>> >>> over time?
>>>>>>> >>>
>>>>>>> >>> Also, I have to create a new wiki page for my project as none
>>>>>>> >>> exists. I want to know how to add one.
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>>
>>>>>>> >>> Rohini Kulkarni
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> -- Joel Sherrill, Ph.D.             Director of Research & Development
>>>>>>> >>> joel.sherrill at OARcorp.com        On-Line Applications Research
>>>>>>> >>> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
>>>>>>> >>> Support Available                (256) 722-9985
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>>
>>>>>>> >>> Rohini Kulkarni
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>>
>>>>>>> >>> Rohini Kulkarni
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> --joel
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> --joel
>>>>>>> >>> _______________________________________________
>>>>>>> >>> devel mailing list
>>>>>>> >>> devel at rtems.org
>>>>>>> >>> http://lists.rtems.org/mailman/listinfo/devel
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> Hesham
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> Hesham
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Rohini Kulkarni
>>>>>>> >
>>>>>>> >
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Rohini Kulkarni
>>
>>
>>
>>
>> --
>> Rohini Kulkarni
>
>
>
>
> --
> Rohini Kulkarni
>


