ARM cache usage

Chris Johns chrisj at
Sat Apr 28 00:19:30 UTC 2018

On 28/4/18 1:11 am, William Busacker wrote:
> Can someone point me in the direction of material that can explain how 
> RTEMS uses the MMU on an ARM processor (specifically the ARM11 that the 
> Raspberry Pi uses)? I want to see if there are any optimizations I can make 
> in code to take better advantage of how the memory access system works.

I am not across all of the RPi config, so what I provide is what I know happens
on other ARM devices.

The MMU set up is here:

The values in the table are configured into the MMU using the BSP start hooks
and there are two of these. They are here for the RPi:

The hooks are called from the generic ARM start up code:
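
As a rough illustration of the table-driven set up described above, here is a
minimal sketch of how a list of address ranges and attributes can be expanded
into a first-level translation table of 1 MiB sections. It is not the RTEMS
code. Every name in it (section_config, fill_translation_table, MEM_CACHED,
MEM_DEVICE, example_config) is hypothetical, and the addresses are illustrative
rather than taken from the RPi BSP. Only the bit positions follow the ARM
short-descriptor section format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is a sketch, not the RTEMS API. */
#define SECTION_SIZE   0x100000u      /* each first-level entry maps 1 MiB */
#define SECTION_TYPE   0x2u           /* bits [1:0] = 0b10 marks a section */
#define SECTION_B      (1u << 2)      /* B (bufferable) bit */
#define SECTION_C      (1u << 3)      /* C (cacheable) bit */
#define SECTION_AP_RW  (0x3u << 10)   /* AP[1:0] = 0b11, read/write access */

/* Normal memory, write-back cached (TEX = 0, C = 1, B = 1). */
#define MEM_CACHED (SECTION_TYPE | SECTION_C | SECTION_B | SECTION_AP_RW)
/* Device memory, not cached (TEX = 0, C = 0, B = 1). */
#define MEM_DEVICE (SECTION_TYPE | SECTION_B | SECTION_AP_RW)

typedef struct {
  uint32_t begin; /* first byte of the region */
  uint32_t end;   /* one past the last byte of the region */
  uint32_t flags; /* attribute bits for every section in the region */
} section_config;

/* Illustrative addresses only, not taken from any BSP. */
static const section_config example_config[] = {
  { 0x00000000u, 0x08000000u, MEM_CACHED }, /* 128 MiB of RAM, cached */
  { 0x20000000u, 0x21000000u, MEM_DEVICE }  /* 16 MiB of peripherals */
};

/* 4096 entries cover the 4 GiB address space. Entries left at zero are
 * fault entries. Real hardware needs this table 16 KiB aligned and its
 * address loaded into the translation table base register. */
static uint32_t translation_table[4096];

static void fill_translation_table(
  uint32_t *ttb,
  const section_config *cfg,
  size_t count
)
{
  assert(ttb != NULL && cfg != NULL);

  for (size_t i = 0; i < count; ++i) {
    if (cfg[i].end <= cfg[i].begin) {
      continue; /* skip empty regions */
    }

    uint32_t first = cfg[i].begin / SECTION_SIZE;
    uint32_t last = (cfg[i].end - 1u) / SECTION_SIZE;

    for (uint32_t s = first; s <= last; ++s) {
      /* Section base address goes in bits [31:20]. */
      ttb[s] = (s << 20) | cfg[i].flags;
    }
  }
}
```

The point for performance work is that whether your data is cached at all is
decided per region by the flags in a table like this, so it is worth checking
which attributes the region holding your matrices ends up with.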

> My reasoning for trying this is I have a bit of software that is taking 
> several 12x12 matrices and vectors of similar size and 
> multiplying/adding/inverting them all together a few hundred times over. 
> These are all float type, and while I know the sheer calculation time is 
> quite high, the measured execution time is much higher than it should be, 
> leading me to suspect that there is a memory bottleneck. What I would 
> like to know is how the caching system works so I can maybe make 
> adjustments to take better advantage of the cache and possibly reduce 
> execution time.

I hope the links above provide you with enough information to figure this out.
Please report back what you find. I know that on a Zynq, which is initialised in
a similar way, the memory bandwidth is high and you need specialized hand-crafted
NEON instructions to get the maximum from a single ARM core.

What compiler flags are you using?

I see in:

the RPi2 has a NEON. Are you using the NEON?

I know Eigen has explicit vectorization for a NEON and that makes a difference:

> This code is also being generated using Matlab's C Autocoder so that 
> code itself isn't exactly readable (but being that I'm using a 
> University Matlab license for free, I can't complain too much) so I'd 
> like to try and keep manual adjustments to a minimum. If anyone knows of 
> tricks to get autocoder to play nicer, that would be great too.

Sorry, I do not use it.

I would run Linux on a similar RPi, compile the code and compare the compiler
options and generated code. I would also run the code in some form of a test and
benchmark it.
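
A minimal sketch of such a benchmark, assuming a plain 12x12 float multiply is
representative of the generated code. The names matmul and
seconds_per_multiply are made up for this example, and the standard C clock()
call is used only because it is available everywhere. A higher resolution
timer on the target may be a better choice.

```c
#include <assert.h>
#include <time.h>

#define N 12

/* c = a * b for row-major N x N matrices stored as flat arrays. The k loop
 * sits in the middle so the inner loop walks b and c along contiguous rows. */
static void matmul(const float *a, const float *b, float *c)
{
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
      c[i * N + j] = 0.0f;
    }
    for (int k = 0; k < N; ++k) {
      float aik = a[i * N + k];
      for (int j = 0; j < N; ++j) {
        c[i * N + j] += aik * b[k * N + j];
      }
    }
  }
}

/* Volatile so the compiler cannot discard the results. */
static volatile float sink;

/* Returns the average time in seconds for one multiply, as seen by clock(). */
static double seconds_per_multiply(unsigned iterations)
{
  static float a[N * N], b[N * N], c[N * N];

  assert(iterations > 0);

  for (int i = 0; i < N * N; ++i) {
    a[i] = (float) i * 0.01f;
    b[i] = (float) (N * N - i) * 0.01f;
  }

  clock_t start = clock();
  for (unsigned it = 0; it < iterations; ++it) {
    matmul(a, b, c);
    sink = c[0];
    /* Feed the result back into the input so the multiply cannot be
     * hoisted out of the loop as loop invariant work. */
    a[0] = sink * 0.001f;
  }
  clock_t stop = clock();

  return (double) (stop - start) / CLOCKS_PER_SEC / iterations;
}
```

Calling seconds_per_multiply() with a large iteration count on both RTEMS and
Linux, built with the same compiler options, gives a number to compare. Three
12x12 float matrices are under 2 KiB in total, so if the region they live in is
cached they should fit in the data cache.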

