ARM cache usage
Chris Johns
chrisj at rtems.org
Sat Apr 28 00:19:30 UTC 2018
On 28/4/18 1:11 am, William Busacker wrote:
>
> Can someone point me in the direction of material that can explain how
> RTEMS uses the MMU on an ARM processor (specifically the ARM11 that the
> Raspberry Pi uses)? I want to see if there are any optimizations I can make
> in code to take better advantage of how the memory access system works.
I am not familiar with all of the RPi configuration, so what follows is based
on what I know happens on other ARM devices.
The MMU setup is here:
https://git.rtems.org/rtems/tree/bsps/arm/raspberrypi/start/mm_config_table.c
The values in the table are programmed into the MMU by the BSP start hooks, of
which there are two. For the RPi they are here:
https://git.rtems.org/rtems/tree/bsps/arm/raspberrypi/start/bspstarthooks.c
The hooks are called from the generic ARM startup code:
https://git.rtems.org/rtems/tree/bsps/arm/shared/start/start.S
> My reasoning for trying this is I have a bit of software that is taking
> several 12x12 matrices and vectors of similar size and
> multiplying/adding/inverting them all together a few hundred times over.
> These are all float type, and while I know the sheer calculation time is
> quite high, the measured execution time is much higher than it should be,
> leading me to suspect that there is a memory bottleneck. What I would
> like to know is how the caching system works so I can maybe make
> adjustments to take better advantage of the cache and possibly reduce
> execution time.
I hope the links above provide you with enough information to figure this out.
Please report back what you find. I know that on a Zynq, which is initialised in
a similar way, the memory bandwidth is high, and you need specialized
hand-crafted NEON instructions to get the maximum from a single ARM core.
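On the cache side, here is a minimal sketch of a cache-friendly multiply for
the 12x12 case. It is not taken from the generated code: the name matmul_ikj,
the constant N and the flat row-major layout are all assumptions made for
illustration. The point it shows is loop order. With i-k-j ordering the inner
loop walks both b and c along contiguous rows, so every cache line that is
fetched gets fully used.

```c
#include <stddef.h>

#define N 12 /* assumed matrix dimension, taken from the question */

/*
 * c = a * b for N x N row-major float matrices (hypothetical helper).
 * The i-k-j loop order keeps the innermost loop on contiguous memory
 * in both b and c. The restrict qualifiers promise the compiler that
 * the three arrays do not overlap, which makes vectorizing the inner
 * loop easier.
 */
void matmul_ikj(const float *restrict a,
                const float *restrict b,
                float *restrict c)
{
  for (size_t i = 0; i < N; ++i) {
    /* Clear the output row before accumulating into it. */
    for (size_t j = 0; j < N; ++j) {
      c[i * N + j] = 0.0f;
    }
    for (size_t k = 0; k < N; ++k) {
      const float aik = a[i * N + k]; /* reused across the whole row */
      for (size_t j = 0; j < N; ++j) {
        c[i * N + j] += aik * b[k * N + j];
      }
    }
  }
}
```

A 12x12 float matrix is only 576 bytes, so a handful of them should stay
resident in the L1 data cache once the memory holding them is mapped as
cached. If the measured time is still far off, it is worth checking that the
region is really cached in the MMU table before reworking the loops.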
What compiler flags are you using?
I see in:
https://git.rtems.org/rtems/tree/bsps/arm/raspberrypi/config/raspberrypi2.cfg
that the RPi2 has NEON. Are you using it?
I know Eigen has explicit vectorization for NEON, and that makes a difference:
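One quick way to answer that from inside the build is to test the predefined
macros. This is a small sketch, assuming a compiler that follows the ARM C
Language Extensions, where __ARM_NEON is defined when the selected CPU/FPU
options enable NEON (older GCC releases used __ARM_NEON__). The function name
simd_build_info is made up for this example.

```c
/*
 * Report whether this translation unit was compiled with NEON enabled.
 * Relies only on predefined compiler macros; simd_build_info is a
 * hypothetical helper, not an RTEMS API.
 */
const char *simd_build_info(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
  return "NEON enabled";
#else
  return "NEON not enabled";
#endif
}
```

Printing the result once at start-up on both builds makes it obvious whether
the flags differ.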
http://eigen.tuxfamily.org/index.php?title=Main_Page
> This code is also being generated using Matlab's C Autocoder so that
> code itself isn't exactly readable (but being that I'm using a
> University Matlab license for free, I can't complain too much) so I'd
> like to try and keep manual adjustments to a minimum. If anyone knows of
> tricks to get autocoder to play nicer, that would be great too.
Sorry, I do not use it.
I would run Linux on a similar RPi, compile the code there, and compare the
compiler options and generated code against the RTEMS build. I would also wrap
the code in some form of test and benchmark it.
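A rough sketch of such a benchmark, assuming POSIX clock_gettime() with
CLOCK_MONOTONIC is available on both systems. The names workload_fn,
bench_ns_per_call and count_calls are placeholders; the real workload would be
a wrapper around the generated entry point.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature for the code under test. */
typedef void (*workload_fn)(void *arg);

/* Dummy workload for trying out the harness: counts its own calls. */
void count_calls(void *arg)
{
  ++*(uint32_t *)arg;
}

/*
 * Average nanoseconds per call of fn over the given number of iterations.
 * One untimed warm-up call is made first so that the timed loop starts
 * with code and data already in the caches.
 */
uint64_t bench_ns_per_call(workload_fn fn, void *arg, uint32_t iterations)
{
  struct timespec t0, t1;

  assert(fn != NULL && iterations > 0);

  fn(arg); /* warm-up, not timed */

  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (uint32_t i = 0; i < iterations; ++i) {
    fn(arg);
  }
  clock_gettime(CLOCK_MONOTONIC, &t1);

  int64_t ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
             + (int64_t)(t1.tv_nsec - t0.tv_nsec);
  return (uint64_t)ns / iterations;
}
```

Running the same harness with the same iteration count on Linux and on RTEMS
gives two numbers that can be compared directly.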
Chris