Mailbox RPi patch and rtems_cache_* probably broken on RPi

Fri Jun 24 19:16:55 UTC 2016

Hello Gedare,

On Thursday 23 of June 2016 17:44:13 Gedare Bloom wrote:
> This could explain a number of problems reported by students trying to
> get their RPi peripherals working. The cache manager has never been a
> robust and complete implementation. I think it must be carefully
> looked at across targets (easier when we delete obsolete
> architectures!).
>
> It looks like every arch's cache_.h should be defining
> CPU_DATA_CACHE_ALIGNMENT if it has a data cache. This requirement has
> probably never been documented properly somewhere, and it rightly may
> belong in the score/cpu/*/rtems/score/cpu.h.
>
> I'm not sure what you mean by maximal cache alignment.

I am returning to this to clarify what I mean.

The typical cache line length for classic ARM CPUs is 32 bytes,
but, for example, the Cortex-A17 MPCore uses 64-byte lines.
The Cortex-A7 MPCore has 32-byte L1 instruction cache lines and
64-byte L1 data cache lines. The Cortex-A73 MPCore uses 64 bytes,
the ARM1156T2-S 32 bytes, etc.

There is an instruction

MRC p15, 0, <Rd>, c0, c0, 1 ;returns cache details

which can be used to retrieve cache organization details
(it reads the Cache Type Register, CTR).
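For illustration, here is a minimal C sketch of decoding the value read by
that instruction. It assumes only two CTR layouts matter: the ARMv7 one
(Format field 0b100, DminLine/IminLine holding log2 of words per line) and
the older ARMv6 one used by ARM11 cores such as the ARM1176 in the original
RPi (Dsize/Isize fields whose 2-bit LEN encodes 8 << LEN bytes). The
function and macro names are hypothetical, not existing RTEMS API. The
inline assembly is only compiled for 32-bit ARM and is assumed to run in
a privileged mode.

```c
#include <assert.h>
#include <stdint.h>

/* Format field, CTR bits [31:29]: 0b100 on ARMv7, 0b000 on ARMv6. */
#define CTR_FORMAT(ctr)  ((ctr) >> 29)
#define CTR_FORMAT_ARMV7 0x4u
#define CTR_FORMAT_ARMV6 0x0u

/* Smallest data cache line length, in bytes, encoded in a CTR value. */
static uint32_t ctr_dcache_line_bytes(uint32_t ctr)
{
    assert(CTR_FORMAT(ctr) == CTR_FORMAT_ARMV7 ||
           CTR_FORMAT(ctr) == CTR_FORMAT_ARMV6);
    if (CTR_FORMAT(ctr) == CTR_FORMAT_ARMV7)
        return 4u << ((ctr >> 16) & 0xFu); /* DminLine: log2(words) */
    return 8u << ((ctr >> 12) & 0x3u);     /* Dsize.LEN: 8 << LEN bytes */
}

/* Smallest instruction cache line length, in bytes, encoded in a CTR value. */
static uint32_t ctr_icache_line_bytes(uint32_t ctr)
{
    assert(CTR_FORMAT(ctr) == CTR_FORMAT_ARMV7 ||
           CTR_FORMAT(ctr) == CTR_FORMAT_ARMV6);
    if (CTR_FORMAT(ctr) == CTR_FORMAT_ARMV7)
        return 4u << (ctr & 0xFu);         /* IminLine: log2(words) */
    return 8u << (ctr & 0x3u);             /* Isize.LEN: 8 << LEN bytes */
}

#if defined(__arm__) && !defined(__aarch64__)
/* Reads CTR with the MRC instruction quoted above. Assumed to run in a
   privileged mode, as RTEMS BSP code normally does. */
static inline uint32_t arm_read_ctr(void)
{
    uint32_t ctr;
    __asm__ volatile ("mrc p15, 0, %0, c0, c0, 1" : "=r" (ctr));
    return ctr;
}
#endif
```

For example, an ARMv7-format value 0x84448003 decodes to 64-byte data
lines and 32-byte instruction lines. An ARMv6-format value 0x1D152152
decodes to 32 bytes for both.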

But start with the simple case, where a fixed constant for
the cache line length is expected to cover all variants
of the architecture.

If you select the minimum cache line length for the flush
(cleaning, in ARM jargon) and invalidation operations and
hardcode that value into the range cleaning functions, then
it works correctly. The only overhead is that on architecture
variants (or parts of the cache) with longer lines, the
operation is repeated unnecessarily for the same line.
But data are not endangered.
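Here is a sketch of that first strategy: a range clean that always steps by
a hardcoded 32 bytes. The span computation is kept separate so it can be
checked on a host. The ARM-only loop issues DCCMVAC
(MCR p15, 0, <Rt>, c7, c10, 1, "clean data cache line by MVA to the point
of coherency") for each 32-byte step and finishes with a DSB. All names
are hypothetical. The assembly part assumes an ARMv7 core, because ARMv6
spells the barrier differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: smallest line length among supported ARM cores,
   used only as the stride of clean/invalidate-by-address loops. */
#define ARM_CACHE_MAINT_LINE_BYTES 32u

/* First line address and number of per-line operations needed to cover
   the byte range [addr, addr + len) when stepping by `line` bytes. */
struct cache_line_span {
    uintptr_t first;
    size_t count;
};

static struct cache_line_span compute_line_span(uintptr_t addr, size_t len,
                                                uintptr_t line)
{
    struct cache_line_span s;

    assert(line != 0 && (line & (line - 1)) == 0); /* power of two */
    assert(addr + len >= addr);                    /* no wrap-around */

    s.first = addr & ~(line - 1);
    if (len == 0) {
        s.count = 0;
        return s;
    }
    uintptr_t last = (addr + len - 1) & ~(line - 1);
    s.count = (size_t)((last - s.first) / line) + 1;
    return s;
}

#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7
/* Clean every 32-byte step of the range. On a core with 64-byte lines
   each line is simply cleaned twice: redundant work, but nothing skipped. */
static void arm_clean_dcache_range(const void *buf, size_t len)
{
    struct cache_line_span s =
        compute_line_span((uintptr_t)buf, len, ARM_CACHE_MAINT_LINE_BYTES);
    uintptr_t a = s.first;

    for (size_t i = 0; i < s.count; ++i, a += ARM_CACHE_MAINT_LINE_BYTES)
        __asm__ volatile ("mcr p15, 0, %0, c7, c10, 1" : : "r" (a) : "memory");
    __asm__ volatile ("dsb" : : : "memory"); /* wait for the cleans */
}
#endif
```

A 64-byte buffer at 0x1000 takes two 32-byte steps, where a core with
64-byte lines would need only one operation. A misaligned 64-byte buffer
at 0x1010 takes three steps because it touches three 32-byte blocks.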

On the other hand, if you allocate buffers or otherwise reserve
space for data that will be shared with a device, then you have
to use the maximum expected/possible cache line size both for
the alignment and for rounding up the reserved size. This
ensures that data under special cache control never share a
cache line with data accessed without such management.

Violating this rule is usually not a problem for the direction
from CPU to device. Suppose the cache line is flushed, and then,
during the device access, the line is cached again and modified
only in the area not covered by the device access (not owned by
the device). The cache line is then kept, or even dirtied and
later written back by the CPU, but the data passed to the device
are not modified.

But if the CPU expects the device to modify memory behind the
cache, it is a fatal problem. Before the transfer, the cache line
is flushed, or only invalidated if the whole line is expected to
be rewritten. Invalidating an area that belongs to unrelated
content can already be a problem, because the CPU's pending
writes to that content are discarded. But even if a flush is
used, subsequent instructions can cause a line fill because they
access unrelated data that shares the line with the transfer area.
If such a line fill happens before the device has finished
modifying the data, the data stored to main memory are lost from
the CPU's view, because it sees stale cache content. Invalidation
does not help either: you lose the changes in the part of the
line belonging to the other content. So there is serious data
loss and no way out. This means that any part of memory whose
access is delegated to a device has to span complete cache lines.
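The ownership rule can be written down as a check that a driver might
assert before handing a receive buffer to a device. This is a
hypothetical helper, not an RTEMS function. The line argument should be
the maximum line size, which is 64 bytes for the Cortex-A parts mentioned
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: largest data cache line among supported cores. */
#define ARM_CACHE_MAX_LINE_BYTES 64u

/* True if [addr, addr + len) consists only of complete cache lines of
   `line` bytes, i.e. no line holding part of the buffer also holds
   unrelated data. A device-to-CPU buffer must satisfy this. */
static bool dma_region_owns_whole_lines(uintptr_t addr, size_t len,
                                        uintptr_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0); /* power of two */
    return len != 0
        && (addr & (line - 1)) == 0   /* starts on a line boundary */
        && (len & (line - 1)) == 0;   /* ends on a line boundary */
}
```

A 64-byte buffer at 0x1020 passes the check for 32-byte lines but fails
it for 64-byte lines. That is exactly the kind of buffer that works on
classic ARM cores and corrupts data on Cortex-A.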

The sane values I have found so far for all ARM variants are to
perform flush/invalidate operations assuming 32-byte lines, while
aligned allocations, memory reservations, variable attributes etc.
use 64-byte alignment to be safe on Cortex-A.
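Here is a sketch of how the two hardcoded parameters could look, with a
statically reserved buffer and a helper for dynamically allocated ones.
The macro names, dma_buffer_alloc and the 1500-byte receive buffer are
hypothetical. The code relies on C11 aligned_alloc and _Alignas rather
than on any RTEMS-specific API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names for the two hardcoded ARM cache parameters. */
#define ARM_CACHE_MAINT_LINE_BYTES 32u /* stride for clean/invalidate */
#define ARM_CACHE_ALIGN_BYTES      64u /* alignment and size roundup */

/* Every aligned block must consist of whole maintenance steps. */
_Static_assert(ARM_CACHE_ALIGN_BYTES % ARM_CACHE_MAINT_LINE_BYTES == 0,
               "alignment must be a multiple of the maintenance stride");

/* Round n up to a whole number of 64-byte blocks. */
#define CACHE_ALIGN_ROUNDUP(n) \
    (((size_t)(n) + ARM_CACHE_ALIGN_BYTES - 1u) & \
     ~(size_t)(ARM_CACHE_ALIGN_BYTES - 1u))

/* Statically reserved receive buffer (1500 bytes is an arbitrary example).
   It starts on a 64-byte boundary and occupies whole 64-byte blocks. */
static _Alignas(ARM_CACHE_ALIGN_BYTES) uint8_t
    rx_buffer[CACHE_ALIGN_ROUNDUP(1500u)];

/* Heap buffer with the same guarantees; release it with free(). */
static void *dma_buffer_alloc(size_t n)
{
    void *p;

    if (n == 0 || n > SIZE_MAX - (ARM_CACHE_ALIGN_BYTES - 1u))
        return NULL;
    /* C11 aligned_alloc wants the size to be a multiple of the
       alignment, which the roundup guarantees. */
    p = aligned_alloc(ARM_CACHE_ALIGN_BYTES, CACHE_ALIGN_ROUNDUP(n));
    assert(p == NULL || ((uintptr_t)p % ARM_CACHE_ALIGN_BYTES) == 0);
    return p;
}
```

Rounding the size up matters as much as the alignment. Without it, the
last line of the buffer could still be shared with whatever the
allocator or linker places next.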

So my proposal for RTEMS ARM support for now is to hardcode
cache parameters that (hopefully) cover all ARM family members.

best wishes,

              Pavel