[PATCH] rtems: ensure that rtems_cache_aligned_malloc do not align less than to CPU_CACHE_LINE_BYTES.

Pavel Pisa pisa at cmp.felk.cvut.cz
Wed Jun 29 11:14:19 UTC 2016


Hello Sebastian,

On Wednesday 29 of June 2016 07:28:46 Sebastian Huber wrote:
> Hallo Pavel,
>
> On 25/06/16 17:06, Pavel Pisa wrote:
> > There are architectures (for example some/many ARM Cortex-A) which have
> > different cache line sizes for data and instruction caches.
> > CPU kit and even BSP can be build for group of CPUs which differs
> > in cache line sizes as well and there are situations when maximum
> > alignment is not reported by rtems_cache_get_data_line_size.
> >
> > Ensure, that allocated memory is aligned at least to CPU_CACHE_LINE_BYTES
> > which is in pair with compile time data structures allocation.
> >
> > Signed-off-by: Pavel Pisa <pisa at cmp.felk.cvut.cz>
> > ---
> >   cpukit/libcsupport/src/cachealignedalloc.c | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/cpukit/libcsupport/src/cachealignedalloc.c
> > b/cpukit/libcsupport/src/cachealignedalloc.c index a704859..bb93937
> > 100644
> > --- a/cpukit/libcsupport/src/cachealignedalloc.c
> > +++ b/cpukit/libcsupport/src/cachealignedalloc.c
> > @@ -18,6 +18,9 @@ void *rtems_cache_aligned_malloc( size_t nbytes )
> >     size_t line_size = rtems_cache_get_data_line_size();
> >
> >     if ( line_size > 0 ) {
> > +    if ( line_size < CPU_CACHE_LINE_BYTES )
> > +      line_size = CPU_CACHE_LINE_BYTES;
> > +
> >       /* Assume that the cache line size is a power of two */
> >       size_t m = line_size - 1;
>
> so, in case rtems_cache_get_data_line_size() returns a non-zero value,
> then you assume that it lies to the user and fix it here in this place
> only?
>
> The purpose of CPU_CACHE_LINE_BYTES is to provide a worst-case cache
> line size at cpukit compile-time.

I am sure that logic in the ARM case is fundamentally broken/missing
for the most BSPs/architecture variants now and it needs to be corrected.
The question is where/which way.

The common ARM
  rtems/c/src/lib/libcpu/arm/shared/include/cache_.h
can be extended as shown in my patch.

Other option is (may it be better) to define ARM common cache_.h as empty
with some level of warnings for use on Cortex-A leaving ARM920T
as prone to use by mistake.
Then
  rtems/c/src/lib/libbsp/arm/shared/arm-l2c-310/cache_.h
should be used where appropriate.
For BSPs which do not have L2 or L2 is not enabled,
somethink like
  rtems/c/src/lib/libbsp/arm/shared/arm-cp15-l1-only/cache_.h
could be provided.
This would be more readable and hacky (and internal exposing
to user build) define can be omitted for ARM920T cores
  CPU_CFLAGS += -DCPU_HAS_ARM_CP15_CACHE_OPERATIONS

As for cache line length, it is fixed by define in

rtems/c/src/lib/libcpu/shared/src/cache_manager.c

size_t
rtems_cache_get_data_line_size( void )
{
#if defined(CPU_DATA_CACHE_ALIGNMENT)
  return CPU_DATA_CACHE_ALIGNMENT;
#else
  return 0;
#endif
}

The correct value an be provided by define in
  libbsp/arm/shared/XXX/cache_.h

but there i no alternative provided to define it as a call
to support cache_.h. The right solution for ARMv7+ is to use

CCSIDR, Cache Size ID Registers, VMSA
MRC p15, 1, <Rt>, c0, c0, 0         ; Read current CCSIDR into Rt

LineSize, bits[2:0]
(Log2(Number of words in cache line))–2. For example:
For a line length of 4 words: Log2(4) = 2, LineSize entry = 0.
This is the minimum line length.
For a line length of 8 words: Log2(8) = 3, LineSize entry = 1.

CTR, Cache Type Register, VMSA
MRC p15, 0, <Rt>, c0, c0, 1 ; Read CTR into Rt

DminLine, bits[19:16]
Log2 of the number of words in the smallest cache line of all the data caches and unified caches that are controlled by the processor.

IminLine, bits[3:0] Log2 of the number of words in the smallest cache line of all the instruction caches that are controlled by the processor.

Problem is that format has changed in ARMv7.
But for older (for example ARM920T) it is documented as

Dsize len 13:12 0b10 = 8 words per line (32 bytes)
Isize len 1:0 0b10 = 8 words per line (32 bytes)

Seems shifted by 1.

Anyway mechanism to use algorithmic computation in single multilib variant
is missing and sizes differs even between Cortex-A family.

And it is minimal line length which is required for cache range
clean operations (there cannot be skipped every second line etc.).

Then there is requirement for allocation, which has to be aligned
to maximum if we consider dynamic code loader then even maximum
from all levels of data and instruction caches. Else bad thing happens
when part of line is shared between CPU and area given under
(temporal) control of device.

This can be independent of above values and for Cortex-A can be probably
obtained from

CTR, Cache Type Register, PMSA
MRC p15, 0, <Rt>, c0, c0, 1 ; Read CTR into Rt
CWG, bits[27:24] the Cache Write-back Granule can be determined from maximum cache line size encoded in the Cache Size ID Registers. But can be 
unimplemented zero, then max from other sources has to be computed.

Anyway, actual cache_manager.c implementation and API does not provide
directive to ask for maximal cache alignment.
Even actual Cortex-A L2 cache support is broken because
fixed value of 32 bytes is provided

rtems/c/src/lib/libbsp/arm/shared/include/arm-cache-l1.h
#define ARM_CACHE_L1_CPU_DATA_ALIGNMENT 32

rtems/c/src/lib/libbsp/arm/shared/arm-l2c-310/cache_.h
#define CPU_DATA_CACHE_ALIGNMENT ARM_CACHE_L1_CPU_DATA_ALIGNMENT

And many/most of Cortex-A have 64 bytes L1 date cache line length.
So the RTEMS support is incorrect.

The other thing is CPU kit alignment for static/global
data which has to be defined without use of BSP specific
part. But I hope that this part of the patch is not controversial.
So if you agree, I push following change and then we try
to invent best way how to move forward the cache_manager
and alloc.

diff --git a/cpukit/score/cpu/arm/rtems/score/arm.h b/cpukit/score/cpu/arm/rtems/score/arm.h
index 334e73a..9ae7830 100644
--- a/cpukit/score/cpu/arm/rtems/score/arm.h
+++ b/cpukit/score/cpu/arm/rtems/score/arm.h
@@ -54,6 +54,10 @@ extern "C" {
   #define ARM_MULTILIB_HAS_THREAD_ID_REGISTER
 #endif

+#if defined(__ARM_ARCH_7A__)
+  #define ARM_MULTILIB_CACHE_LINE_MAX_64B
+#endif
+
 #if !defined(__SOFTFP__)
   #if defined(__ARM_NEON__)
     #define ARM_MULTILIB_VFP_D32
diff --git a/cpukit/score/cpu/arm/rtems/score/cpu.h b/cpukit/score/cpu/arm/rtems/score/cpu.h
index 815cd95..c0c782d 100644
--- a/cpukit/score/cpu/arm/rtems/score/cpu.h
+++ b/cpukit/score/cpu/arm/rtems/score/cpu.h
@@ -144,8 +144,11 @@

 #define CPU_STACK_GROWS_UP FALSE

-/* FIXME: Is this the right value? */
-#define CPU_CACHE_LINE_BYTES 32
+#if defined(ARM_MULTILIB_CACHE_LINE_MAX_64B)
+  #define CPU_CACHE_LINE_BYTES 32
+#else
+  #define CPU_CACHE_LINE_BYTES 64
+#endif

 #define CPU_STRUCTURE_ALIGNMENT RTEMS_ALIGNED( CPU_CACHE_LINE_BYTES )


Best wishes,

               Pavel



More information about the devel mailing list