Memory Barrier (was RE: rtems_semaphore_obtain problems identified)

Fri Sep 7 23:22:44 UTC 2007

On Thursday 06 September 2007 11:43, Feng, Kate wrote:
> Joel Sherrill wrote :
> > The memory barrier patch was PR 866 and was merged in March 2006.  It is
> > in all 4.7 versions.  It first appeared in 4.6.6.
>
> It looks like it, but RTEMS4.7.x still needs patches.
> This is not even fixed in 4.77.99.2.
> The memory barrier definitely should be fixed in RTEMS4.7.x
> before jumping to RTEMS4.8.
>
> Suggestions follow, except I hope I do not miss anything
> since I came up with this a while ago.
>
> 1) In cpukit/score/include/rtems/system.h:
>
> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile(::: "memory")
>
> seems to be wrong and misplaced.
>
> The memory barrier is processor dependent.
> For example, the memory barrier for PowerPC is "sync".
>
> Thus, for PPC,  it would seem more functional to place
> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile("sync"::: "memory")
>
> in cpukit/score/cpu/powerpc/system.h
> or somewhere in the processor branch.

Hello Kate and others,

I would like to react there, because I think, that proposed
addition of "sync" is move into really bad direction.

RTEMS_COMPILER_MEMORY_BARRIER is and should remain what it
is, I believe. It is barrier against compiler optimizer
caused reordering of instruction over the barrier.
This does not try to declare/cause any globally visible
ordering guarantee, by name and anything else.

Each architecture 'X' conforming CPU has to guarantee,
that even after complex CPU level instruction reordering,
register renaming and transfers delaying an sequence
of instruction would result in same state (all viewed
from CPU POV) as if instructions has been processed
in sequential order one by one.
This does not mean anything about external memory transfers
order at all (at least for PPC, there are some special rules
for x86 caches for compatibility with old programs).

The macro ensures only ordering of memory transfers
from actual CPU POV/perspective. But this is enough
even for POV of normal mode and consecutively invoked
exception handler working with same data.
Even if exception handler starts and CPU does not finish
transfers caused by previously initiated operations, reads
from exception on same!!! CPU would read back data from
write buffer if the address corresponds to previously written
data. So preemption or CPU IRQ flags manipulation in scope
of the actual CPU does not need enforcing ordering of real
memory by very expensive "sync" instruction. It only needs to
be sure, that CPU accounts/is aware of the value write transfer
before at correct point in the instruction sequence.

On the other hand, there could be other reasons and situations
requesting correct ordering of externally visible transfers.
For example, if IRQ controller is mapped as peripheral into
external memory/IO space and CPU IRQ is disabled, than some
mask is changed in the controller to disable one of external
sources and it is expected, that after IRQ enabling on CPU
level there cannot arrive event from that source, ordering
of reads and writes to the controller has to be synchronized
with CPU ("eieio" has to be used in the PPC case). But it
is not task for CPU level IRQ state manipulation. The ordering
should and in RTEMS case is ensured by IO access routines
which include "eieio" instruction. On the other hand, if
some external device is accessed through overlay structures
(even volatile), then ordering could be broken without
explicitly inserted "eieio".
Other legitimate requirement for strict ordering/barrier for
external accesses are the cases, where external device/DMA/coprocessor
accesses/shares data in system/main memory with CPU.
 The "sync"/cache range invalidation/flushing is required before
and after external memory accesses (the exact details depend on
transfers directions and other parameters).

> 3) Among PPC shared/irq/irq.c and other PPCs,
> _ISR_Disable( _level ), and _ISR_Disable( _level )
> should be used instead of _CPU_ISR_Disable(level) and
> _CPU_ISR_Enable( _level )

But I fully agree with you, that sequences like following
one are fatally broken

     _CPU_ISR_Disable(level);
     *irq = rtems_hdl_tbl[irq->name];
     _CPU_ISR_Enable(level);

There is no guarantee, that operation which has been expected
to be protected would not be moved outside of protection sequence.
Explicit or implicit RTEMS_COMPILER_MEMORY_BARRIER is missing there.

> Actually, I think a better one should be rtems_interrupt_disable(level)
> and rtems_interrupt_enable(level).

The code should be changed according to one of your proposals.

> 2) In order for the inline to work, the
> CPU_INLINE_ENABLE_DISPATCH should be defined to be TRUE.
>
> Thus,
> in cpukit/score/cpu/powerpc/rtems/score/cpu.h:
>
> -#define CPU_INLINE_ENABLE_DISPATCH       FALSE
> +#define CPU_INLINE_ENABLE_DISPATCH       TRUE

As for the non-inlined version of _Thread_Enable_dispatch,
there should not be problem. Calling function without static
or inline attributes is considered as full compiler memory ordering
barrier point. So no explicit compiler barrier should be needed
there.

All this is based upon my understanding of code and computer
systems principles. There is no doubt, that there could be
many other problems and errors. But if there are problems
with IRQs behavior on PPC, then the checking, that sequences
like above one do not exist. The _ISR_Disable()/_ISR_Disable()
or higher level rtems_ variants should be used in noticed source
file. Else bad things could happen.

Excuse me for long answer, but I wanted to clarify things
as good way as I could.

Best wishes

            Pavel