Memory Barrier (was RE: rtems_semaphore_obtain problems identified)

Till Straumann strauman at slac.stanford.edu
Sat Sep 8 03:54:08 UTC 2007


I agree with Pavel. RTEMS_COMPILER_MEMORY_BARRIER
only has to enforce ordering by the compiler but doesn't
have to make guarantees about ordering on an external bus.
The I/O macros (which IMO should be made available on all
architectures) are responsible for this.

I also agree with his findings that using
_CPU_ISR_Disable() is bad because it does *not* provide
the necessary barrier (which was added to _ISR_Disable()
which references _CPU_ISR_Disable()).

IMHO, direct use of _CPU_ISR_Disable/Enable should be
eliminated from all code at least in 4.8.

-- Till

Pavel Pisa wrote:
> On Thursday 06 September 2007 11:43, Feng, Kate wrote:
>   
>> Joel Sherrill wrote :
>>     
>>> The memory barrier patch was PR 866 and was merged in March 2006.  It is
>>> in all 4.7 versions.  It first appeared in 4.6.6.
>>>       
>> It looks like it, but RTEMS4.7.x still needs patches.
>> This is not even fixed in 4.77.99.2.
>> The memory barrier definitely should be fixed in RTEMS4.7.x
>> before jumping to RTEMS4.8.
>>
>> Suggestions follow, except I hope I do not miss anything
>> since I came up with this a while ago.
>>
>> 1) In cpukit/score/include/rtems/system.h:
>>
>> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile(::: "memory")
>>
>> seems to be wrong and misplaced.
>>
>> The memory barrier is processor dependent.
>> For example, the memory barrier for PowerPC is "sync".
>>
>> Thus, for PPC,  it would seem more functional to place
>> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile("sync"::: "memory")
>>
>> in cpukit/score/cpu/powerpc/system.h
>> or somewhere in the processor branch.
>>     
>
> Hello Kate and others,
>
> I would like to react there, because I think, that proposed
> addition of "sync" is move into really bad direction.
>
> RTEMS_COMPILER_MEMORY_BARRIER is and should remain what it
> is, I believe. It is barrier against compiler optimizer
> caused reordering of instruction over the barrier.
> This does not try to declare/cause any globally visible
> ordering guarantee, by name and anything else.
>
> Each architecture 'X' conforming CPU has to guarantee,
> that even after complex CPU level instruction reordering,
> register renaming and transfers delaying an sequence
> of instruction would result in same state (all viewed
> from CPU POV) as if instructions has been processed
> in sequential order one by one.
> This does not mean anything about external memory transfers
> order at all (at least for PPC, there are some special rules
> for x86 caches for compatibility with old programs).
>
> The macro ensures only ordering of memory transfers
> from actual CPU POV/perspective. But this is enough
> even for POV of normal mode and consecutively invoked
> exception handler working with same data.
> Even if exception handler starts and CPU does not finish
> transfers caused by previously initiated operations, reads
> from exception on same!!! CPU would read back data from
> write buffer if the address corresponds to previously written
> data. So preemption or CPU IRQ flags manipulation in scope
> of the actual CPU does not need enforcing ordering of real
> memory by very expensive "sync" instruction. It only needs to
> be sure, that CPU accounts/is aware of the value write transfer
> before at correct point in the instruction sequence.
>
> On the other hand, there could be other reasons and situations
> requesting correct ordering of externally visible transfers.
> For example, if IRQ controller is mapped as peripheral into
> external memory/IO space and CPU IRQ is disabled, than some
> mask is changed in the controller to disable one of external
> sources and it is expected, that after IRQ enabling on CPU
> level there cannot arrive event from that source, ordering
> of reads and writes to the controller has to be synchronized
> with CPU ("eieio" has to be used in the PPC case). But it
> is not task for CPU level IRQ state manipulation. The ordering
> should and in RTEMS case is ensured by IO access routines
> which include "eieio" instruction. On the other hand, if
> some external device is accessed through overlay structures
> (even volatile), then ordering could be broken without
> explicitly inserted "eieio".
> Other legitimate requirement for strict ordering/barrier for
> external accesses are the cases, where external device/DMA/coprocessor
> accesses/shares data in system/main memory with CPU.
>  The "sync"/cache range invalidation/flushing is required before
> and after external memory accesses (the exact details depend on
> transfers directions and other parameters).
>
>   
>> 3) Among PPC shared/irq/irq.c and other PPCs,
>> _ISR_Disable( _level ), and _ISR_Disable( _level )
>> should be used instead of _CPU_ISR_Disable(level) and
>> _CPU_ISR_Enable( _level )
>>     
>
>
> But I fully agree with you, that sequences like following
> one are fatally broken
>
>      _CPU_ISR_Disable(level);
>      *irq = rtems_hdl_tbl[irq->name];
>      _CPU_ISR_Enable(level);
>
> There is no guarantee, that operation which has been expected
> to be protected would not be moved outside of protection sequence.
> Explicit or implicit RTEMS_COMPILER_MEMORY_BARRIER is missing there.
>
>   
>> Actually, I think a better one should be rtems_interrupt_disable(level)
>> and rtems_interrupt_enable(level).
>>     
>
> The code should be changed according to one of your proposals.
>
>   
>> 2) In order for the inline to work, the
>> CPU_INLINE_ENABLE_DISPATCH should be defined to be TRUE.
>>
>> Thus,
>> in cpukit/score/cpu/powerpc/rtems/score/cpu.h:
>>
>> -#define CPU_INLINE_ENABLE_DISPATCH       FALSE
>> +#define CPU_INLINE_ENABLE_DISPATCH       TRUE
>>     
>
> As for the non-inlined version of _Thread_Enable_dispatch,
> there should not be problem. Calling function without static
> or inline attributes is considered as full compiler memory ordering
> barrier point. So no explicit compiler barrier should be needed
> there.
>
> All this is based upon my understanding of code and computer
> systems principles. There is no doubt, that there could be
> many other problems and errors. But if there are problems
> with IRQs behavior on PPC, then the checking, that sequences
> like above one do not exist. The _ISR_Disable()/_ISR_Disable()
> or higher level rtems_ variants should be used in noticed source
> file. Else bad things could happen.
>
> Excuse me for long answer, but I wanted to clarify things
> as good way as I could.
>
> Best wishes
>
>             Pavel
> _______________________________________________
> rtems-users mailing list
> rtems-users at rtems.com
> http://rtems.rtems.org/mailman/listinfo/rtems-users
>   





More information about the users mailing list