Memory Barrier

Joel Sherrill joel.sherrill at oarcorp.com
Mon Sep 10 14:53:06 UTC 2007


Till Straumann wrote:
> I filed PR1257 discouraging the use of
> _CPU_ISR_Disable()/_CPU_ISR_Enable()
>
> These macros should never be used directly
> (they are defined by every CPU architecture
> to be expanded by the generic macros
> _ISR_Disable()/_ISR_Enable() which add a compiler
> memory barrier).
>
>   
And are made publicly available
as rtems_interrupt_disable and
rtems_interrupt_enable.  That is the
name that every BSP and device driver
should use.  There should be virtually
no code under rtems-XXX/c which references
_CPU_ISR or _ISR_.

I have made a big edit sweep and will
start to commit once I do a full build and
review the diff.

--joel
> -T
>
> Pavel Pisa wrote:
>   
>> On Thursday 06 September 2007 11:43, Feng, Kate wrote:
>>   
>>     
>>> Joel Sherrill wrote :
>>>     
>>>       
>>>> The memory barrier patch was PR 866 and was merged in March 2006.  It is
>>>> in all 4.7 versions.  It first appeared in 4.6.6.
>>>>       
>>>>         
>>> It looks like it, but RTEMS4.7.x still needs patches.
>>> This is not even fixed in 4.77.99.2.
>>> The memory barrier definitely should be fixed in RTEMS4.7.x
>>> before jumping to RTEMS4.8.
>>>
>>> Suggestions follow, except I hope I do not miss anything
>>> since I came up with this a while ago.
>>>
>>> 1) In cpukit/score/include/rtems/system.h:
>>>
>>> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile(::: "memory")
>>>
>>> seems to be wrong and misplaced.
>>>
>>> The memory barrier is processor dependent.
>>> For example, the memory barrier for PowerPC is "sync".
>>>
>>> Thus, for PPC,  it would seem more functional to place
>>> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile("sync"::: "memory")
>>>
>>> in cpukit/score/cpu/powerpc/system.h
>>> or somewhere in the processor branch.
>>>     
>>>       
>> Hello Kate and others,
>>
>> I would like to react there, because I think, that proposed
>> addition of "sync" is move into really bad direction.
>>
>> RTEMS_COMPILER_MEMORY_BARRIER is and should remain what it
>> is, I believe. It is barrier against compiler optimizer
>> caused reordering of instruction over the barrier.
>> This does not try to declare/cause any globally visible
>> ordering guarantee, by name and anything else.
>>
>> Each architecture 'X' conforming CPU has to guarantee,
>> that even after complex CPU level instruction reordering,
>> register renaming and transfers delaying an sequence
>> of instruction would result in same state (all viewed
>> from CPU POV) as if instructions has been processed
>> in sequential order one by one.
>> This does not mean anything about external memory transfers
>> order at all (at least for PPC, there are some special rules
>> for x86 caches for compatibility with old programs).
>>
>> The macro ensures only ordering of memory transfers
>> from actual CPU POV/perspective. But this is enough
>> even for POV of normal mode and consecutively invoked
>> exception handler working with same data.
>> Even if exception handler starts and CPU does not finish
>> transfers caused by previously initiated operations, reads
>> from exception on same!!! CPU would read back data from
>> write buffer if the address corresponds to previously written
>> data. So preemption or CPU IRQ flags manipulation in scope
>> of the actual CPU does not need enforcing ordering of real
>> memory by very expensive "sync" instruction. It only needs to
>> be sure, that CPU accounts/is aware of the value write transfer
>> before at correct point in the instruction sequence.
>>
>> On the other hand, there could be other reasons and situations
>> requesting correct ordering of externally visible transfers.
>> For example, if IRQ controller is mapped as peripheral into
>> external memory/IO space and CPU IRQ is disabled, than some
>> mask is changed in the controller to disable one of external
>> sources and it is expected, that after IRQ enabling on CPU
>> level there cannot arrive event from that source, ordering
>> of reads and writes to the controller has to be synchronized
>> with CPU ("eieio" has to be used in the PPC case). But it
>> is not task for CPU level IRQ state manipulation. The ordering
>> should and in RTEMS case is ensured by IO access routines
>> which include "eieio" instruction. On the other hand, if
>> some external device is accessed through overlay structures
>> (even volatile), then ordering could be broken without
>> explicitly inserted "eieio".
>> Other legitimate requirement for strict ordering/barrier for
>> external accesses are the cases, where external device/DMA/coprocessor
>> accesses/shares data in system/main memory with CPU.
>>  The "sync"/cache range invalidation/flushing is required before
>> and after external memory accesses (the exact details depend on
>> transfers directions and other parameters).
>>
>>   
>>     
>>> 3) Among PPC shared/irq/irq.c and other PPCs,
>>> _ISR_Disable( _level ), and _ISR_Disable( _level )
>>> should be used instead of _CPU_ISR_Disable(level) and
>>> _CPU_ISR_Enable( _level )
>>>     
>>>       
>> But I fully agree with you, that sequences like following
>> one are fatally broken
>>
>>      _CPU_ISR_Disable(level);
>>      *irq = rtems_hdl_tbl[irq->name];
>>      _CPU_ISR_Enable(level);
>>
>> There is no guarantee, that operation which has been expected
>> to be protected would not be moved outside of protection sequence.
>> Explicit or implicit RTEMS_COMPILER_MEMORY_BARRIER is missing there.
>>
>>   
>>     
>>> Actually, I think a better one should be rtems_interrupt_disable(level)
>>> and rtems_interrupt_enable(level).
>>>     
>>>       
>> The code should be changed according to one of your proposals.
>>
>>   
>>     
>>> 2) In order for the inline to work, the
>>> CPU_INLINE_ENABLE_DISPATCH should be defined to be TRUE.
>>>
>>> Thus,
>>> in cpukit/score/cpu/powerpc/rtems/score/cpu.h:
>>>
>>> -#define CPU_INLINE_ENABLE_DISPATCH       FALSE
>>> +#define CPU_INLINE_ENABLE_DISPATCH       TRUE
>>>     
>>>       
>> As for the non-inlined version of _Thread_Enable_dispatch,
>> there should not be problem. Calling function without static
>> or inline attributes is considered as full compiler memory ordering
>> barrier point. So no explicit compiler barrier should be needed
>> there.
>>
>> All this is based upon my understanding of code and computer
>> systems principles. There is no doubt, that there could be
>> many other problems and errors. But if there are problems
>> with IRQs behavior on PPC, then the checking, that sequences
>> like above one do not exist. The _ISR_Disable()/_ISR_Disable()
>> or higher level rtems_ variants should be used in noticed source
>> file. Else bad things could happen.
>>
>> Excuse me for long answer, but I wanted to clarify things
>> as good way as I could.
>>
>> Best wishes
>>
>>             Pavel
>> _______________________________________________
>> rtems-users mailing list
>> rtems-users at rtems.com
>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>   
>>     
>
>
> _______________________________________________
> rtems-users mailing list
> rtems-users at rtems.com
> http://rtems.rtems.org/mailman/listinfo/rtems-users
>   




More information about the users mailing list