Memory Barrier

Thomas Doerfler Thomas.Doerfler at
Tue Sep 11 08:11:57 UTC 2007


Feng, Kate schrieb:
> How about _CPU_MSR_SET() used in various PPC/irq.c ? 
> It's inline, but not protected by memory
> barrier, either.
>       _CPU_MSR_GET(msr);
>       new_msr = msr | MSR_EE;
>       _CPU_MSR_SET(new_msr);
>        ...........
>       ............
>       _CPU_MSR_SET(msr);

Hm, I think this is also not a problem.

The compiler is not allowed to move the assignments to "msr" and
"new_msr", because they are used in the next statements.

The PowerPC core might move e.g. the write access to "new_msr" behind
the modification of the Machine State Register. But even an interrupt
could not delay the posted write operation, and even the interrupt
function could not read an "old" value of "new_msr" when a write access
to that variable is already pending in the Load/Store Unit.

I appreciate this discussion, because it will avoid strange errors in
some RTEMS systems, but I think it would help that these topics where
discussed including a defined scenario (like "if the compiler/core uses
this legal optimization and an access to this variable is delayed, and
an interrupt occurs doing that, we get an inconsistency).


> Regards,
> Kate
> -----Original Message-----
> From: at on behalf of Joel Sherrill
> Sent: Mon 9/10/2007 3:34 PM
> To: Till Straumann
> Cc: rtems-users at
> Subject: Re: Memory Barrier
> Till Straumann wrote:
>> The bad news are that a number of CPUs ( i386, m68k, sparc, sh, ... )
>> define
>> #define _CPU_ISR_XXable(k)  <cpu>_XXable_interrupt(K)
>> and the latter is also used in many places (in- and outside of cpukit).
> I am nearly complete on the test build for the search and replace
> on _CPU_ISRXXable under c/.   There is no reason the BSPs and
> other code shouldn't have used the public API.  So this is an excuse
> at that cleanup.
> I will hunt for the others since unless there is some obvious reason
> to use them, they are just barriers to driver portability.
>> I start to wonder if it wouldn't be easier to introduce the barrier
>> where the relevant lowest-level macros for each CPU are defined.
> It might be prudent but it is also effective to just forbid anyone
> from using the _CPU_ISR form.
>> BTW: IMO, barriers are also needed around _CPU_ISR_Set_level()
>> which could have the effect of enabling or disabling interrupts.
> This is probably prudent although I don't think any use of it
> inside RTEMS is particularly critical.  As far as I recall, it is
> only used during task initialization and mode changes.
> --joel
>> -- T.
>> Joel Sherrill wrote:
>>> Till Straumann wrote:
>>>> I filed PR1257 discouraging the use of
>>>> _CPU_ISR_Disable()/_CPU_ISR_Enable()
>>>> These macros should never be used directly
>>>> (they are defined by every CPU architecture
>>>> to be expanded by the generic macros
>>>> _ISR_Disable()/_ISR_Enable() which add a compiler
>>>> memory barrier).
>>> And are made publicly available
>>> as rtems_interrupt_disable and
>>> rtems_interrupt_enable.  That is the
>>> name that every BSP and device driver
>>> should use.  There should be virtually
>>> no code under rtems-XXX/c which references
>>> _CPU_ISR or _ISR_.
>>> I have made a big edit sweep and will
>>> start to commit once I do a full build and
>>> review the diff.
>>> --joel
>>>> -T
>>>> Pavel Pisa wrote:
>>>>> On Thursday 06 September 2007 11:43, Feng, Kate wrote:
>>>>>> Joel Sherrill wrote :
>>>>>>> The memory barrier patch was PR 866 and was merged in March 
>>>>>>> 2006.  It is
>>>>>>> in all 4.7 versions.  It first appeared in 4.6.6.
>>>>>> It looks like it, but RTEMS4.7.x still needs patches.
>>>>>> This is not even fixed in
>>>>>> The memory barrier definitely should be fixed in RTEMS4.7.x
>>>>>> before jumping to RTEMS4.8.
>>>>>> Suggestions follow, except I hope I do not miss anything
>>>>>> since I came up with this a while ago.
>>>>>> 1) In cpukit/score/include/rtems/system.h:
>>>>>> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile(::: "memory")
>>>>>> seems to be wrong and misplaced.
>>>>>> The memory barrier is processor dependent.
>>>>>> For example, the memory barrier for PowerPC is "sync".
>>>>>> Thus, for PPC,  it would seem more functional to place
>>>>>> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile("sync"::: 
>>>>>> "memory")
>>>>>> in cpukit/score/cpu/powerpc/system.h
>>>>>> or somewhere in the processor branch.
>>>>> Hello Kate and others,
>>>>> I would like to react there, because I think, that proposed
>>>>> addition of "sync" is move into really bad direction.
>>>>> RTEMS_COMPILER_MEMORY_BARRIER is and should remain what it
>>>>> is, I believe. It is barrier against compiler optimizer
>>>>> caused reordering of instruction over the barrier.
>>>>> This does not try to declare/cause any globally visible
>>>>> ordering guarantee, by name and anything else.
>>>>> Each architecture 'X' conforming CPU has to guarantee,
>>>>> that even after complex CPU level instruction reordering,
>>>>> register renaming and transfers delaying an sequence
>>>>> of instruction would result in same state (all viewed
>>>>> from CPU POV) as if instructions has been processed
>>>>> in sequential order one by one.
>>>>> This does not mean anything about external memory transfers
>>>>> order at all (at least for PPC, there are some special rules
>>>>> for x86 caches for compatibility with old programs).
>>>>> The macro ensures only ordering of memory transfers
>>>>> from actual CPU POV/perspective. But this is enough
>>>>> even for POV of normal mode and consecutively invoked
>>>>> exception handler working with same data.
>>>>> Even if exception handler starts and CPU does not finish
>>>>> transfers caused by previously initiated operations, reads
>>>>> from exception on same!!! CPU would read back data from
>>>>> write buffer if the address corresponds to previously written
>>>>> data. So preemption or CPU IRQ flags manipulation in scope
>>>>> of the actual CPU does not need enforcing ordering of real
>>>>> memory by very expensive "sync" instruction. It only needs to
>>>>> be sure, that CPU accounts/is aware of the value write transfer
>>>>> before at correct point in the instruction sequence.
>>>>> On the other hand, there could be other reasons and situations
>>>>> requesting correct ordering of externally visible transfers.
>>>>> For example, if IRQ controller is mapped as peripheral into
>>>>> external memory/IO space and CPU IRQ is disabled, than some
>>>>> mask is changed in the controller to disable one of external
>>>>> sources and it is expected, that after IRQ enabling on CPU
>>>>> level there cannot arrive event from that source, ordering
>>>>> of reads and writes to the controller has to be synchronized
>>>>> with CPU ("eieio" has to be used in the PPC case). But it
>>>>> is not task for CPU level IRQ state manipulation. The ordering
>>>>> should and in RTEMS case is ensured by IO access routines
>>>>> which include "eieio" instruction. On the other hand, if
>>>>> some external device is accessed through overlay structures
>>>>> (even volatile), then ordering could be broken without
>>>>> explicitly inserted "eieio".
>>>>> Other legitimate requirement for strict ordering/barrier for
>>>>> external accesses are the cases, where external device/DMA/coprocessor
>>>>> accesses/shares data in system/main memory with CPU.
>>>>>  The "sync"/cache range invalidation/flushing is required before
>>>>> and after external memory accesses (the exact details depend on
>>>>> transfers directions and other parameters).
>>>>>> 3) Among PPC shared/irq/irq.c and other PPCs,
>>>>>> _ISR_Disable( _level ), and _ISR_Disable( _level )
>>>>>> should be used instead of _CPU_ISR_Disable(level) and
>>>>>> _CPU_ISR_Enable( _level )
>>>>> But I fully agree with you, that sequences like following
>>>>> one are fatally broken
>>>>>      _CPU_ISR_Disable(level);
>>>>>      *irq = rtems_hdl_tbl[irq->name];
>>>>>      _CPU_ISR_Enable(level);
>>>>> There is no guarantee, that operation which has been expected
>>>>> to be protected would not be moved outside of protection sequence.
>>>>> Explicit or implicit RTEMS_COMPILER_MEMORY_BARRIER is missing there.
>>>>>> Actually, I think a better one should be 
>>>>>> rtems_interrupt_disable(level)
>>>>>> and rtems_interrupt_enable(level).
>>>>> The code should be changed according to one of your proposals.
>>>>>> 2) In order for the inline to work, the
>>>>>> CPU_INLINE_ENABLE_DISPATCH should be defined to be TRUE.
>>>>>> Thus,
>>>>>> in cpukit/score/cpu/powerpc/rtems/score/cpu.h:
>>>>> As for the non-inlined version of _Thread_Enable_dispatch,
>>>>> there should not be problem. Calling function without static
>>>>> or inline attributes is considered as full compiler memory ordering
>>>>> barrier point. So no explicit compiler barrier should be needed
>>>>> there.
>>>>> All this is based upon my understanding of code and computer
>>>>> systems principles. There is no doubt, that there could be
>>>>> many other problems and errors. But if there are problems
>>>>> with IRQs behavior on PPC, then the checking, that sequences
>>>>> like above one do not exist. The _ISR_Disable()/_ISR_Disable()
>>>>> or higher level rtems_ variants should be used in noticed source
>>>>> file. Else bad things could happen.
>>>>> Excuse me for long answer, but I wanted to clarify things
>>>>> as good way as I could.
>>>>> Best wishes
>>>>>             Pavel
>>>>> _______________________________________________
>>>>> rtems-users mailing list
>>>>> rtems-users at
>>>> _______________________________________________
>>>> rtems-users mailing list
>>>> rtems-users at
> _______________________________________________
> rtems-users mailing list
> rtems-users at
> _______________________________________________
> rtems-users mailing list
> rtems-users at

embedded brains GmbH
Thomas Doerfler           Obere Lagerstr. 30
D-82178 Puchheim          Germany
Tel. : +49-89-18 90 80 79-2
Fax  : +49-89-18 90 80 79-9
email: Thomas.Doerfler at
PGP public key available on request

More information about the users mailing list