Memory Barrier

Till Straumann strauman at slac.stanford.edu
Tue Sep 11 17:30:50 UTC 2007


Thomas Doerfler wrote:
> Kate,
>
> Feng, Kate schrieb:
>   
>> How about _CPU_MSR_SET() used in various PPC/irq.c ? 
>> It's inline, but not protected by memory
>> barrier, either.
>>
>>       _CPU_MSR_GET(msr);
>>       new_msr = msr | MSR_EE;
>>       _CPU_MSR_SET(new_msr);
>>        ...........
>>       ............
>>       _CPU_MSR_SET(msr);
>>     
>
> Hm, I think this is also not a problem.
>
> The compiler is not allowed to move the assignments to "msr" and
> "new_msr", because they are used in the next statements.
>   
The problem is that the compiler may move code from after
_CPU_MSR_SET(msr) upstream to before it is set thus
moving code into a section that is not protected from interrupts.
Or it may ignore the fact that global variables could change
between the first MSR_Set (enabling interrupts) and the
second one (disabling interrupts).

To prove this, we compile the example routine:

#ifdef BARRIER
#define CLOBBER :"memory"
#else
#define CLOBBER
#endif

#define _CPU_MSR_Get(x) asm volatile ("mfmsr %0":"=r"(x))
#define _CPU_MSR_Set(x) asm volatile ("mtmsr %0"::"r"(x) CLOBBER)

extern void blax(int);

int blah(int *p)
{
unsigned msr;
int a,b;
        a = *p; /* Store 'old' value */

        /* Flash interrupts */
        _CPU_MSR_Get(msr);
        _CPU_MSR_Set(msr | (1<<15));
        /* Here, interrupts could happen and the ISR modify *p */
        _CPU_MSR_Set(msr);
        /* Interrupts are disabled again */
        b = *p; /* Read possibly modified value */
        return a-b; /* Return difference */
}

If you compile this with powerpc-rtems-gcc -c -O
then the compiler optimizes a-b => 0, i.e., the routine
always returns zero even if *p was modified by an ISR
that could intervene.

00000000 <blah>:
   0:   7d 20 00 a6     mfmsr   r9
   4:   61 20 80 00     ori     r0,r9,32768
   8:   7c 00 01 24     mtmsr   r0
   c:   7d 20 01 24     mtmsr   r9
  10:   38 60 00 00     li      r3,0
  14:   4e 80 00 20     blr


Only if you add the barrier then the code does
what it is intended to do (powerpc-rtems-gcc -c -O -DBARRIER):


00000000 <blah>:
   0:   81 63 00 00     lwz     r11,0(r3)
   4:   7d 20 00 a6     mfmsr   r9
   8:   61 20 80 00     ori     r0,r9,32768
   c:   7c 00 01 24     mtmsr   r0
  10:   7d 20 01 24     mtmsr   r9
  14:   80 63 00 00     lwz     r3,0(r3)
  18:   7c 63 58 50     subf    r3,r3,r11
  1c:   4e 80 00 20     blr

(I'm aware that this example does not exactly hit the point
but I couldn't come up with a better one in the 10min I have...)

T.
> The PowerPC core might move e.g. the write access to "new_msr" behind
> the modification of the Machine State Register. But even an interrupt
> could not delay the posted write operation, and even the interrupt
> function could not read an "old" value of "new_msr" when a write access
> to that variable is already pending in the Load/Store Unit.
>
> I appreciate this discussion, because it will avoid strange errors in
> some RTEMS systems, but I think it would help that these topics where
> discussed including a defined scenario (like "if the compiler/core uses
> this legal optimization and an access to this variable is delayed, and
> an interrupt occurs doing that, we get an inconsistency).
>
> wkr,
> Thomas.
>
>   
>> Regards,
>> Kate
>>
>> -----Original Message-----
>> From: rtems-users-bounces+feng1=bnl.gov at rtems.org on behalf of Joel Sherrill
>> Sent: Mon 9/10/2007 3:34 PM
>> To: Till Straumann
>> Cc: rtems-users at rtems.org
>> Subject: Re: Memory Barrier
>>  
>> Till Straumann wrote:
>>     
>>> The bad news are that a number of CPUs ( i386, m68k, sparc, sh, ... )
>>> define
>>>
>>> #define _CPU_ISR_XXable(k)  <cpu>_XXable_interrupt(K)
>>>
>>> and the latter is also used in many places (in- and outside of cpukit).
>>>
>>>       
>> I am nearly complete on the test build for the search and replace
>> on _CPU_ISRXXable under c/.   There is no reason the BSPs and
>> other code shouldn't have used the public API.  So this is an excuse
>> at that cleanup.
>>
>> I will hunt for the others since unless there is some obvious reason
>> to use them, they are just barriers to driver portability.
>>     
>>> I start to wonder if it wouldn't be easier to introduce the barrier
>>> where the relevant lowest-level macros for each CPU are defined.
>>>
>>>       
>> It might be prudent but it is also effective to just forbid anyone
>> from using the _CPU_ISR form.
>>     
>>> BTW: IMO, barriers are also needed around _CPU_ISR_Set_level()
>>> which could have the effect of enabling or disabling interrupts.
>>>
>>>       
>> This is probably prudent although I don't think any use of it
>> inside RTEMS is particularly critical.  As far as I recall, it is
>> only used during task initialization and mode changes.
>>
>> --joel
>>     
>>> -- T.
>>>
>>> Joel Sherrill wrote:
>>>       
>>>> Till Straumann wrote:
>>>>         
>>>>> I filed PR1257 discouraging the use of
>>>>> _CPU_ISR_Disable()/_CPU_ISR_Enable()
>>>>>
>>>>> These macros should never be used directly
>>>>> (they are defined by every CPU architecture
>>>>> to be expanded by the generic macros
>>>>> _ISR_Disable()/_ISR_Enable() which add a compiler
>>>>> memory barrier).
>>>>>
>>>>>   
>>>>>           
>>>> And are made publicly available
>>>> as rtems_interrupt_disable and
>>>> rtems_interrupt_enable.  That is the
>>>> name that every BSP and device driver
>>>> should use.  There should be virtually
>>>> no code under rtems-XXX/c which references
>>>> _CPU_ISR or _ISR_.
>>>>
>>>> I have made a big edit sweep and will
>>>> start to commit once I do a full build and
>>>> review the diff.
>>>>
>>>> --joel
>>>>         
>>>>> -T
>>>>>
>>>>> Pavel Pisa wrote:
>>>>>  
>>>>>           
>>>>>> On Thursday 06 September 2007 11:43, Feng, Kate wrote:
>>>>>>     
>>>>>>             
>>>>>>> Joel Sherrill wrote :
>>>>>>>         
>>>>>>>               
>>>>>>>> The memory barrier patch was PR 866 and was merged in March 
>>>>>>>> 2006.  It is
>>>>>>>> in all 4.7 versions.  It first appeared in 4.6.6.
>>>>>>>>               
>>>>>>>>                 
>>>>>>> It looks like it, but RTEMS4.7.x still needs patches.
>>>>>>> This is not even fixed in 4.77.99.2.
>>>>>>> The memory barrier definitely should be fixed in RTEMS4.7.x
>>>>>>> before jumping to RTEMS4.8.
>>>>>>>
>>>>>>> Suggestions follow, except I hope I do not miss anything
>>>>>>> since I came up with this a while ago.
>>>>>>>
>>>>>>> 1) In cpukit/score/include/rtems/system.h:
>>>>>>>
>>>>>>> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile(::: "memory")
>>>>>>>
>>>>>>> seems to be wrong and misplaced.
>>>>>>>
>>>>>>> The memory barrier is processor dependent.
>>>>>>> For example, the memory barrier for PowerPC is "sync".
>>>>>>>
>>>>>>> Thus, for PPC,  it would seem more functional to place
>>>>>>> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile("sync"::: 
>>>>>>> "memory")
>>>>>>>
>>>>>>> in cpukit/score/cpu/powerpc/system.h
>>>>>>> or somewhere in the processor branch.
>>>>>>>           
>>>>>>>               
>>>>>> Hello Kate and others,
>>>>>>
>>>>>> I would like to react there, because I think, that proposed
>>>>>> addition of "sync" is move into really bad direction.
>>>>>>
>>>>>> RTEMS_COMPILER_MEMORY_BARRIER is and should remain what it
>>>>>> is, I believe. It is barrier against compiler optimizer
>>>>>> caused reordering of instruction over the barrier.
>>>>>> This does not try to declare/cause any globally visible
>>>>>> ordering guarantee, by name and anything else.
>>>>>>
>>>>>> Each architecture 'X' conforming CPU has to guarantee,
>>>>>> that even after complex CPU level instruction reordering,
>>>>>> register renaming and transfers delaying an sequence
>>>>>> of instruction would result in same state (all viewed
>>>>>> from CPU POV) as if instructions has been processed
>>>>>> in sequential order one by one.
>>>>>> This does not mean anything about external memory transfers
>>>>>> order at all (at least for PPC, there are some special rules
>>>>>> for x86 caches for compatibility with old programs).
>>>>>>
>>>>>> The macro ensures only ordering of memory transfers
>>>>>> from actual CPU POV/perspective. But this is enough
>>>>>> even for POV of normal mode and consecutively invoked
>>>>>> exception handler working with same data.
>>>>>> Even if exception handler starts and CPU does not finish
>>>>>> transfers caused by previously initiated operations, reads
>>>>>> from exception on same!!! CPU would read back data from
>>>>>> write buffer if the address corresponds to previously written
>>>>>> data. So preemption or CPU IRQ flags manipulation in scope
>>>>>> of the actual CPU does not need enforcing ordering of real
>>>>>> memory by very expensive "sync" instruction. It only needs to
>>>>>> be sure, that CPU accounts/is aware of the value write transfer
>>>>>> before at correct point in the instruction sequence.
>>>>>>
>>>>>> On the other hand, there could be other reasons and situations
>>>>>> requesting correct ordering of externally visible transfers.
>>>>>> For example, if IRQ controller is mapped as peripheral into
>>>>>> external memory/IO space and CPU IRQ is disabled, than some
>>>>>> mask is changed in the controller to disable one of external
>>>>>> sources and it is expected, that after IRQ enabling on CPU
>>>>>> level there cannot arrive event from that source, ordering
>>>>>> of reads and writes to the controller has to be synchronized
>>>>>> with CPU ("eieio" has to be used in the PPC case). But it
>>>>>> is not task for CPU level IRQ state manipulation. The ordering
>>>>>> should and in RTEMS case is ensured by IO access routines
>>>>>> which include "eieio" instruction. On the other hand, if
>>>>>> some external device is accessed through overlay structures
>>>>>> (even volatile), then ordering could be broken without
>>>>>> explicitly inserted "eieio".
>>>>>> Other legitimate requirement for strict ordering/barrier for
>>>>>> external accesses are the cases, where external device/DMA/coprocessor
>>>>>> accesses/shares data in system/main memory with CPU.
>>>>>>  The "sync"/cache range invalidation/flushing is required before
>>>>>> and after external memory accesses (the exact details depend on
>>>>>> transfers directions and other parameters).
>>>>>>
>>>>>>     
>>>>>>             
>>>>>>> 3) Among PPC shared/irq/irq.c and other PPCs,
>>>>>>> _ISR_Disable( _level ), and _ISR_Disable( _level )
>>>>>>> should be used instead of _CPU_ISR_Disable(level) and
>>>>>>> _CPU_ISR_Enable( _level )
>>>>>>>           
>>>>>>>               
>>>>>> But I fully agree with you, that sequences like following
>>>>>> one are fatally broken
>>>>>>
>>>>>>      _CPU_ISR_Disable(level);
>>>>>>      *irq = rtems_hdl_tbl[irq->name];
>>>>>>      _CPU_ISR_Enable(level);
>>>>>>
>>>>>> There is no guarantee, that operation which has been expected
>>>>>> to be protected would not be moved outside of protection sequence.
>>>>>> Explicit or implicit RTEMS_COMPILER_MEMORY_BARRIER is missing there.
>>>>>>
>>>>>>     
>>>>>>             
>>>>>>> Actually, I think a better one should be 
>>>>>>> rtems_interrupt_disable(level)
>>>>>>> and rtems_interrupt_enable(level).
>>>>>>>           
>>>>>>>               
>>>>>> The code should be changed according to one of your proposals.
>>>>>>
>>>>>>     
>>>>>>             
>>>>>>> 2) In order for the inline to work, the
>>>>>>> CPU_INLINE_ENABLE_DISPATCH should be defined to be TRUE.
>>>>>>>
>>>>>>> Thus,
>>>>>>> in cpukit/score/cpu/powerpc/rtems/score/cpu.h:
>>>>>>>
>>>>>>> -#define CPU_INLINE_ENABLE_DISPATCH       FALSE
>>>>>>> +#define CPU_INLINE_ENABLE_DISPATCH       TRUE
>>>>>>>           
>>>>>>>               
>>>>>> As for the non-inlined version of _Thread_Enable_dispatch,
>>>>>> there should not be problem. Calling function without static
>>>>>> or inline attributes is considered as full compiler memory ordering
>>>>>> barrier point. So no explicit compiler barrier should be needed
>>>>>> there.
>>>>>>
>>>>>> All this is based upon my understanding of code and computer
>>>>>> systems principles. There is no doubt, that there could be
>>>>>> many other problems and errors. But if there are problems
>>>>>> with IRQs behavior on PPC, then the checking, that sequences
>>>>>> like above one do not exist. The _ISR_Disable()/_ISR_Disable()
>>>>>> or higher level rtems_ variants should be used in noticed source
>>>>>> file. Else bad things could happen.
>>>>>>
>>>>>> Excuse me for long answer, but I wanted to clarify things
>>>>>> as good way as I could.
>>>>>>
>>>>>> Best wishes
>>>>>>
>>>>>>             Pavel
>>>>>> _______________________________________________
>>>>>> rtems-users mailing list
>>>>>> rtems-users at rtems.com
>>>>>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>>>>>       
>>>>>>             
>>>>> _______________________________________________
>>>>> rtems-users mailing list
>>>>> rtems-users at rtems.com
>>>>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>>>>   
>>>>>           
>> _______________________________________________
>> rtems-users mailing list
>> rtems-users at rtems.com
>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>
>>
>> _______________________________________________
>> rtems-users mailing list
>> rtems-users at rtems.com
>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>     
>
>
>   





More information about the users mailing list