Memory Barrier (was RE: rtems_semaphore_obtain problems identified)

Feng, Kate feng at bnl.gov
Mon Sep 10 17:15:11 UTC 2007


Kate Feng wrote:

> PS: Does anyone know how to find the number of OPcodes for all the
> PPC assembly code ?

The cycle times for PPC processors:

            Power4    Power5
sync        ~140      ~50
lwsync      ~110      ~25

Regards,
Kate

-----Original Message-----
From: rtems-users-bounces+feng1=bnl.gov at rtems.org on behalf of Kate Feng
Sent: Mon 9/10/2007 12:40 PM
To: Pavel Pisa
Cc: rtems-users at rtems.org
Subject: Re: Memory Barrier (was RE: rtems_semaphore_obtain problems identified)
 
Hello Pavel and everyone,

I agree that the compiler memory barrier is asm volatile("" ::: "memory").
However, I was talking about the run-time memory barrier that
prevents aggressive out-of-order and speculative execution in the
processor.
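
To make the distinction concrete, here is a rough sketch for GCC. The
macro and variable names are made up for illustration and are not the
RTEMS ones; on non-PowerPC targets the run-time barrier falls back to
the GCC/Clang builtin __sync_synchronize(), which is only an assumption
so that the sketch compiles anywhere.

```c
/* Hypothetical names for illustration; these are not the RTEMS macros. */

/* Compiler barrier: emits no instruction. It only stops the compiler
 * from moving memory accesses across this point. */
#define MY_COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

/* Run-time barrier: also orders the accesses as performed by the CPU. */
#if defined(__powerpc__) || defined(__PPC__)
#define MY_HW_BARRIER() __asm__ volatile("sync" ::: "memory")
#else
/* Assumed fallback for other targets: full-barrier compiler builtin. */
#define MY_HW_BARRIER() __sync_synchronize()
#endif

volatile int demo_data;
volatile int demo_ready;

/* Store the data, then raise a flag. The barrier between the two
 * stores keeps them in program order as seen by other observers. */
void demo_publish(int value)
{
    demo_data = value;
    MY_HW_BARRIER();
    demo_ready = 1;
}
```

With MY_COMPILER_BARRIER() in place of MY_HW_BARRIER() the generated
code contains no barrier instruction at all; only the compiler is
constrained.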

I understand that "sync" is expensive, so it is better used at
the application level, and only where the flow of the application
requires it.  However, where the 'sync' is placed matters just
as much.
Actually, POWER4 and later (e.g. POWER5) processors
have 'lwsync', which one can consider using at the
O.S. level as a memory barrier.  It provides the same
ordering function as the sync instruction, except that a load
caused by an instruction following the lwsync may be performed
before a store caused by an instruction that precedes the lwsync,
and the ordering does not apply to accesses to I/O memory
(memory-mapped I/O).

Thus, I propose lwsync as the memory barrier at the OS level for
the above processors.  Users can then decide where to place
the I/O memory barrier (e.g. eieio for PPC) according
to their own application.  For processors which do not support
lwsync, lwsync is treated as sync.
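
A minimal sketch of that selection, assuming a compile-time switch.
MY_CPU_HAS_LWSYNC, the MY_* macros and the small ring buffer are all
hypothetical names for illustration, not existing RTEMS options, and
the non-PowerPC branch is only there so the sketch builds elsewhere.

```c
/* Hypothetical configuration switch and macro names; not RTEMS options. */
#if defined(__powerpc__) || defined(__PPC__)
#  if defined(MY_CPU_HAS_LWSYNC)
     /* Lighter barrier for ordinary (non-I/O) memory. */
#    define MY_OS_BARRIER() __asm__ volatile("lwsync" ::: "memory")
#  else
     /* No lwsync available: use the full sync instead. */
#    define MY_OS_BARRIER() __asm__ volatile("sync" ::: "memory")
#  endif
   /* I/O ordering is left to the user, e.g. around device registers. */
#  define MY_IO_BARRIER() __asm__ volatile("eieio" ::: "memory")
#else
   /* Assumed fallback for other targets: full-barrier compiler builtin. */
#  define MY_OS_BARRIER() __sync_synchronize()
#  define MY_IO_BARRIER() __sync_synchronize()
#endif

unsigned int my_ring[4];
volatile unsigned int my_ring_head;

/* Write the entry first, then advance the index that publishes it. */
void my_ring_put(unsigned int value)
{
    my_ring[my_ring_head % 4u] = value;
    MY_OS_BARRIER();
    my_ring_head = my_ring_head + 1u;
}
```

MY_IO_BARRIER() is not used by my_ring_put() on purpose: the buffer is
in ordinary memory, so only the OS-level barrier applies there.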

Back to the principle: where is the effective location for lwsync or
sync?
More below.

Pavel Pisa wrote:

>On Thursday 06 September 2007 11:43, Feng, Kate wrote:
>  
>
>>Joel Sherrill wrote :
>>    
>>
>>>The memory barrier patch was PR 866 and was merged in March 2006.  It is
>>>in all 4.7 versions.  It first appeared in 4.6.6.
>>>      
>>>
>>It looks like it, but RTEMS4.7.x still needs patches.
>>This is not even fixed in 4.7.99.2.
>>The memory barrier definitely should be fixed in RTEMS4.7.x
>>before jumping to RTEMS4.8.
>>
>>Suggestions follow, except I hope I do not miss anything
>>since I came up with this a while ago.
>>
>>1) In cpukit/score/include/rtems/system.h:
>>
>>#define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile("" ::: "memory")
>>
>>seems to be wrong and misplaced.
>>
>>The memory barrier is processor dependent.
>>For example, the memory barrier for PowerPC is "sync".
>>
>>Thus, for PPC,  it would seem more functional to place
>>#define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile("sync"::: "memory")
>>
>>in cpukit/score/cpu/powerpc/system.h
>>or somewhere in the processor branch.
>>    
>>
>
>Hello Kate and others,
>
>I would like to react here, because I think that the proposed
>addition of "sync" is a move in a really bad direction.
>
>RTEMS_COMPILER_MEMORY_BARRIER is and should remain what it
>is, I believe. It is a barrier against the compiler optimizer
>reordering instructions across it.
>It does not try to declare or cause any globally visible
>ordering guarantee, by name or otherwise.
>
>Each CPU conforming to architecture 'X' has to guarantee
>that, even after complex CPU-level instruction reordering,
>register renaming and delayed transfers, a sequence
>of instructions results in the same state (all viewed
>from the CPU's POV) as if the instructions had been processed
>in sequential order, one by one.
>This does not say anything about the order of external memory
>transfers at all (at least for PPC; there are some special rules
>for x86 caches for compatibility with old programs).
>
>The macro ensures only the ordering of memory transfers
>from the perspective of the current CPU. But this is enough
>even for normal-mode code and a subsequently invoked
>exception handler working with the same data.
>Even if the exception handler starts before the CPU has finished
>the transfers caused by previously initiated operations, reads
>from the exception handler on the same!!! CPU would read the data
>back from the write buffer if the address corresponds to previously
>written data. So preemption or CPU IRQ flag manipulation within the
>scope of the current CPU does not need the ordering of real
>memory enforced by the very expensive "sync" instruction. It only
>needs to be sure that the CPU accounts for the written value
>at the correct point in the instruction sequence.
>
>On the other hand, there can be other reasons and situations
>requiring correct ordering of externally visible transfers.
>For example, suppose the IRQ controller is mapped as a peripheral
>into external memory/IO space. If the CPU IRQ is disabled, then some
>mask is changed in the controller to disable one of the external
>sources, and it is expected that after IRQs are enabled at the CPU
>level no event can arrive from that source, then the ordering
>of reads and writes to the controller has to be synchronized
>with the CPU ("eieio" has to be used in the PPC case). But that
>is not a task for CPU-level IRQ state manipulation. The ordering
>should be, and in the RTEMS case is, ensured by the IO access
>routines, which include the "eieio" instruction. On the other hand,
>if some external device is accessed through overlay structures
>(even volatile ones), then the ordering could be broken without
>an explicitly inserted "eieio".
>Another legitimate requirement for strict ordering/barriers for
>external accesses is the case where an external device/DMA/coprocessor
>accesses or shares data in system/main memory with the CPU.
>
Data shared among multiple threads needs a memory barrier
as well. Thus, a semaphore used for synchronization between two
different threads needs one too.  In cpukit/rtems/src/semrelease.c,
the 4.7.x OS did not want the compiler to reorder accesses at that
point, before _Thread_Enable_dispatch().
However, does it make sense to allow run-time memory accesses to be
out of order until the code reaches the 'sync' or 'lwsync' at the
user level?  Perhaps those who understand all levels of the OS
will know the answer to this better.
Logically, I am a little bit confused.
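
To illustrate the concern, here is a rough sketch of a release/obtain
pair written with portable C11 fences instead of PPC instructions. It
is not the RTEMS semaphore code: the names are hypothetical, it only
models one producer and one consumer, and the claim that compilers
typically map the release fence to lwsync on PowerPC is my assumption.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins; this is not the RTEMS semaphore code. */
int sketch_payload;        /* ordinary data shared between two threads */
atomic_int sketch_count;   /* plays the role of the semaphore count    */

/* Producer side: write the data, then "release" the semaphore.
 * The release fence keeps the payload store ahead of the count store. */
void sketch_release(int value)
{
    sketch_payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&sketch_count, 1, memory_order_relaxed);
}

/* Consumer side: returns 1 and fills *out if the count was set.
 * The acquire fence keeps the payload load behind the count load. */
int sketch_try_obtain(int *out)
{
    if (atomic_load_explicit(&sketch_count, memory_order_relaxed) == 0)
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = sketch_payload;
    atomic_store_explicit(&sketch_count, 0, memory_order_relaxed);
    return 1;
}
```

Without the two fences, nothing in the sketch stops the consumer from
seeing the count set before the payload store becomes visible to it.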

>The "sync"/cache range invalidation/flushing is required before
>and after external memory accesses (the exact details depend on
>transfer directions and other parameters).
>
>  
>
>>3) In PPC shared/irq/irq.c and the other PPC sources,
>>_ISR_Disable( _level ) and _ISR_Enable( _level )
>>should be used instead of _CPU_ISR_Disable( _level ) and
>>_CPU_ISR_Enable( _level )
>>    
>>
>
>
>But I fully agree with you that sequences like the following
>one are fatally broken
>
>     _CPU_ISR_Disable(level);
>     *irq = rtems_hdl_tbl[irq->name];
>     _CPU_ISR_Enable(level);
>
>There is no guarantee that the operation which is expected
>to be protected will not be moved outside of the protected sequence.
>An explicit or implicit RTEMS_COMPILER_MEMORY_BARRIER is missing there.
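
A sketch of the repaired pattern, with the compiler barrier folded
into the disable/enable macros. All names here are hypothetical
stand-ins, not the RTEMS macros, and the "interrupt mask" is just a
variable so that the example builds on any target.

```c
/* Hypothetical stand-ins, not the RTEMS macros. A real port would
 * change the CPU interrupt mask instead of a plain variable. */
#define MY_COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

unsigned int fake_irq_mask;   /* pretend interrupt-enable state */

/* Save and clear the mask; the barrier keeps protected accesses
 * from being hoisted above the disable. */
#define MY_ISR_DISABLE(level) \
    do { (level) = fake_irq_mask; fake_irq_mask = 0u; \
         MY_COMPILER_BARRIER(); } while (0)

/* The barrier keeps protected accesses from sinking below the enable. */
#define MY_ISR_ENABLE(level) \
    do { MY_COMPILER_BARRIER(); fake_irq_mask = (level); } while (0)

int my_handler_table[4];

/* Update one table slot inside the protected sequence. */
void my_install_handler(int slot, int handler)
{
    unsigned int level;

    MY_ISR_DISABLE(level);
    my_handler_table[slot] = handler;
    MY_ISR_ENABLE(level);
}
```

Neither macro emits a barrier instruction; they constrain only the
compiler, which is the point being made above.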
>
>  
>
>>Actually, I think a better one should be rtems_interrupt_disable(level)
>>and rtems_interrupt_enable(level).
>>    
>>
>
>The code should be changed according to one of your proposals.
>
>  
>
>>2) In order for the inline to work, the
>>CPU_INLINE_ENABLE_DISPATCH should be defined to be TRUE.
>>
>>Thus,
>>in cpukit/score/cpu/powerpc/rtems/score/cpu.h:
>>
>>-#define CPU_INLINE_ENABLE_DISPATCH       FALSE
>>+#define CPU_INLINE_ENABLE_DISPATCH       TRUE
>>    
>>
>
>As for the non-inlined version of _Thread_Enable_dispatch,
>there should not be a problem. A call to a function without static
>or inline attributes is considered a full compiler memory ordering
>barrier point. So no explicit compiler barrier should be needed
>there.
>
I agree with Sergei's vote.

PS. Does anyone know how to find the number of OPcodes for all the
PPC assembly code ?

Regards,
Kate

>
>All this is based upon my understanding of the code and of computer
>system principles. There is no doubt that there could be
>many other problems and errors. But if there are problems
>with IRQ behavior on PPC, then it is worth checking that sequences
>like the one above do not exist. The _ISR_Disable()/_ISR_Enable()
>or higher-level rtems_ variants should be used in the mentioned
>source file. Otherwise bad things could happen.
>
>Excuse me for the long answer, but I wanted to clarify things
>as well as I could.
>
>Best wishes
>
>            Pavel
>  
>





