Memory Barrier
Till Straumann
strauman at slac.stanford.edu
Mon Sep 10 19:24:31 UTC 2007
The bad news are that a number of CPUs ( i386, m68k, sparc, sh, ... )
define
#define _CPU_ISR_XXable(k) <cpu>_XXable_interrupt(K)
and the latter is also used in many places (in- and outside of cpukit).
I start to wonder if it wouldn't be easier to introduce the barrier
where the relevant lowest-level macros for each CPU are defined.
BTW: IMO, barriers are also needed around _CPU_ISR_Set_level()
which could have the effect of enabling or disabling interrupts.
-- T.
Joel Sherrill wrote:
> Till Straumann wrote:
>> I filed PR1257 discouraging the use of
>> _CPU_ISR_Disable()/_CPU_ISR_Enable()
>>
>> These macros should never be used directly
>> (they are defined by every CPU architecture
>> to be expanded by the generic macros
>> _ISR_Disable()/_ISR_Enable() which add a compiler
>> memory barrier).
>>
>>
> And are made publicly available
> as rtems_interrupt_disable and
> rtems_interrupt_enable. That is the
> name that every BSP and device driver
> should use. There should be virtually
> no code under rtems-XXX/c which references
> _CPU_ISR or _ISR_.
>
> I have made a big edit sweep and will
> start to commit once I do a full build and
> review the diff.
>
> --joel
>> -T
>>
>> Pavel Pisa wrote:
>>
>>> On Thursday 06 September 2007 11:43, Feng, Kate wrote:
>>>
>>>> Joel Sherrill wrote :
>>>>
>>>>> The memory barrier patch was PR 866 and was merged in March 2006.
>>>>> It is
>>>>> in all 4.7 versions. It first appeared in 4.6.6.
>>>>>
>>>> It looks like it, but RTEMS4.7.x still needs patches.
>>>> This is not even fixed in 4.77.99.2.
>>>> The memory barrier definitely should be fixed in RTEMS4.7.x
>>>> before jumping to RTEMS4.8.
>>>>
>>>> Suggestions follow, except I hope I do not miss anything
>>>> since I came up with this a while ago.
>>>>
>>>> 1) In cpukit/score/include/rtems/system.h:
>>>>
>>>> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile(::: "memory")
>>>>
>>>> seems to be wrong and misplaced.
>>>>
>>>> The memory barrier is processor dependent.
>>>> For example, the memory barrier for PowerPC is "sync".
>>>>
>>>> Thus, for PPC, it would seem more functional to place
>>>> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile("sync":::
>>>> "memory")
>>>>
>>>> in cpukit/score/cpu/powerpc/system.h
>>>> or somewhere in the processor branch.
>>>>
>>> Hello Kate and others,
>>>
>>> I would like to react there, because I think, that proposed
>>> addition of "sync" is move into really bad direction.
>>>
>>> RTEMS_COMPILER_MEMORY_BARRIER is and should remain what it
>>> is, I believe. It is barrier against compiler optimizer
>>> caused reordering of instruction over the barrier.
>>> This does not try to declare/cause any globally visible
>>> ordering guarantee, by name and anything else.
>>>
>>> Each architecture 'X' conforming CPU has to guarantee,
>>> that even after complex CPU level instruction reordering,
>>> register renaming and transfers delaying an sequence
>>> of instruction would result in same state (all viewed
>>> from CPU POV) as if instructions has been processed
>>> in sequential order one by one.
>>> This does not mean anything about external memory transfers
>>> order at all (at least for PPC, there are some special rules
>>> for x86 caches for compatibility with old programs).
>>>
>>> The macro ensures only ordering of memory transfers
>>> from actual CPU POV/perspective. But this is enough
>>> even for POV of normal mode and consecutively invoked
>>> exception handler working with same data.
>>> Even if exception handler starts and CPU does not finish
>>> transfers caused by previously initiated operations, reads
>>> from exception on same!!! CPU would read back data from
>>> write buffer if the address corresponds to previously written
>>> data. So preemption or CPU IRQ flags manipulation in scope
>>> of the actual CPU does not need enforcing ordering of real
>>> memory by very expensive "sync" instruction. It only needs to
>>> be sure, that CPU accounts/is aware of the value write transfer
>>> before at correct point in the instruction sequence.
>>>
>>> On the other hand, there could be other reasons and situations
>>> requesting correct ordering of externally visible transfers.
>>> For example, if IRQ controller is mapped as peripheral into
>>> external memory/IO space and CPU IRQ is disabled, than some
>>> mask is changed in the controller to disable one of external
>>> sources and it is expected, that after IRQ enabling on CPU
>>> level there cannot arrive event from that source, ordering
>>> of reads and writes to the controller has to be synchronized
>>> with CPU ("eieio" has to be used in the PPC case). But it
>>> is not task for CPU level IRQ state manipulation. The ordering
>>> should and in RTEMS case is ensured by IO access routines
>>> which include "eieio" instruction. On the other hand, if
>>> some external device is accessed through overlay structures
>>> (even volatile), then ordering could be broken without
>>> explicitly inserted "eieio".
>>> Other legitimate requirement for strict ordering/barrier for
>>> external accesses are the cases, where external device/DMA/coprocessor
>>> accesses/shares data in system/main memory with CPU.
>>> The "sync"/cache range invalidation/flushing is required before
>>> and after external memory accesses (the exact details depend on
>>> transfers directions and other parameters).
>>>
>>>
>>>> 3) Among PPC shared/irq/irq.c and other PPCs,
>>>> _ISR_Disable( _level ), and _ISR_Disable( _level )
>>>> should be used instead of _CPU_ISR_Disable(level) and
>>>> _CPU_ISR_Enable( _level )
>>>>
>>> But I fully agree with you, that sequences like following
>>> one are fatally broken
>>>
>>> _CPU_ISR_Disable(level);
>>> *irq = rtems_hdl_tbl[irq->name];
>>> _CPU_ISR_Enable(level);
>>>
>>> There is no guarantee, that operation which has been expected
>>> to be protected would not be moved outside of protection sequence.
>>> Explicit or implicit RTEMS_COMPILER_MEMORY_BARRIER is missing there.
>>>
>>>
>>>> Actually, I think a better one should be
>>>> rtems_interrupt_disable(level)
>>>> and rtems_interrupt_enable(level).
>>>>
>>> The code should be changed according to one of your proposals.
>>>
>>>
>>>> 2) In order for the inline to work, the
>>>> CPU_INLINE_ENABLE_DISPATCH should be defined to be TRUE.
>>>>
>>>> Thus,
>>>> in cpukit/score/cpu/powerpc/rtems/score/cpu.h:
>>>>
>>>> -#define CPU_INLINE_ENABLE_DISPATCH FALSE
>>>> +#define CPU_INLINE_ENABLE_DISPATCH TRUE
>>>>
>>> As for the non-inlined version of _Thread_Enable_dispatch,
>>> there should not be problem. Calling function without static
>>> or inline attributes is considered as full compiler memory ordering
>>> barrier point. So no explicit compiler barrier should be needed
>>> there.
>>>
>>> All this is based upon my understanding of code and computer
>>> systems principles. There is no doubt, that there could be
>>> many other problems and errors. But if there are problems
>>> with IRQs behavior on PPC, then the checking, that sequences
>>> like above one do not exist. The _ISR_Disable()/_ISR_Disable()
>>> or higher level rtems_ variants should be used in noticed source
>>> file. Else bad things could happen.
>>>
>>> Excuse me for long answer, but I wanted to clarify things
>>> as good way as I could.
>>>
>>> Best wishes
>>>
>>> Pavel
>>> _______________________________________________
>>> rtems-users mailing list
>>> rtems-users at rtems.com
>>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>>
>>
>>
>> _______________________________________________
>> rtems-users mailing list
>> rtems-users at rtems.com
>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>
>
More information about the users
mailing list