Memory Barrier
Till Straumann
strauman at slac.stanford.edu
Tue Sep 11 05:12:01 UTC 2007
Good catch, Kate.
Yesterday I patched cpukit/score/cpu/powerpc, m68k and i386 (the
architectures we use here) and introduced the "memory" clobber
to the basic ASM macros (including CPU_MSR_SET) but
Joel is trying to eliminate the use of CPU_ISR_xxable() leaving
the low-level macros alone.
Note that _ISR_Enable()/_ISR_Disable() cannot be used in the
code sections Kate refers to because what's needed here
is the opposite:
old_mask = get_mask_and_ENABLE_irq()
...
set_mask( old_mask );
T.
Feng, Kate wrote:
> How about _CPU_MSR_SET() used in various PPC/irq.c ?
> It's inline, but not protected by memory
> barrier, either.
>
> _CPU_MSR_GET(msr);
> new_msr = msr | MSR_EE;
> _CPU_MSR_SET(new_msr);
> ...........
> ............
> _CPU_MSR_SET(msr);
>
> Regards,
> Kate
>
> -----Original Message-----
> From: rtems-users-bounces+feng1=bnl.gov at rtems.org on behalf of Joel Sherrill
> Sent: Mon 9/10/2007 3:34 PM
> To: Till Straumann
> Cc: rtems-users at rtems.org
> Subject: Re: Memory Barrier
>
> Till Straumann wrote:
>
>> The bad news are that a number of CPUs ( i386, m68k, sparc, sh, ... )
>> define
>>
>> #define _CPU_ISR_XXable(k) <cpu>_XXable_interrupt(K)
>>
>> and the latter is also used in many places (in- and outside of cpukit).
>>
>>
> I am nearly complete on the test build for the search and replace
> on _CPU_ISRXXable under c/. There is no reason the BSPs and
> other code shouldn't have used the public API. So this is an excuse
> at that cleanup.
>
> I will hunt for the others since unless there is some obvious reason
> to use them, they are just barriers to driver portability.
>
>> I start to wonder if it wouldn't be easier to introduce the barrier
>> where the relevant lowest-level macros for each CPU are defined.
>>
>>
> It might be prudent but it is also effective to just forbid anyone
> from using the _CPU_ISR form.
>
>> BTW: IMO, barriers are also needed around _CPU_ISR_Set_level()
>> which could have the effect of enabling or disabling interrupts.
>>
>>
> This is probably prudent although I don't think any use of it
> inside RTEMS is particularly critical. As far as I recall, it is
> only used during task initialization and mode changes.
>
> --joel
>
>> -- T.
>>
>> Joel Sherrill wrote:
>>
>>> Till Straumann wrote:
>>>
>>>> I filed PR1257 discouraging the use of
>>>> _CPU_ISR_Disable()/_CPU_ISR_Enable()
>>>>
>>>> These macros should never be used directly
>>>> (they are defined by every CPU architecture
>>>> to be expanded by the generic macros
>>>> _ISR_Disable()/_ISR_Enable() which add a compiler
>>>> memory barrier).
>>>>
>>>>
>>>>
>>> And are made publicly available
>>> as rtems_interrupt_disable and
>>> rtems_interrupt_enable. That is the
>>> name that every BSP and device driver
>>> should use. There should be virtually
>>> no code under rtems-XXX/c which references
>>> _CPU_ISR or _ISR_.
>>>
>>> I have made a big edit sweep and will
>>> start to commit once I do a full build and
>>> review the diff.
>>>
>>> --joel
>>>
>>>> -T
>>>>
>>>> Pavel Pisa wrote:
>>>>
>>>>
>>>>> On Thursday 06 September 2007 11:43, Feng, Kate wrote:
>>>>>
>>>>>
>>>>>> Joel Sherrill wrote :
>>>>>>
>>>>>>
>>>>>>> The memory barrier patch was PR 866 and was merged in March
>>>>>>> 2006. It is
>>>>>>> in all 4.7 versions. It first appeared in 4.6.6.
>>>>>>>
>>>>>>>
>>>>>> It looks like it, but RTEMS4.7.x still needs patches.
>>>>>> This is not even fixed in 4.77.99.2.
>>>>>> The memory barrier definitely should be fixed in RTEMS4.7.x
>>>>>> before jumping to RTEMS4.8.
>>>>>>
>>>>>> Suggestions follow, except I hope I do not miss anything
>>>>>> since I came up with this a while ago.
>>>>>>
>>>>>> 1) In cpukit/score/include/rtems/system.h:
>>>>>>
>>>>>> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile(::: "memory")
>>>>>>
>>>>>> seems to be wrong and misplaced.
>>>>>>
>>>>>> The memory barrier is processor dependent.
>>>>>> For example, the memory barrier for PowerPC is "sync".
>>>>>>
>>>>>> Thus, for PPC, it would seem more functional to place
>>>>>> #define RTEMS_COMPILER_MEMORY_BARRIER() asm volatile("sync":::
>>>>>> "memory")
>>>>>>
>>>>>> in cpukit/score/cpu/powerpc/system.h
>>>>>> or somewhere in the processor branch.
>>>>>>
>>>>>>
>>>>> Hello Kate and others,
>>>>>
>>>>> I would like to react there, because I think, that proposed
>>>>> addition of "sync" is move into really bad direction.
>>>>>
>>>>> RTEMS_COMPILER_MEMORY_BARRIER is and should remain what it
>>>>> is, I believe. It is barrier against compiler optimizer
>>>>> caused reordering of instruction over the barrier.
>>>>> This does not try to declare/cause any globally visible
>>>>> ordering guarantee, by name and anything else.
>>>>>
>>>>> Each architecture 'X' conforming CPU has to guarantee,
>>>>> that even after complex CPU level instruction reordering,
>>>>> register renaming and transfers delaying an sequence
>>>>> of instruction would result in same state (all viewed
>>>>> from CPU POV) as if instructions has been processed
>>>>> in sequential order one by one.
>>>>> This does not mean anything about external memory transfers
>>>>> order at all (at least for PPC, there are some special rules
>>>>> for x86 caches for compatibility with old programs).
>>>>>
>>>>> The macro ensures only ordering of memory transfers
>>>>> from actual CPU POV/perspective. But this is enough
>>>>> even for POV of normal mode and consecutively invoked
>>>>> exception handler working with same data.
>>>>> Even if exception handler starts and CPU does not finish
>>>>> transfers caused by previously initiated operations, reads
>>>>> from exception on same!!! CPU would read back data from
>>>>> write buffer if the address corresponds to previously written
>>>>> data. So preemption or CPU IRQ flags manipulation in scope
>>>>> of the actual CPU does not need enforcing ordering of real
>>>>> memory by very expensive "sync" instruction. It only needs to
>>>>> be sure, that CPU accounts/is aware of the value write transfer
>>>>> before at correct point in the instruction sequence.
>>>>>
>>>>> On the other hand, there could be other reasons and situations
>>>>> requesting correct ordering of externally visible transfers.
>>>>> For example, if IRQ controller is mapped as peripheral into
>>>>> external memory/IO space and CPU IRQ is disabled, than some
>>>>> mask is changed in the controller to disable one of external
>>>>> sources and it is expected, that after IRQ enabling on CPU
>>>>> level there cannot arrive event from that source, ordering
>>>>> of reads and writes to the controller has to be synchronized
>>>>> with CPU ("eieio" has to be used in the PPC case). But it
>>>>> is not task for CPU level IRQ state manipulation. The ordering
>>>>> should and in RTEMS case is ensured by IO access routines
>>>>> which include "eieio" instruction. On the other hand, if
>>>>> some external device is accessed through overlay structures
>>>>> (even volatile), then ordering could be broken without
>>>>> explicitly inserted "eieio".
>>>>> Other legitimate requirement for strict ordering/barrier for
>>>>> external accesses are the cases, where external device/DMA/coprocessor
>>>>> accesses/shares data in system/main memory with CPU.
>>>>> The "sync"/cache range invalidation/flushing is required before
>>>>> and after external memory accesses (the exact details depend on
>>>>> transfers directions and other parameters).
>>>>>
>>>>>
>>>>>
>>>>>> 3) Among PPC shared/irq/irq.c and other PPCs,
>>>>>> _ISR_Disable( _level ), and _ISR_Disable( _level )
>>>>>> should be used instead of _CPU_ISR_Disable(level) and
>>>>>> _CPU_ISR_Enable( _level )
>>>>>>
>>>>>>
>>>>> But I fully agree with you, that sequences like following
>>>>> one are fatally broken
>>>>>
>>>>> _CPU_ISR_Disable(level);
>>>>> *irq = rtems_hdl_tbl[irq->name];
>>>>> _CPU_ISR_Enable(level);
>>>>>
>>>>> There is no guarantee, that operation which has been expected
>>>>> to be protected would not be moved outside of protection sequence.
>>>>> Explicit or implicit RTEMS_COMPILER_MEMORY_BARRIER is missing there.
>>>>>
>>>>>
>>>>>
>>>>>> Actually, I think a better one should be
>>>>>> rtems_interrupt_disable(level)
>>>>>> and rtems_interrupt_enable(level).
>>>>>>
>>>>>>
>>>>> The code should be changed according to one of your proposals.
>>>>>
>>>>>
>>>>>
>>>>>> 2) In order for the inline to work, the
>>>>>> CPU_INLINE_ENABLE_DISPATCH should be defined to be TRUE.
>>>>>>
>>>>>> Thus,
>>>>>> in cpukit/score/cpu/powerpc/rtems/score/cpu.h:
>>>>>>
>>>>>> -#define CPU_INLINE_ENABLE_DISPATCH FALSE
>>>>>> +#define CPU_INLINE_ENABLE_DISPATCH TRUE
>>>>>>
>>>>>>
>>>>> As for the non-inlined version of _Thread_Enable_dispatch,
>>>>> there should not be problem. Calling function without static
>>>>> or inline attributes is considered as full compiler memory ordering
>>>>> barrier point. So no explicit compiler barrier should be needed
>>>>> there.
>>>>>
>>>>> All this is based upon my understanding of code and computer
>>>>> systems principles. There is no doubt, that there could be
>>>>> many other problems and errors. But if there are problems
>>>>> with IRQs behavior on PPC, then the checking, that sequences
>>>>> like above one do not exist. The _ISR_Disable()/_ISR_Disable()
>>>>> or higher level rtems_ variants should be used in noticed source
>>>>> file. Else bad things could happen.
>>>>>
>>>>> Excuse me for long answer, but I wanted to clarify things
>>>>> as good way as I could.
>>>>>
>>>>> Best wishes
>>>>>
>>>>> Pavel
>>>>> _______________________________________________
>>>>> rtems-users mailing list
>>>>> rtems-users at rtems.com
>>>>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>>>>
>>>>>
>>>> _______________________________________________
>>>> rtems-users mailing list
>>>> rtems-users at rtems.com
>>>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>>>
>>>>
>
> _______________________________________________
> rtems-users mailing list
> rtems-users at rtems.com
> http://rtems.rtems.org/mailman/listinfo/rtems-users
>
>
More information about the users
mailing list