More PowerPC cache hijinks (dcbf requires sync?)
Craig S. Steele
steele at ISI.EDU
Thu Sep 11 19:06:42 UTC 2003
> Looking at _CPU_cache_invalidate_1_data_line()
>in rtems-4.6.0pre4, I don't see any sync instructions. Does anyone know
>of a reason why the sync isn't required?
The question is: Why are you flushing the cache line? If you just
want to invalidate the line, maybe to free capacity, as the routine
name states, DCBF does that. If you've got something more
sophisticated in mind, you probably should be writing your own
cache-control sequence tailored to your specific hardware. sync can
be very expensive, and I wouldn't think it would be appropriate for a
routine with such a generic name as
_CPU_cache_invalidate_1_data_line(). Without knowing exactly what
you've got in mind, it's hard to
provide any useful opinion; forgive me if the following remarks are
irrelevant to your interests.
If you want to use cached memory to write to a memory-mapped I/O
device or communicate with another processor, then yes, in the
general case you probably need some kind of synchronization barrier
if you have a sequence of stores followed by one or more dcbf
instructions. sync is the heaviest-weight of several barrier
instructions available (eieio, isync, sync), and will do the job,
but can be very expensive in superscalar PowerPC implementations. I
don't know about simpler embedded chips; my experience was with the
604. It's possible for an address-only flush to bypass pending
cached stores in a sequence, which can require a sync. If you're not
doing a sequence of stores, but just flushing a single cache line, I
doubt that a sync is necessary, since there's nothing to get out of
order. If you're just viewing the cache line from the perspective of
a single CPU, for example flushing a line to be able to read a
cachable status register, you're unlikely to need a sync; the flush
will happen eventually and then you'll get the readback. The sync is
helpful in ordering the sequence of cache-line writebacks or ordering
dissimilar memory-bus operations, such as writebacks and address-only
cycles (60X bus) that do not require a writeback (e.g., dcbf to an
address without a line in cache). The classes of bus operations that
can get out of order depend on the organization of different
functional units in a particular processor; simpler processors have
less capability to reorder or delay things that would require the
global barrier of sync.
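
A minimal sketch of the "several dcbf's, then one sync" ordering that
Phil suggests in his message, for GCC-style compilers. The names
flush_data_range(), dcbf_line(), sync_barrier() and the 32-byte
CACHE_LINE_SIZE are assumptions made for illustration, not RTEMS
code. On a non-PowerPC host the instructions compile to no-ops, so
only the address arithmetic can be checked there.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte data cache lines; check the manual for your part. */
#define CACHE_LINE_SIZE 32

/* Hypothetical helper: flush (write back and invalidate) one line. */
static inline void dcbf_line(const volatile void *addr)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile ("dcbf 0,%0" : : "r" (addr) : "memory");
#else
    (void)addr; /* non-PowerPC host: no-op so the sketch still builds */
#endif
}

/* Hypothetical helper: the heavyweight barrier discussed above. */
static inline void sync_barrier(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile ("sync" : : : "memory");
#else
    __asm__ volatile ("" : : : "memory"); /* compiler barrier only */
#endif
}

/*
 * Flush every cache line touched by [start, start + len) and issue a
 * single sync after the last dcbf. Returns the number of lines flushed.
 */
size_t flush_data_range(const volatile void *start, size_t len)
{
    uintptr_t addr, end;
    size_t lines = 0;

    if (len == 0)
        return 0;

    /* Round the start address down to a line boundary. */
    addr = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    end = (uintptr_t)start + len;

    for (; addr < end; addr += CACHE_LINE_SIZE) {
        dcbf_line((const volatile void *)addr);
        lines++;
    }

    sync_barrier(); /* one barrier for the whole sequence */
    return lines;
}
```

Whether the trailing sync is needed at all, or whether a lighter
barrier would do, depends on the processor and on who is observing
the writebacks, as discussed above.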
>the consensus seemed to be that using 'dcbf' to flush
>a cache line should be followed by 'sync' to ensure that the cache
>was really flushed before the following instructions executed.
In even the simplest pipelined CPU, unless you sync, the memory cycle
won't complete before the following instructions execute, but you
have to be explicit about which perspective you're viewing the
operation from to determine whether this issue is important. Things
will happen in-order from the processor's viewpoint, but it may be
some time before the writeback shows up on the external bus. If
you're only concerned about writes happening quickly or have a
critical memory-write sequence ordering, maybe the memory area should
not be cachable.
If you're polling a cached register to find a change, it might be
better to use a cache-invalidate cycle initiated by the I/O device
containing the register than to hammer the bus repeatedly fetching
and flushing, if this is an option for the processor selected.
Another odd-ball technique that can be useful is to alias your status
register at multiple addresses so you can poll through an address
sequence that will fetch different cache lines. Depending on your
application hardware timing this could save a few flushes or
pointlessly pollute your cache. :-)
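
The aliasing trick might look roughly like the sketch below. It
assumes hypothetical hardware that decodes one status register at
STATUS_ALIAS_COUNT addresses spaced one cache line apart, and a
32-byte line; both numbers and all names are made up for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions: 32-byte cache lines, register decoded at 8 aliases. */
#define CACHE_LINE_SIZE    32
#define STATUS_ALIAS_COUNT 8

/* Address of the i-th alias; each alias falls in a different line. */
static const volatile uint32_t *status_alias(const volatile void *base,
                                             unsigned i)
{
    const volatile unsigned char *p = (const volatile unsigned char *)base;

    return (const volatile uint32_t *)
        (p + (size_t)(i % STATUS_ALIAS_COUNT) * CACHE_LINE_SIZE);
}

/*
 * Read the status once through each alias, so each read misses in the
 * cache and fetches a fresh copy. Returns the index of the first alias
 * whose read shows a bit in mask set, or -1 if none did. After a -1
 * every alias line is cached and stale, so the caller still has to
 * flush or invalidate them before the next round.
 */
int poll_status_aliases(const volatile void *base, uint32_t mask)
{
    unsigned i;

    for (i = 0; i < STATUS_ALIAS_COUNT; i++) {
        if (*status_alias(base, i) & mask)
            return (int)i;
    }
    return -1;
}
```

This only defers the flushes to once per STATUS_ALIAS_COUNT polls,
at the cost of that many cache lines.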
Cached I/O is tricky, and most generic devices that I've seen, mixing
status and control bits carelessly, aren't really suitable for it,
since they can't tolerate an unintended cache writeback ("capacity
spill") and still function correctly. (That "GO" bit should read back
as zero, for example.) Putting syncs between every store works, but is
probably slower than not caching that area. On the other hand, if
you can pack what you need to be consistent in a single cache line,
avoid multiple-writers problems in HW, and tolerate redundant
writebacks at unexpected times, your system can be the fastest X on
the block :-) But you need to think more along the lines of an SMP
programmer than an S-100 hacker. :-)
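
As a sketch of "pack what you need to be consistent in a single
cache line": the structure below is a hypothetical CPU-to-device
mailbox, assuming 32-byte lines and a GCC-style compiler. The field
names are invented; the point is only that the size and alignment
checks keep the whole record in one line, so one writeback carries
all of it.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte data cache lines. */
#define CACHE_LINE_SIZE 32

/*
 * Hypothetical mailbox written only by the CPU and read only by the
 * device (single writer). Everything that must be seen together
 * lives in one cache line.
 */
struct dev_mailbox {
    uint32_t command;
    uint32_t buffer_addr;
    uint32_t length;
    uint32_t sequence;    /* bumped last, so a reader can spot updates */
    uint32_t reserved[4]; /* pad out to a full line */
} __attribute__((aligned(CACHE_LINE_SIZE)));

/* Fail the build if the record ever grows past one line. */
_Static_assert(sizeof(struct dev_mailbox) == CACHE_LINE_SIZE,
               "dev_mailbox must occupy exactly one cache line");

/* Run-time check: first and last byte fall in the same line. */
int mailbox_in_one_line(const struct dev_mailbox *m)
{
    uintptr_t first = (uintptr_t)m / CACHE_LINE_SIZE;
    uintptr_t last = ((uintptr_t)m + sizeof *m - 1) / CACHE_LINE_SIZE;

    return first == last;
}
```

The device side still has to tolerate seeing the same line written
back more than once, as noted above.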
Craig
At 4:12 PM -0700 9/10/03, Phil Torre wrote:
>I see in the archives an old thread from 2000 in which this topic was
>discussed, and the consensus seemed to be that using 'dcbf' to flush
>a cache line should be followed by 'sync' to ensure that the cache
>was really flushed before the following instructions executed.
>
>(Another issue is: OK, so the dcbf has executed, but does that guarantee
>that the flush has actually made it out to memory?)
>
>So, it seems like the mpc8xx version of _CPU_cache_invalidate_1_data_line()
>needs to do a sync after it does its dcbf. (Ideally you could do
>multiple dcbf's in a row with a single sync at the end, but that would
>require the sync to be inserted into
>rtems_cache_flush_multiple_data_lines(),
>which isn't CPU-specific.) Looking at _CPU_cache_invalidate_1_data_line()
>in rtems-4.6.0pre4, I don't see any sync instructions. Does anyone know
>of a reason why the sync isn't required?
>
>-Phil
>
>--
>
>=====================================================================
>Phil Torre phone: 425-820-6363 x234
>Design Engineer email: ptorre at zetron.com
>Switching Systems Group fax: 425-820-7031
>Zetron, Inc. web: http://www.zetron.com
>
>
>