More PowerPC cache hijinks (dcbf requires sync?)

Thu Sep 11 19:06:42 UTC 2003

>  Looking at _CPU_cache_invalidate_1_data_line()
>in rtems-4.6.0pre4, I don't see any sync instructions.  Does anyone know
>of a reason why the sync isn't required?
The question is: Why are you flushing the cache line?  If you just
want to invalidate the line, maybe to free capacity, as the routine
name states, DCBF does that.  If you have something more
sophisticated in mind, you should probably write your own
cache-control sequence tailored to your specific hardware.  sync can
be very expensive, and I don't think it would be appropriate in a
routine with as generic a name as CPU_cache_invalidate_1_data_line().
Without knowing exactly what you have in mind, it's hard to
provide a useful opinion; forgive me if the following remarks are
irrelevant to your interests.

If you want to use cached memory to write to a memory-mapped I/O
device or to communicate with another processor, then yes, in the
general case you probably need some kind of synchronization barrier
if you have a sequence of stores followed by one or more dcbf
instructions.  sync is the heaviest-weight of the several barrier
instructions available (eieio, isync, sync) and will do the job,
but it can be very expensive on superscalar PowerPC implementations.
(I don't know about simpler embedded chips; my experience was with
the 604.)  It's possible for an address-only flush to bypass pending
cached stores in a sequence, which can require a sync.  If you're not
doing a sequence of stores, but just flushing a single cache line, I
doubt that a sync is necessary, since there's nothing to get out of
order.  If you're just viewing the cache line from the perspective of
a single CPU, for example flushing a line to be able to read a
cacheable status register, you're unlikely to need a sync; the flush
will happen eventually and then you'll get the readback.  The sync is
helpful in ordering a sequence of cache-line writebacks, or in
ordering dissimilar memory-bus operations, such as writebacks and the
address-only cycles (60x bus) that do not require a writeback (e.g., a
dcbf to an address with no line in the cache).  Which classes of bus
operations can get out of order depends on how the functional units
of a particular processor are organized; simpler processors have less
ability to reorder or delay the things that would require the global
barrier of sync.
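A minimal sketch of the "several dcbf's, one sync" pattern, assuming GCC-style inline assembly and a 32-byte data cache line (the 604's size; other parts differ, e.g. the MPC8xx cores use 16-byte lines, so check your processor's manual). The names ppc_dcbf, ppc_sync and flush_data_range are hypothetical and are not the RTEMS cache-manager API. On a non-PowerPC host the instructions compile away so the address arithmetic can still be exercised.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed data cache line size in bytes; must be a power of two.
 * Check your processor's manual before relying on this value. */
#define CACHE_LINE_SIZE 32u

/* Flush one data cache line: write it back if dirty, then invalidate. */
static inline void ppc_dcbf(const void *addr)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile ("dcbf 0,%0" : : "r"(addr) : "memory");
#else
    (void)addr; /* not PowerPC: nothing to flush in this sketch */
#endif
}

/* Heavyweight barrier: does not complete until all preceding
 * instructions, including the dcbf operations, have completed. */
static inline void ppc_sync(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile ("sync" : : : "memory");
#endif
}

/* Hypothetical helper: flush every cache line that overlaps
 * [start, start + len), then issue a single sync for the whole batch
 * instead of one per line.  Returns the number of lines flushed. */
size_t flush_data_range(const void *start, size_t len)
{
    if (len == 0)
        return 0;

    uintptr_t mask  = ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t first = (uintptr_t)start & mask;
    uintptr_t last  = ((uintptr_t)start + len - 1) & mask;
    size_t lines    = (size_t)((last - first) / CACHE_LINE_SIZE) + 1;

    for (size_t i = 0; i < lines; i++)
        ppc_dcbf((const void *)(first + i * CACHE_LINE_SIZE));

    ppc_sync();
    return lines;
}
```

Flushing N lines this way pays for one barrier rather than N, which is the trade-off Phil raises in the quoted message.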

>the consensus seemed to be that using 'dcbf' to flush
>a cache line should be followed by 'sync' to ensure that the cache
>was really flushed before the following instructions executed.
In even the simplest pipelined CPU, unless you sync, the memory cycle
won't complete before the following instructions execute, but you
have to be explicit about the perspective you're viewing the
operation from to determine whether this issue matters.  Things will
happen in order from the processor's viewpoint, but it may be some
time before the writeback shows up on the external bus.  If you need
writes to reach memory quickly, or depend on a critical ordering of
memory writes, maybe the memory area should not be cacheable.

If you're polling a cached register to detect a change, it might be
better to have the I/O device containing the register initiate a
cache-invalidate cycle than to hammer the bus repeatedly fetching
and flushing, if the selected processor supports that.
Another oddball technique that can be useful is to alias your status
register at multiple addresses so you can poll through an address
sequence that fetches different cache lines.  Depending on your
application's hardware timing, this could save a few flushes or
pointlessly pollute your cache. :-)
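A sketch of the fetch-and-flush polling loop, assuming GCC-style inline assembly and a status register mapped into cacheable memory. The names poll_cached_status and discard_cached_copy, the mask/value interface and the poll limit are all hypothetical. Following the remark above that a single CPU re-reading its own flushed line is unlikely to need a sync, the loop issues only a dcbf between reads. Note that dcbf also writes the line back if it is dirty, which is the unintended-writeback hazard for devices that mix status and control bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Discard this CPU's cached copy of the line holding addr (dcbf also
 * writes the line back first if it happens to be dirty). */
static inline void discard_cached_copy(const volatile void *addr)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile ("dcbf 0,%0" : : "r"(addr) : "memory");
#else
    (void)addr; /* not PowerPC: plain memory, nothing to discard */
#endif
}

/* Hypothetical helper: poll a cacheable status word until
 * (*reg & mask) == want, giving up after max_polls reads.
 * After each failed read the line is flushed, so the next load
 * has to fetch it from the bus again. */
bool poll_cached_status(const volatile uint32_t *reg, uint32_t mask,
                        uint32_t want, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if ((*reg & mask) == want)
            return true;
        discard_cached_copy(reg);
    }
    return false;
}
```

Every iteration costs a bus read plus a possible writeback, which is the bus hammering the paragraph warns about.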

Cached I/O is tricky, and most generic devices that I've seen, which
mix status and control bits carelessly, aren't really suitable for it,
since they can't tolerate an unintended cache writeback ("capacity
spill") and still function correctly.  (That "GO" bit should read back
as zero, for example.)  Putting a sync between every store works, but
is probably slower than not caching that area.  On the other hand, if
you can pack everything that must stay consistent into a single cache
line, avoid multiple-writer problems in HW, and tolerate redundant
writebacks at unexpected times, your system can be the fastest X on
the block :-)  But you need to think more like an SMP programmer than
an S-100 hacker. :-)

Craig

At 4:12 PM -0700 9/10/03, Phil Torre wrote:
>I see in the archives an old thread from 2000 in which this topic was
>discussed, and the consensus seemed to be that using 'dcbf' to flush
>a cache line should be followed by 'sync' to ensure that the cache
>was really flushed before the following instructions executed.
>
>(Another issue is:  OK, so the dcbf has executed, but does that guarantee
>that the flush has actually made it out to memory?)
>
>So, it seems like the mpc8xx version of _CPU_cache_invalidate_1_data_line()
>needs to do a sync after it does its dcbf.  (Ideally you could do
>multiple dcbf's in a row with a single sync at the end, but that would
>require the sync to be inserted into
>rtems_cache_flush_multiple_data_lines(),
>which isn't CPU-specific.)  Looking at _CPU_cache_invalidate_1_data_line()
>in rtems-4.6.0pre4, I don't see any sync instructions.  Does anyone know
>of a reason why the sync isn't required?
>
>-Phil
>
>--
>
>=====================================================================
>Phil Torre                               phone: 425-820-6363 x234
>Design Engineer                          email: ptorre at zetron.com
>Switching Systems Group                    fax: 425-820-7031
>Zetron, Inc.                               web: http://www.zetron.com