More PowerPC cache hijinks (dcbf requires sync?)

Thu Sep 11 21:11:46 UTC 2003

In the particular case I'm looking at, I am writing to buffers
which are shared between the PowerPC core on an MPC8xx and different
devices on the CPM (SMCs, SCCs, SPI port).  Since my buffers are
allocated off of the heap (and thus are cacheable), I need to force
a cache line flush after writing to the buffer but before telling
the other bus master that it can read from the buffer.  (My original
message contained an error, I meant _CPU_cache_flush_1_data_line(),
rather than _CPU_cache_invalidate_1_data_line().)

I assumed that rtems_cache_flush_multiple_data_lines() would always
be called for that purpose; it hadn't occurred to me that someone
might use it to make space in the cache.  So I guess the right thing
is to put the sync instruction in my application code after
rtems_cache_flush_multiple_data_lines() returns.
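For concreteness, here is a minimal sketch of that application-side pattern, under stated assumptions. The descriptor layout, the TXBD_READY value and the names hand_off_tx() and full_sync() are hypothetical, loosely modeled on an MPC8xx CPM transmit buffer descriptor. It also assumes the RTEMS cache manager prototype is visible through <rtems.h>; older trees may declare it in a separate header. Off target, the PowerPC instruction and the RTEMS call compile out, so the sketch also builds on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __rtems__
#include <rtems.h>  /* assumed to declare rtems_cache_flush_multiple_data_lines() */
#endif

/* Hypothetical transmit buffer descriptor, loosely modeled on the MPC8xx
 * CPM layout (16-bit status, 16-bit length, buffer pointer).  The layout
 * only matches on a 32-bit target, and the READY bit value is an
 * assumption: check the reference manual for your part. */
#define TXBD_READY 0x8000u

struct txbd {
    volatile uint16_t status;
    volatile uint16_t length;
    void *volatile buffer;
};

/* Full barrier: "sync" on PowerPC, a compiler-only barrier elsewhere so
 * the sketch still builds (and can be exercised) on a host. */
static inline void full_sync(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile ("sync" : : : "memory");
#else
    __asm__ volatile ("" : : : "memory");
#endif
}

/* Write back a cacheable heap buffer, then hand it to the CPM. */
static void hand_off_tx(struct txbd *bd, void *buf, size_t len)
{
    assert(len <= 0xFFFFu);  /* the BD length field is 16 bits wide */

#ifdef __rtems__
    /* In the rtems-4.6.0pre4 mpc8xx code discussed in this thread, this
     * ends up as one dcbf per cache line and no sync. */
    rtems_cache_flush_multiple_data_lines(buf, len);
#endif
    bd->buffer = buf;
    bd->length = (uint16_t)len;

    /* The application-level sync: the earlier dcbf writebacks and the
     * descriptor stores complete before READY is stored. */
    full_sync();
    bd->status |= TXBD_READY;
}
```

Putting the sync here, rather than inside the per-line routine, pays for one barrier per buffer instead of one per cache line.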

-Phil

On Thu, Sep 11, 2003 at 12:06:42PM -0700, Craig S. Steele wrote:
> > Looking at _CPU_cache_invalidate_1_data_line()
> >in rtems-4.6.0pre4, I don't see any sync instructions.  Does anyone know
> >of a reason why the sync isn't required?
> The question is: why are you flushing the cache line?  If you just
> want to invalidate the line, maybe to free capacity, as the routine
> name states, dcbf does that.  If you've got something more
> sophisticated in mind, you probably should be writing your own
> cache-control sequence tailored to your specific hardware.  sync can
> be very expensive, and I wouldn't think it would be appropriate for a
> routine with such a generic name as _CPU_cache_invalidate_1_data_line().
> Without knowing exactly what you've got in mind, it's hard to
> provide any useful opinion; forgive me if the following remarks are
> irrelevant to your interests.
> 
> If you want to use cached memory to write to a memory-mapped I/O
> device or communicate with another processor, then yes, in the
> general case you probably need some kind of synchronization barrier
> if you have a sequence of stores followed by one or more dcbf
> instructions.  sync is the heaviest-weight of the several barrier
> instructions available (eieio, isync, sync) and will do the job,
> but it can be very expensive in superscalar PowerPC implementations.
> I don't know about simpler embedded chips; my experience was with the
> 604.  It's possible for an address-only flush to bypass pending
> cached stores in a sequence, which can require a sync.  If you're not
> doing a sequence of stores, but just flushing a single cache line, I
> doubt that a sync is necessary, since there's nothing to get out of
> order.  If you're just viewing the cache line from the perspective of
> a single CPU, for example flushing a line to be able to read a
> cacheable status register, you're unlikely to need a sync: the flush
> will happen eventually, and then you'll get the readback.  The sync is
> helpful in ordering a sequence of cache-line writebacks, or in ordering
> dissimilar memory-bus operations, such as writebacks and address-only
> cycles (60x bus) that do not require a writeback (e.g., dcbf to an
> address without a line in the cache).  Which classes of bus operations
> can get out of order depends on how the functional units are organized
> in a particular processor; simpler processors have less capability to
> reorder or delay things in ways that would require the global barrier
> of sync.
> 
> >the consensus seemed to be that using 'dcbf' to flush
> >a cache line should be followed by 'sync' to ensure that the cache
> >was really flushed before the following instructions executed.
> Even in the simplest pipelined CPU, unless you sync, the memory cycle
> won't complete before the following instructions execute, but you
> have to be explicit about which perspective you're viewing the
> operation from to decide whether this issue matters.  Things will
> happen in order from the processor's viewpoint, but it may be some
> time before the writeback shows up on the external bus.  If you need
> writes to happen quickly, or have a critical ordering of memory
> writes, maybe the memory area should not be cacheable.
> 
> If you're polling a cached register to find a change, it might be 
> better to use a cache-invalidate cycle initiated by the I/O device 
> containing the register than to hammer the bus repeatedly fetching 
> and flushing, if this is an option for the processor selected. 
> Another oddball technique that can be useful is to alias your status
> register at multiple addresses so you can poll through an address 
> sequence that will fetch different cache lines.  Depending on your 
> application hardware timing this could save a few flushes or 
> pointlessly pollute your cache. :-)
> 
> Cached I/O is tricky, and most generic devices that I've seen, mixing
> status and control bits carelessly, aren't really suitable for it,
> since they can't tolerate an unintended cache writeback ("capacity
> spill") and still function correctly.  (That "GO" bit should read back
> as zero, for example.)  Putting a sync after every store works, but is
> probably slower than not caching that area.  On the other hand, if
> you can pack what you need to be consistent into a single cache line,
> avoid multiple-writer problems in hardware, and tolerate redundant
> writebacks at unexpected times, your system can be the fastest X on
> the block :-)  But you need to think more along the lines of an SMP
> programmer than an S-100 hacker. :-)
> 
> Craig
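A sketch of the single-cache-line approach, under stated assumptions: a 16-byte data-cache line and GCC's aligned attribute. struct shared_msg, publish_msg() and the field layout are hypothetical illustrations, not an RTEMS or CPM interface. On non-PowerPC hosts the dcbf/sync pair compiles out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed data-cache line size (16 bytes on the MPC8xx, to my knowledge);
 * check the manual for other cores. */
#define CACHE_LINE 16

/* Hypothetical record shared with another bus master.  Everything that
 * must stay consistent lives in one line-aligned cache line, so a single
 * dcbf writes it back in one burst.  Only the CPU writes this line; the
 * other master only reads it (no multiple-writer problem). */
struct shared_msg {
    uint32_t seq;       /* stored last for each new message */
    uint32_t len;       /* valid bytes in data[] */
    uint8_t  data[8];
} __attribute__((aligned(CACHE_LINE)));

static_assert(sizeof(struct shared_msg) == CACHE_LINE,
              "record must fit in exactly one cache line");
static_assert(_Alignof(struct shared_msg) == CACHE_LINE,
              "record must start on a cache-line boundary");

static void publish_msg(struct shared_msg *m, uint32_t seq,
                        const uint8_t *src, uint32_t len)
{
    if (len > sizeof m->data)
        len = sizeof m->data;            /* truncate to what fits */
    memcpy(m->data, src, len);
    m->len = len;

    /* Keep the compiler from moving the seq store earlier.  On a core that
     * stores to its cache in program order, a capacity spill in the middle
     * of this update then exposes the old seq, not a new seq with stale data. */
    __asm__ volatile ("" : : : "memory");
    m->seq = seq;

#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile ("dcbf 0,%0" : : "r"(m) : "memory"); /* one line, one dcbf */
    __asm__ volatile ("sync" : : : "memory"); /* before any later "go" signal */
#endif
}
```

The static assertions catch a record that silently grows past one line, which would bring back the multi-line ordering question.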
> 
> At 4:12 PM -0700 9/10/03, Phil Torre wrote:
> >I see in the archives an old thread from 2000 in which this topic was
> >discussed, and the consensus seemed to be that using 'dcbf' to flush
> >a cache line should be followed by 'sync' to ensure that the cache
> >was really flushed before the following instructions executed.
> >
> >(Another issue is:  OK, so the dcbf has executed, but does that guarantee
> >that the flush has actually made it out to memory?)
> >
> >So, it seems like the mpc8xx version of _CPU_cache_invalidate_1_data_line()
> >needs to do a sync after it does its dcbf.  (Ideally you could do
> >multiple dcbf's in a row with a single sync at the end, but that would
> >require the sync to be inserted into
> >rtems_cache_flush_multiple_data_lines(),
> >which isn't CPU-specific.)  Looking at _CPU_cache_invalidate_1_data_line()
> >in rtems-4.6.0pre4, I don't see any sync instructions.  Does anyone know
> >of a reason why the sync isn't required?
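The batched variant suggested in that message (several dcbf instructions in a row, then one sync) might look roughly like this. It is a sketch, not the RTEMS code. flush_range_then_sync() and the 16-byte CACHE_LINE value are assumptions, and the inline assembly is only emitted when compiling for PowerPC, so on a host the function just counts the lines it would have flushed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed MPC8xx data-cache line size (16 bytes); adjust for other cores. */
#define CACHE_LINE 16u
static_assert((CACHE_LINE & (CACHE_LINE - 1)) == 0,
              "line size must be a power of two for the alignment mask");

/* Flush every cache line overlapping [addr, addr + len) with back-to-back
 * dcbf instructions (write back if modified, then invalidate), then issue a
 * single sync so all the writebacks are complete before returning.
 * Returns the number of lines touched. */
static size_t flush_range_then_sync(const void *addr, size_t len)
{
    if (len == 0)
        return 0;

    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end   = (uintptr_t)addr + len;   /* one past the last byte */
    size_t lines = 0;

    for (uintptr_t p = start; p < end; p += CACHE_LINE) {
#if defined(__powerpc__) || defined(__PPC__)
        __asm__ volatile ("dcbf 0,%0" : : "r"(p) : "memory");
#endif
        lines++;
    }

#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile ("sync" : : : "memory");  /* one barrier for the batch */
#endif
    return lines;
}
```

An unaligned range can touch one more line than its length suggests: 2 bytes starting 15 bytes into a line span two lines.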