more cache manager issues

Till Straumann strauman at SLAC.Stanford.EDU
Tue Oct 31 19:29:52 UTC 2000


Charles-Antoine Gauthier wrote:

> Till Straumann wrote:
> >
> > Browsing through the cache manager implementation of
> > `rtems-ss-20000929' raised some more questions / suggestions:
> >
> >  - cache_aligned_malloc() (currently not called by any piece of
> >      code) must not be used. Calling `free' on memory allocated by
> >      cache_aligned_malloc() results in heap corruption.
> >
> > All the other issues apply to the POWERPC architecture:
> >
> >  - rtems_cache_flush_multiple_data_lines() etc.: after repeatedly
> >      flushing/invalidating single cache lines (`dcbst', `dcbi',
> >      `dcbf'), a `SYNC' instruction _must_ be issued to guarantee
> >      that the operations have completed before returning from the
> >      rtems_cache_xxx_multiple_data_lines() etc. routines. (To
> >      enhance performance, the CPU dependent single line operations
> >      should probably be inlined.)
> >
> >  - At least the MPC8xx implementation (I didn't look too closely
> >      at the other powerpc CPUs) of _CPU_cache_enable_data() etc.
> >      is incorrect. Note that enabling/disabling the cache does not
> >      invalidate / flush the cache (consult the MPC860 user manual).
> >
> >      The correct way to enable the data cache is as follows:
> >
> >         1)  invalidate the complete data cache
> >         2)  `sync' to make sure the operation has completed
> >         3)  enable the data cache
> >
> >      Disabling the data cache should consist of the following steps:
> >
> >        1) flush (write back and invalidate) the entire data cache
> >        2) `sync'
> >        3) disable the data cache
> >
> >       Admittedly, it is not clear whether steps 2) are really
> >       necessary. Theoretically, the cache could perform the posted
> >       write backs after it has been disabled. However, I feel it
> >       is safer to do the `sync'. Note that steps 1) are definitely
> >       required. (On the other hand, I can't really see the benefit
> >       of the currently used `isync' as a step 4.)
> >
> > Comments?
> >
> > -- Till.
>
>
> WRT sync: if memory is marked as coherent (I haven't looked at my
> 860 manual, so I don't know if this applies to that processor), then
> an eieio might be better?

The coherency attribute is, AFAIK, used to maintain coherency in
multiprocessor systems, which the 860 clearly is not :-)

EIEIO is not enough - it only enforces an ordering of the loads/stores
before and after the EIEIO, i.e. if you code

    *a = 0xdeadbeef;
    *b = 0xcaffee;

you cannot know which memory location is written first, a or b, or
whether they are even written at the same time, whereas

    *a = 0xdeadbeef;
    __asm__ __volatile__("eieio");
    *b = 0xcaffee;

enforces b being written after a (but a has not necessarily been
written when the eieio completes!). Because the buffer memory we are
talking about here is seen by external hardware, a `sync' must be
used. In the example above, if the external hardware needs to access
`a', it is _not_ guaranteed that the external hardware reads
`0xdeadbeef' from a at the time the EIEIO completes. However, if a
SYNC were used instead of the EIEIO, external hardware would see the
0xdeadbeef value no later than when the SYNC completes.
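
To make this concrete, here is a (hypothetical) driver fragment - the
device structure and its registers are made up, but the pattern of
`flush + sync before starting DMA, eieio between device register
stores' is the standard one:

struct my_dev {                 /* made-up device register layout      */
  unsigned long dma_addr;       /* buffer address, read by the device  */
  unsigned long dma_start;      /* writing 1 starts the transfer       */
};

extern void flush_dcache_range(char *start, int len); /* see PS below */

void
start_tx(volatile struct my_dev *dev, char *buf, int len)
{
  /* write the buffer back to memory; the routine ends with a
   * `sync', so the device is guaranteed to see the data */
  flush_dcache_range(buf, len);
  dev->dma_addr  = (unsigned long)buf;
  /* eieio suffices here: it only has to order the two stores to
   * the (cache inhibited, guarded) device registers */
  __asm__ __volatile__("eieio");
  dev->dma_start = 1;
}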

>
>
> Synchronization should be required whether one or multiple lines are
> flushed. The synchronization only needs to be done once.
> Consequently, the multiple line version of a function should not
> call the single line version.

Of course.

>
>
> I thought the single line functions were inline. If they are not, they
> should be.

Given the complexity of this entire cache business, I doubt that it
really makes sense for the cache manager to export that much
functionality (which, at the moment, is mostly unimplemented anyway).
IMHO, it would be enough to have (like linux does) a few routines
providing the basic functionality of flushing (and maybe
invalidating) data/instruction cache ranges. These routines would
then have to be implemented entirely by the CPU support.
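
Something like this (just a sketch - the names are my suggestion,
they are not existing RTEMS calls):

/* a minimal, linux-style cache API; each routine implemented
 * entirely by the CPU support */
void flush_dcache_range(char *start, int len);      /* write back to memory  */
void invalidate_dcache_range(char *start, int len); /* discard cached copies */
void flush_icache_range(char *start, int len);      /* e.g. for loaded code  */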

I don't see the advantage of having a general routine which inlines
CPU dependent `single line' stuff. It is much more inflexible to have
to glue architecture dependent pieces into a general routine (which
is no more than a loop) than to simply code the entire routine
(again, that's the way linux does it).

-- Till

PS:
on the PPC the `flush' business is as simple as this (maybe we should
even rename it to `write_dcache_range'):

/* write (but do not invalidate) a range of data cache lines to memory
 * (NOTE: our terminology of `flush' is different from the
 * PPC instruction `dcbf', where `flush'ing means `write
 * and invalidate'.)
 * CACHE_LINE_SIZE is the data cache line size in bytes (16 on
 * the MPC8xx).
 */
void
flush_dcache_range(char *start, int len)
{
  len--; /* convert 'len' to the offset of the final byte */
  while (len>=0) {
    /* the `b' constraint keeps gcc from picking r0 for the base
     * address (r0 reads as literal zero in `dcbst rA,rB') */
    __asm__ __volatile__("dcbst %0,%1"::"b"(start),"r"(len));
    len-=CACHE_LINE_SIZE;
  }
  /* if `start' is not line aligned, the loop may have missed the
   * line containing `start' itself; flush it explicitly (this is
   * redundant but harmless in the aligned case) */
  __asm__ __volatile__("dcbst 0,%0"::"b"(start));
  /* we assume this routine is not too often called with `len' <= 0 */
  __asm__ __volatile__("sync");
}
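
And, for completeness, a sketch of a correct _CPU_cache_enable_data()
for the MPC8xx along the lines discussed at the top. The two helpers
are hypothetical - they stand for whatever issues the corresponding
DC_CST special register commands:

extern void mpc8xx_dcache_invalidate_all(void); /* hypothetical helper */
extern void mpc8xx_dcache_set_enable(int on);   /* hypothetical helper */

void
_CPU_cache_enable_data(void)
{
  mpc8xx_dcache_invalidate_all();  /* 1) invalidate the whole cache */
  __asm__ __volatile__("sync");    /* 2) make sure it has completed */
  mpc8xx_dcache_set_enable(1);     /* 3) only now enable the cache  */
}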



