RTEMS Benchmarking

Joel Sherrill joel.sherrill at OARcorp.com
Thu Dec 7 15:02:29 UTC 2000

Eric Norum wrote:
> Peter Mueller wrote:
> >
> > Hi all,
> >
> > I ran some of the tm tests on our 68332 board. I want to compare the figures
> > with the efi332 board, but a look at the BSP shows that there is no code to
> > get the timer data. John told me that he doesn't know how the data was
> > produced or who put the file in the efi332 directory!?
> >
> > Our figures are about 1.5 to 3 times larger compared to the gen68360
> > board. I'm not sure if this is an error in our timing routines. Has anyone
> > experience with 68360 and 68332 CPUs? Does this make sense?
> >
> > Gen68360: 68360 CPU with 25MHz
> > OurBoard: 68332 CPU with 16MHz.
> >
> > I do not know if the 360 is based on the CPU32 core too, but I think so.
> > Only comparing the benchmark figures to the CPU MHz clock rates would give a
> > factor of about 1.6 but not 3.
> >
> > Any comments?
> >

Thanks for the info, Eric.
> The 68360 is based on a CPU32+ core while the 68332 is based on a CPU32
> core.  Here's a quote from the 68360 User's Manual:
> =====================================
> The CPU32+ core is a CPU32 core with its bus interface unit modified to
> connect directly to the 32-bit IMB and take advantage of the larger bus
> width. Although the original CPU32 core already had a 32-bit internal
> data path and 32-bit arithmetic hardware, its external interface (i.e.,
> to the internal IMB) was 16 bits. The CPU32+ core, however, can operate
> on 32-bit external operands with one bus cycle. This capability allows
> the CPU32+ core to fetch a long-word instruction or two word-length
> instructions in one bus cycle, allowing the internal instruction queue
> to be filled more quickly. The CPU32+ core can also read and write 32
> bits of data in one bus cycle. The CPU32+ has an additional word in its
> instruction pipeline when fetching from a 32-bit port. When fetching
> from a 16-bit port, this additional word is disabled. The performance of
> the CPU32+ on a 16-bit bus is the same as the CPU32 performance.
> ======================================
> So I'd expect quite a bit higher performance from the 68360.  Any
> application which is limited by bus speed will get a 2X boost from the
> wider data path.

2x for memory accesses and 25/16 (about 1.56x) for general speed multiply
out to roughly 3.1x.  Looks like 3x is not that far-fetched. :)
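
As a rough sketch of that arithmetic (a back-of-the-envelope model, not a
measurement): assume some fraction of the 68332's run time is spent waiting
on bus cycles, that the 32-bit port halves that part, and that everything
scales with the clock.  The function name and the memory_fraction parameter
are made up for illustration; only the 25 MHz, 16 MHz, and 2x figures come
from this thread.

```python
def expected_ratio(memory_fraction, clock_ratio=25 / 16, bus_ratio=2.0):
    """Estimate how many times slower the 16 MHz 68332 runs than the 25 MHz 68360.

    memory_fraction is a hypothetical knob: the share of 68332 run time
    spent on bus cycles (0.0 = pure compute, 1.0 = entirely bus bound).
    """
    if not 0.0 <= memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be between 0 and 1")
    # Normalise the 68332 run time to 1.0.  On the 68360 the compute share
    # is unchanged per clock, while the bus share shrinks by bus_ratio.
    relative_cycles = (1.0 - memory_fraction) + memory_fraction / bus_ratio
    # The faster clock then scales the whole thing.
    return clock_ratio / relative_cycles


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        ratio = expected_ratio(fraction)
        print(f"bus-bound fraction {fraction:.2f}: 68332 about {ratio:.2f}x slower")
```

Under these assumptions the ratio runs from 25/16 for pure compute up to
2 * 25/16 for fully bus-bound code, which brackets the 1.5x to 3x spread
Peter reported.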

Since the code cache is small or non-existent, the tests will repeatedly
be fetching instructions if nothing else.  Some of the tests measure
context switches, so those would be memory bound.  I don't have
a real feel for the memory vs. compute ratio on the tests but
my gut tells me that memory speed dominates RTEMS operations.  There
are all the control structures, contexts, stack pushes/pops, etc.

Plus I know that if you look at the test results for the same
tests on different CPUs, you can see where there are architectural
advantages to that family.  For example, the i386 has fast context
switches but can suffer due to register pressure.  The SPARC
has lots of registers but suffers when you have to flush
the register windows during a context switch.

> --
> Eric Norum                                 eric.norum at usask.ca
> Department of Electrical Engineering       Phone: (306) 966-5394
> University of Saskatchewan                 FAX:   (306) 966-5407
> Saskatoon, Canada.

Joel Sherrill, Ph.D.             Director of Research & Development
joel at OARcorp.com                 On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
   Support Available             (256) 722-9985
