anybody using RTEMS on SH4?
Nickolay Kolchin
nbkolchin at gmail.com
Fri Sep 14 18:01:22 UTC 2007
Thank you for your suggestions.
On 9/14/07, Joel Sherrill <joel.sherrill at oarcorp.com> wrote:
>
>
> I don't have an SH to try anything on so am only going to offer some
> general ideas:
>
> + Is 4.x using double precision and 3.4.6 using single precision?
No. AFAIK, single precision must be explicitly selected in both compilers with
-m4-single.
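Instead of relying on memory about each compiler's default, the preprocessor can report which SH4 FPU mode a given set of flags selects. The sketch below is a minimal probe. It assumes GCC's SH predefined macros `__SH4_SINGLE_ONLY__`, `__SH4_SINGLE__`, `__SH4_NOFPU__` and `__SH4__` (confirm them for your toolchain with something like `<target>-gcc -m4-single -dM -E - < /dev/null`). The function names are hypothetical.

```c
/* Hypothetical probe: which SH4 FPU mode was this file compiled for?
 * Assumption: GCC defines __SH4_SINGLE_ONLY__ for -m4-single-only,
 * __SH4_SINGLE__ for -m4-single, __SH4_NOFPU__ for -m4-nofpu and
 * __SH4__ for plain -m4. The more specific macros are checked first. */
const char *sh4_fpu_mode(void)
{
#if defined(__SH4_SINGLE_ONLY__)
    return "-m4-single-only";
#elif defined(__SH4_SINGLE__)
    return "-m4-single";
#elif defined(__SH4_NOFPU__)
    return "-m4-nofpu";
#elif defined(__SH4__)
    return "-m4 (double precision default)";
#else
    return "not an SH4 target";
#endif
}

/* Returns 1 if the FPU defaults to single precision for this build. */
int sh4_is_single_precision(void)
{
#if defined(__SH4_SINGLE_ONLY__) || defined(__SH4_SINGLE__)
    return 1;
#else
    return 0;
#endif
}
```

Compiling the same file with the 3.4.6 and 4.x toolchains, using identical flags, and printing `sh4_fpu_mode()` would show directly whether the two builds disagree on precision.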
> + Cache settings change somehow? Maybe gcc 4.x is optimizing
> some critical setting out of the BSP initialization.
I'm currently investigating this, but probably not: most of the cache
initialization is inside "asm volatile" statements.
> + Is there a change in the array indexing code? There are options
> to control multiply and division for the SH so I am curious.
I may be wrong, but those options mostly apply to FPU-less SH4 models.
> + Does it get better or worse when -Os is used? Or -O2 with no
> particular options?
Worse in both cases. I can post numbers if you are interested.
> + Is the BSP compiled with the old compiler or new? I am curious
> if it is possible to compile the benchmarks with the new compiler
> and leave the rest of the system alone. This would eliminate
> something weird happening to the RTEMS code in the new compiler.
I tried different variants: RTEMS 4.6 compiled with 3.4.6 and the application
compiled and linked with 4.3.0, and RTEMS 4.7 compiled with 4.3.0 and the
application compiled and linked with 3.4.6. Results vary, but the application
compiled with 3.4.6 always shows better performance. Currently I can't explain
why an application compiled with 3.4.6 runs slowly under RTEMS 4.7 (we really
need some profiling utilities for RTEMS).
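Without a profiler, a crude substitute is to time individual kernels in each compiler/RTEMS combination. The following is a minimal sketch. It assumes the target provides POSIX `clock_gettime` with `CLOCK_MONOTONIC`; if an RTEMS build lacks it, `CLOCK_REALTIME` or the RTEMS tick counter could be used instead. `time_kernel_seconds` and `sample_kernel` are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* A kernel is any function that performs a fixed amount of work. */
typedef void (*kernel_fn)(void);

/* Runs `kernel` `reps` times and returns the elapsed wall-clock time in
 * seconds, measured with the monotonic clock. Returns -1.0 on clock error. */
double time_kernel_seconds(kernel_fn kernel, int reps)
{
    struct timespec start, end;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    for (int i = 0; i < reps; i++)
        kernel();
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Writing the result to a volatile keeps the optimizer from deleting the loop. */
static volatile double sink;

/* Hypothetical example workload: the harmonic sum 1/1 + ... + 1/1000. */
void sample_kernel(void)
{
    double acc = 0.0;
    for (int i = 1; i <= 1000; i++)
        acc += 1.0 / (double)i;
    sink = acc;
}
```

Wrapping suspect routines from the real application this way, and building them with each compiler, could narrow down which code paths account for the regression.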
> Nickolay Kolchin wrote:
> > Hi,
> >
> > We have a performance problem on SH4 with GCC 4.x.
> >
> > SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark
> > ================================================================
> > GCC:                3.4.6   4.2.1   4.3.0 (20070907)
> > Composite:          6.05    5.01    4.82
> > FFT:                4.90    4.15    4.21
> > SOR:                10.10   8.36    7.64
> > MonteCarlo:         3.68    3.06    3.04
> > Sparse matmult:     5.45    4.45    4.03
> > LU:                 6.10    5.03    5.18
> > ================================================================
> >
> > BYTEmark* Native Mode Benchmark ver. 2 (10/95)
> > ================================================================
> > GCC:                 3.4.6       4.2.1       4.3.0 (20070907)
> > NUMERIC SORT:        35.459      32.2        29.327
> > STRING SORT:         0.5943      0.57604     0.8603
> > BITFIELD:            1.0585e+07  9.269e+06   9.4138e+06
> > FP EMULATION:        4.4944      4.6012      5.364
> > FOURIER:             272.28      241.34      259.12
> > ASSIGNMENT:          0.35997     0.38373     0.39683
> > IDEA:                124.11      95.057      100.07
> > HUFFMAN:             45.593      52.083      56.391
> > NEURAL NET:          0.36153     0.30922     0.31348
> > LU DECOMPOSITION:    11.331      9.4938      8.255
> > ================================================================
> >
> > The "real world application" shows a 20%-200% performance regression
> > with GCC 4.x.
> >
> > This effectively prevents us from moving from RTEMS 4.6 to 4.7.
> >
> > I've reported this issue to gcc bugzilla:
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33431
> >
> > But the SH4 backend maintainer, Kazumoto Kojima, was unable to reproduce
> > it under linux-sh:
> > ================================================================
> >                         gcc-3.4.6   gcc-4.2.1   gcc-4.3.0 (20070910)
> > Composite Score:        16.76       16.86       16.99
> > FFT Mflops:             12.92       13.36       13.36
> > SOR Mflops:             27.88       26.76       28.01
> > MonteCarlo Mflops:      9.96        9.73        9.67
> > Sparse matmult Mflops:  14.95       16.06       14.84
> > LU Mflops:              18.08       18.39       19.05
> > ================================================================
> >
> > Is anybody else using RTEMS on SH4 who can confirm either my results or
> > Kojima's?
> >
>
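To put the SciMark2 regression in a single number, the composite can be recomputed from the kernel scores in the SH4/RTEMS table and compared against the 3.4.6 baseline. The table's values agree with the composite being the arithmetic mean of the five kernels (e.g. (4.90 + 10.10 + 3.68 + 5.45 + 6.10) / 5 = 6.046, reported as 6.05). This is a small sketch; the helper names are hypothetical.

```c
#include <stddef.h>

/* Kernel order: FFT, SOR, MonteCarlo, Sparse matmult, LU. */
#define SCIMARK_KERNELS 5

/* Composite score: arithmetic mean of the five kernel scores. */
double scimark_composite(const double k[SCIMARK_KERNELS])
{
    double sum = 0.0;
    for (size_t i = 0; i < SCIMARK_KERNELS; i++)
        sum += k[i];
    return sum / SCIMARK_KERNELS;
}

/* Percentage change of `now` relative to `baseline`; negative means slower. */
double percent_change(double baseline, double now)
{
    return (now - baseline) / baseline * 100.0;
}

/* Kernel scores copied from the SH4/RTEMS SciMark2 table. */
const double gcc346[SCIMARK_KERNELS] = {4.90, 10.10, 3.68, 5.45, 6.10};
const double gcc421[SCIMARK_KERNELS] = {4.15, 8.36, 3.06, 4.45, 5.03};
const double gcc430[SCIMARK_KERNELS] = {4.21, 7.64, 3.04, 4.03, 5.18};
```

Relative to 3.4.6, this works out to roughly -17% for 4.2.1 and -20% for 4.3.0 on RTEMS. Applying the same arithmetic to Kojima's linux-sh numbers gives about +1.4% for 4.3.0, which is why the two sets of results disagree.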
---
Nickolay