anybody using RTEMS on SH4?

Joel Sherrill joel.sherrill at oarcorp.com
Fri Sep 14 18:12:35 UTC 2007


Nickolay Kolchin wrote:
> Thank you for your suggestions.
>
> On 9/14/07, Joel Sherrill <joel.sherrill at oarcorp.com> wrote:
>
>
>     I don't have an SH to try anything on, so I am only going to offer
>     some general ideas:
>
>     + Is 4.x using double precision and 3.4.6 using single precision?
>
>
> No.  AFAIK, single precision must be explicitly toggled in both 
> compilers: -m4-single.
>
Could you check the generated code to be sure?
>
>     + Did the cache settings change somehow? Maybe gcc 4.x is optimizing
>        some critical setting out of the BSP initialization.
>
>
> I'm currently investigating this issue, but probably not, because most
> of the cache initialization is inside "asm volatile" statements.
>
>     + Is there a change in the array indexing code? There are options
>         to control multiply and division for the SH, so I am curious.
>
>
> I could be wrong, but those options mostly apply to FPU-less SH4 models.
>
>     + Does it get better or worse when -Os is used?  Or -O2 with no
>        particular options?
>
>
> Worse in both cases. I can post numbers if you are interested.
No need.  Just eliminating a possibility.
>
>     + Is the BSP compiled with the old compiler or the new one? I am
>         curious whether it is possible to compile the benchmarks with the
>         new compiler and leave the rest of the system alone. This would
>         rule out something weird happening to the RTEMS code in the new
>         compiler.
>
>
> I tried different variants: RTEMS 4.6 compiled with 3.4.6 with the
> application compiled and linked with 4.3.0, and RTEMS 4.7 compiled with
> 4.3.0 with the application compiled and linked with 3.4.6. Results vary,
> but the application compiled with 3.4.6 always shows better performance.
> Currently I can't explain why the application compiled with 3.4.6 runs
> slowly under RTEMS 4.7 (we really need some profiling utilities for
> RTEMS).
>
If I am following the cases correctly, it looks like the application
compiled with gcc 4.3 is slower even with an RTEMS that is compiled with
the old compiler. The same RTEMS binary with a 3.4.6-compiled application
is OK.

If that's true, then we have eliminated any change in the compilation of
RTEMS as a factor.

Can you take the 4.7 newlib+RTEMS patches and try them with gcc 3.4.6?
That would give you an RTEMS 4.7 built with the gcc we think is better.
Use a 3.4.6 SH toolchain.
>
>     Nickolay Kolchin wrote:
>     > Hi,
>     >
>     > We have a performance problem on SH4 with GCC 4.x.
>     >
>     > SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark
>     > ================================================================
>     >            GCC: 3.4.6   4.2.1   4.3.0 (20070907)
>     >      Composite:  6.05    5.01    4.82
>     >            FFT:  4.90    4.15    4.21
>     >            SOR: 10.10    8.36    7.64
>     >     MonteCarlo:  3.68    3.06    3.04
>     > Sparse matmult:  5.45    4.45    4.03
>     >             LU:  6.10    5.03    5.18
>     > ================================================================
>     >
>     > BYTEmark* Native Mode Benchmark ver. 2 (10/95)
>     > ================================================================
>     >              GCC:      3.4.6      4.2.1  4.3.0 (20070907)
>     >     NUMERIC SORT:     35.459       32.2      29.327
>     >      STRING SORT:     0.5943    0.57604      0.8603
>     >         BITFIELD: 1.0585e+07  9.269e+06  9.4138e+06
>     >     FP EMULATION:     4.4944     4.6012       5.364
>     >          FOURIER:     272.28     241.34      259.12
>     >       ASSIGNMENT:    0.35997    0.38373     0.39683
>     >             IDEA:     124.11     95.057      100.07
>     >          HUFFMAN:     45.593     52.083      56.391
>     >       NEURAL NET:    0.36153    0.30922     0.31348
>     > LU DECOMPOSITION:     11.331     9.4938       8.255
>     > ================================================================
>     >
>     > The "real world application" shows a 20%-200% performance regression
>     > with GCC 4.x.
>     >
>     > This effectively prevents us from moving from RTEMS 4.6 to 4.7.
>     >
>     > I've reported this issue to gcc bugzilla:
>     > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33431
>     >
>     > But the SH4 backend maintainer, Kazumoto Kojima, was unable to
>     > reproduce it under linux-sh:
>     > ================================================================
>     >                           gcc-3.4.6  gcc-4.2.1  gcc-4.3.0 (20070910)
>     > Composite Score:              16.76      16.86      16.99
>     > FFT              Mflops:      12.92      13.36      13.36
>     > SOR              Mflops:      27.88      26.76      28.01
>     > MonteCarlo       Mflops:       9.96       9.73       9.67
>     > Sparse matmult   Mflops:      14.95      16.06      14.84
>     > LU               Mflops:      18.08      18.39      19.05
>     > ================================================================
>     >
>     > Maybe somebody else is also using RTEMS on SH4 and can confirm either
>     > my results or Kojima's?
>     >
>
>
> ---
> Nickolay



