Thank you for your suggestions.<br><br><div><span class="gmail_quote">On 9/14/07, <b class="gmail_sendername">Joel Sherrill</b> <<a href="mailto:joel.sherrill@oarcorp.com">joel.sherrill@oarcorp.com</a>> wrote:</span>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>I don't have an SH to try anything on so am only going to offer some<br>general ideas:
<br><br>+ Is 4.x using double precision and 3.4.6 using single precision?</blockquote><div><br>No. AFAIK, single precision must be explicitly toggled in both compilers: -m4-single.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
+ Cache settings change somehow? Maybe gcc 4.x is optimizing<br> some critical setting out of the BSP initialization. </blockquote><div><br>I'm currently investigating this, but it seems unlikely, because most of the cache initialization is inside "asm volatile" statements.
<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">+ Is there a change in the array indexing code? There are options<br> to control multiply and division for the SH so I am curious.
</blockquote><div><br>I may be wrong, but those options mostly apply to FPU-less SH4 models.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
+ Does it get better or worse when -Os is used? Or -O2 with no<br> particular options?</blockquote><div><br>Worse in both cases. I can post numbers if you are interested. <br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
+ Is the BSP compiled with the old compiler or new? I am curious<br> if it is possible to compile the benchmarks with the new compiler<br> and leave the rest of the system alone. This would eliminate<br> something weird happening to the RTEMS code in the new compiler.
</blockquote><div><br>I tried different combinations: RTEMS 4.6 compiled with 3.4.6 and the application compiled and linked with 4.3.0; RTEMS 4.7 compiled with 4.3.0 and the application compiled and linked with 3.4.6. Results vary, but the application compiled with
3.4.6 always shows better performance. Currently I can't explain why an application compiled with 3.4.6 runs slowly under RTEMS 4.7 (we really need some profiling utilities for RTEMS).<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Nickolay Kolchin wrote:<br>> Hi,<br>><br>> We have a performance problem on SH4 with gcc4.x.<br>><br>> SciMark2 Numeric Benchmark, see <a href="http://math.nist.gov/scimark">http://math.nist.gov/scimark</a>
<br>> ================================================================<br>> GCC: 3.4.6 4.2.1 4.3.0 (20070907)<br>> Composite: 6.05 5.01 4.82<br>> FFT: 4.90 4.15 4.21
<br>> SOR: 10.10 8.36 7.64<br>> MonteCarlo: 3.68 3.06 3.04<br>> Sparse matmult: 5.45 4.45 4.03<br>> LU: 6.10 5.03 5.18<br>> ================================================================
<br>><br>> BYTEmark* Native Mode Benchmark ver. 2 (10/95)<br>> ================================================================<br>> GCC: 3.4.6 4.2.1 4.3.0 (20070907)<br>> NUMERIC SORT:
35.459 32.2 29.327<br>> STRING SORT: 0.5943 0.57604 0.8603<br>> BITFIELD: 1.0585e+07 9.269e+06 9.4138e+06<br>> FP EMULATION: 4.4944 4.6012 5.364<br>> FOURIER:
272.28 241.34 259.12<br>> ASSIGNMENT: 0.35997 0.38373 0.39683<br>> IDEA: 124.11 95.057 100.07<br>> HUFFMAN: 45.593 52.083 56.391<br>> NEURAL NET:
0.36153 0.30922 0.31348<br>> LU DECOMPOSITION: 11.331 9.4938 8.255<br>> ================================================================<br>><br>> The "real world application" has 20%-200% performance regression with
<br>> GCC 4.x.<br>><br>> This effectively prevents us from moving to RTEMS 4.7 from 4.6.<br>><br>> I've reported this issue to gcc bugzilla:<br>> <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33431">
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33431</a><br>><br>> But SH4 backend maintainer Kazumoto Kojima, was unable to reproduce it
<br>> under linux-sh:<br>> ================================================================<br>> gcc-3.4.6 gcc-4.2.1 gcc-4.3.0(20070910)<br>> Composite Score: 16.76
16.86 16.99<br>> FFT Mflops: 12.92 13.36 13.36<br>> SOR Mflops: 27.88 26.76 28.01<br>> MonteCarlo: Mflops: 9.96 9.73 9.67
<br>> Sparse matmult Mflops: 14.95 16.06 14.84<br>> LU Mflops: 18.08 18.39 19.05<br>> ================================================================<br>><br>
> Maybe, somebody is also using RTEMS on SH4 and can confirm my or<br>> Kojima results?<br>><br></blockquote></div><br>---<br>Nickolay<br>