Problem report: Struct aliasing problem causes Thread_Ready_Chain corruption in 4.6.99.3

Thomas Doerfler Thomas.Doerfler at embedded-brains.de
Wed Nov 29 14:31:53 UTC 2006


Ralf,

Ralf Corsepius schrieb:
> On Wed, 2006-11-29 at 08:49 +0100, Thomas Doerfler wrote:
> 
>> Ralf Corsepius schrieb:
>>>
>>> * HUGE projects such as Fedora and OpenSuSE are able to compile 1000's
>>> of source tarballs and millions of lines of code with it enabled and are
>>> only facing very few packages to break?
>>>
>>> * GCC and newlib can be compiled with it enabled for RTEMS?
>> Oh, please note: the RTEMS kernel and packages can also be compiled with
>> this option. And they also work MOST of the time. But this is not
>> sufficient for a reliable RTOS.
> Are you seriously trying to say, such fundamental bugs would not be
> found in those 1000's of source tarballs, in all those _years_
> -fstrict-aliasing is effective, if this was a real problem?

No, I would not dare to make such a silly statement.
Just for the record:

- Software that has been designed in a "cleaner" way concerning its data
structures and their usage surely has no problems with the strict
aliasing rules.

- RTEMS definitely has problems.

- Some people on the RTEMS list have stated that they are not sure
whether their application/BSP/OS code will always honor the strict
aliasing rules.
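
To make the kind of problem concrete, here is a minimal sketch in C. It is a hypothetical illustration and not code taken from RTEMS; the function names are made up. The first function shows why the optimizer may legally produce "surprising" code under -fstrict-aliasing, the second shows an aliasing-safe way to reinterpret the bytes of an object.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not RTEMS code. */

/*
 * Under -fstrict-aliasing the compiler may assume that 'i' and 'f' never
 * refer to the same storage, because their types are incompatible. It is
 * therefore allowed to return the constant 1 without re-reading *i.
 * Calling this with both pointers aimed at the same object is undefined
 * behaviour.
 */
int32_t store_int_then_float(int32_t *i, float *f)
{
    *i = 1;
    *f = 0.0f;
    return *i;
}

/*
 * Aliasing-safe way to look at the representation of a float as an
 * integer: copy the bytes instead of casting the pointer.
 */
uint32_t float_bits(float value)
{
    uint32_t bits;

    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Code that only ever passes distinct objects to the first function is fine. The trouble starts when the same memory is reached through two differently typed pointers, which is what casts between unrelated structure types tend to do.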

> 
> Unfortunately RTEMS is one of these!

Yes.

> 
>> What is your suggestion to find other potential problem areas?
> I can tell you what I've been trying so far (but I am just at the
> very beginning):
> 
> Compile RTEMS with and without -fno-strict-aliasing, disassemble the
> object files and compare the disassembly. If the disassembled files
> differ, the file is a candidate to be examined.

This is a good approach. It will show us which modules might be
sensitive to aliasing issues.

> 
> This results in a list of candidate files to be examined (on the order
> of 100). It definitely contains many false positives, due to
> -fno-strict-aliasing affecting the ordering of asm instructions;
> nevertheless this list is better than nothing.

Keep in mind: it will only list the modules for which the current GCC
version generates different code with the default -fstrict-aliasing. We
should repeat this check for future releases to ensure that better
optimizers do not bring up new issues.

> 
>>> Our problem is lack of testing (primary cause: way too long release
>>> cycles). 
>> Here I must totally disagree.
> Face it: RTEMS users are still using ancient tools with ancient versions
> of RTEMS, therefore rtems-4.7 and its toolchain have hardly seen any
> public exposure and testing at all.

I think this is partly due to the fact that RTEMS is used in embedded
devices. When I start a development based on RTEMS, I may be open to using
a non-stable version, but when my product is finalized, I need a stable
version. This may be a big difference compared with most other
open source projects.

But you are right, a broader test community (and more snapshots) would
be desirable.

> 
>>  You will never fix this problem by
>> testing. The effort to track down just ONE error has been very high.
> Yes, and? How many errors are there? 1 ... 10 ... 100s?
> 
> I suspect very few, with most of them orbiting around "Chains" and
> "Object", due to their working principle (based on aliasing types).

How about the network stack, the web server, the filesystem,
malloc/free... and the individual BSPs.
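
For readers wondering what a chain "based on aliasing types" looks like, the sketch below shows the general idea: the head and tail sentinel nodes of a doubly linked list overlap in three pointer-sized words. All names are hypothetical and this is not the RTEMS source. The sketch declares both views in a union so that the overlap is visible to the compiler, instead of casting the control block to a node pointer. Treat it as one possible direction under these assumptions, not as a verified fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; a sketch, not the RTEMS chain implementation. */
typedef struct node {
    struct node *next;
    struct node *previous;
} node;

/*
 * Head and tail sentinel nodes share storage:
 *   word 0: head.next
 *   word 1: head.previous, which is also tail.next (always NULL)
 *   word 2: tail.previous
 */
typedef union {
    struct { node Node; node *fill; } Head;
    struct { node *fill; node Node; } Tail;
} chain;

node *chain_head(chain *c) { return &c->Head.Node; }
node *chain_tail(chain *c) { return &c->Tail.Node; }

/* An empty chain has the head pointing at the tail and vice versa. */
void chain_init(chain *c)
{
    c->Head.Node.next = chain_tail(c);
    c->Head.Node.previous = NULL;      /* shared word: also tail.next */
    c->Tail.Node.previous = chain_head(c);
}

int chain_is_empty(chain *c)
{
    return c->Head.Node.next == chain_tail(c);
}

/* Insert 'n' in front of the tail sentinel. */
void chain_append(chain *c, node *n)
{
    node *tail = chain_tail(c);
    node *old_last = tail->previous;

    assert(n != NULL);
    n->next = tail;
    n->previous = old_last;
    old_last->next = n;
    tail->previous = n;
}
```

Because both sentinels are accessed as the same node type, insert and extract need no special cases for the first or last element, which is the attraction of this layout.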

>>>>
>>>>
>>>>> 2.) We set "-fno-strict-aliasing" now and forever
>>>
>>> With all due respect, but to me, this would be "plain stupid".
>> Ralf, again with due respect, can you please explain to me why it is stupid?
> 
> The ... forever ... is stupid. 
> 
> RTEMS code is dirty and needs to be cleaned up, that's the point.

Agreed. I would also like to have strict-aliasing-proof code as a future
goal. And I see that setting -fno-strict-aliasing temporarily will take
the pressure off this goal (which is a benefit for the users but makes
the goal harder to reach).

> 
>> Ralf, I agree with you that it would be nicer to have aliasing-proof code
>> from the start, but I see no easy way to get it soon.
> Therefore _temporary_, therefore NO -fno-strict-aliasing in rtems-4.8.

Ok, this is sort of a compromise.

> 
> We must provoke these bugs to be able to "nail them down" and not pamper
> them with "-fno-strict-aliasing".

Maybe the following steps would make sense:

- Somebody (Ralf?) might track down the suspect modules using Ralf's method
of comparing the compiler output (using an architecture with MUCH
optimization headroom. PPC is not so bad due to its many general purpose
registers, but maybe another architecture is better)

- The various suspect packages could be redesigned by some suitable
persons (I would volunteer for some of the code)

- In parallel, 4.7 will be cut with -fno-strict-aliasing

- The 4.8 development branch will temporarily use -fno-strict-aliasing
as well, until the code has been revised

- Then, the 4.8 development branch will switch back to -fstrict-aliasing
AND enable more aliasing warnings (GCC provides -Wstrict-aliasing for this)

What do all of you think of this?

wkr,
Thomas.



> 
> Ralf


-- 
--------------------------------------------
embedded brains GmbH
Thomas Doerfler           Obere Lagerstr. 30
D-82178 Puchheim          Germany
Tel. : +49-89-18 90 80 79-2
Fax  : +49-89-18 90 80 79-9
email: Thomas.Doerfler at embedded-brains.de
PGP public key available on request


