Problem report: Struct aliasing problem causes Thread_Ready_Chain corruption in 4.6.99.3

Ralf Corsepius ralf.corsepius at rtems.org
Thu Nov 30 04:50:12 UTC 2006


On Wed, 2006-11-29 at 14:33 -0600, Joel Sherrill wrote: 
> Ralf Corsepius wrote:
> > On Wed, 2006-11-29 at 10:05 -0600, Joel Sherrill wrote:
> >   
> >> Thomas Doerfler wrote:
> >>     
> >>> Ralf Corsepius schrieb:
> >>>       
> >>>> On Wed, 2006-11-29 at 08:49 +0100, Thomas Doerfler wrote:
> >>>>         
> >>>>> Ralf Corsepius schrieb:

> >> So those 1000s of packages referred to are likely not even to get close to
> >> problems with aliasing. 
> >>     
> > Check X11, gtk, qt, they are full of it (I recall one point, many years
> > ago, strict-aliasing had broken X11).
> >
> >   
> And all of those examples are graphics packages likely using the same 
> type of memory overlay tricks as RTEMS.
I used them as examples due to their HUGE user base and popularity.
Also, I have been using at least X11/Xm/Xt and gtk as a developer for ca. a
decade and am therefore familiar with some of the problems they have.

>  Just like RTEMS, the original X11 code base is very old.
Yes, at least X11 and its predecessors X10 and project Athena reach back
to the 1980's. gtk and qt originate sometime in the mid/late 1990's.

BTW: Now that Chris mentioned "restrict" on a parallel thread - FreeBSD
has been using "__restrict" extensively (__restrict is BSD's wrapper define
around restrict) for quite a while. I am inclined to consider this an
indication that they are aware of this issue ;)

> I'm not trying to apologize
And I do not accuse anybody - I only say it's broken, probably due to
bad design.

My suspecting "chains etc." has a personal background: One of my own
packages once (ca. 2000) broke due to issues with strict-aliasing in its
"custom linked list implementation".

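For illustration, here is a minimal sketch of the kind of overlay trick
such list code tends to rely on, and one way to express it without pointer
casts. All names (node, chain, chain_init, ...) are hypothetical and this
is not the RTEMS chain code: a three-pointer control block doubles as two
sentinel nodes, and a union spells out the overlap instead of casting the
address of the middle pointer to a node pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; this is a sketch, not the RTEMS chain code. */
typedef struct node {
    struct node *next;
    struct node *prev;
} node;

/*
 * Three pointers overlaid as two sentinel nodes:
 *   head sentinel = { first, NULL }
 *   tail sentinel = { NULL,  last }
 * head.n.prev and tail.n.next share the same storage and stay NULL.
 * The union states the overlap in the type system.
 */
typedef union {
    struct { node n; node *fill; } head;
    struct { node *fill; node n; } tail;
} chain;

static node *chain_head(chain *c) { return &c->head.n; }
static node *chain_tail(chain *c) { return &c->tail.n; }

static void chain_init(chain *c)
{
    c->head.n.next = chain_tail(c);  /* first points at the tail sentinel */
    c->head.n.prev = NULL;           /* shared, permanently NULL          */
    c->tail.n.prev = chain_head(c);  /* last points at the head sentinel  */
}

static int chain_is_empty(chain *c)
{
    return c->head.n.next == chain_tail(c);
}

static void chain_append(chain *c, node *n)
{
    node *tail = chain_tail(c);
    node *old_last = tail->prev;

    assert(n != NULL);
    n->next = tail;
    n->prev = old_last;
    old_last->next = n;
    tail->prev = n;
}
```

Whether a union like this is acceptable for RTEMS is a separate question;
the point is only that the overlap is visible to the compiler.
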
> >>>>> What is your suggestion to find other potential problem areas?
> >>>>>       
> >>>>>           
> >>>> I can tell you what I've been trying so far (but I am just at the
> >>>> very beginning):
> >>>>
> >>>> Compile RTEMS with and without -fno-strict-aliasing, disassemble the
> >>>> object files and compare the disassembly. If the disassembled files
> >>>> differ, the file qualifies as a candidate to be examined.
> >>>>     
> >>>>         
> >>> This is a good approach. It will show us which modules might be
> >>> sensitive to aliasing issues.
> >>>
> >>>   
> >>>       
> >> This is definitely a good approach. 
> >>     
> > ATM, I am at ca 300 suspected files ;)
> >
> >   
> I count 2200 in the entire tree and 1740 if you ignore tests.  So you 
> narrowed it down considerably.
I am using per-BSP builds with libtests enabled. This catches all
"commonly built/shared cases" and lets BSP-specific issues accumulate.

> >> Ralf's diff'ing of assembly at least narrows down the candidates 
> >> significantly.
> >>     
> > ATM (without having tried to eliminate false positive) about 1/3 of all
> > *.c files get listed.
> >
> >   
> Ahhh... so you are only looking at cpukit. That certainly makes the 
> percentage worse. :(
Nope, cf. above. It's just that I only have tested a very limited subset
and I am far from being through with all BSPs ;)

> >> Hopefully,
> >> we can pick a single CPU to analyze on first that we think is very likely to
> >> have these optimization problems.
> >>     
> > To get started, I'd suggest trying the posix BSP under Linux.
> > This only uses a very limited part of the RTEMS sources, and uses a
> > native Linux-gcc, which can be assumed to be in far better shape than a
> > standard FSF-gcc.
> >
> > My knowledge of i386 is poor, but unless I am completely wrong,
> > Peer's/Thomas' issue is visible under i386-FC6 and Cygwin.
> >
> >   
> I thought their analysis was on PowerPC code.

Yes, but ...
>   But it may show up on x86.
... the code being affected is in "C".

I am almost certain all CPUs are affected, because AFAICT strict-aliasing
in GCC is largely CPU-independent.
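
A small C sketch (hypothetical function names, not RTEMS code) of why the
optimizer's behaviour depends on the C types rather than on the target:
with strict aliasing in effect, the compiler is allowed to assume that a
store through a short pointer cannot modify an int, so it may return the
constant 1 in the first function without reloading. A store through a
character pointer may alias anything, so the second function has to
reload. Functions of the first kind are the ones where the disassembly
with and without -fno-strict-aliasing can be expected to differ.

```c
#include <assert.h>

/* Hypothetical names; not taken from RTEMS. */

/* int and short are incompatible types: under strict aliasing the
 * compiler may assume *sp and *ip are different objects. */
int store_int_then_short(int *ip, short *sp)
{
    *ip = 1;
    *sp = 2;
    return *ip;   /* may be folded to the constant 1 */
}

/* Character types may alias any object: *ip has to be reloaded. */
int store_int_then_char(int *ip, unsigned char *cp)
{
    *ip = 1;
    *cp = 2;
    return *ip;
}

/* Well-defined callers only; both checks hold with or without
 * -fno-strict-aliasing. */
void check_aliasing_examples(void)
{
    int i = 0;
    short s = 0;

    assert(store_int_then_short(&i, &s) == 1);
    /* cp points at the first byte of i, so the int is modified. */
    assert(store_int_then_char(&i, (unsigned char *)&i) != 1);
}
```

Passing a short pointer that really points into the int would be
undefined behaviour, which is exactly the case where the two builds can
disagree at run time.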

> >>>> We must provoke these bugs to be able to "nail them down" and not pamper
> >>>> them with "-fno-strict-aliasing".
> >>>>     
> >>>>         
> >>> Maybe the following steps would make sense:
> >>>
> >>> - Somebody (Ralf?) might track down the suspect modules by Ralf's method
> >>> of comparing the compiler output (using an architecture with MUCH
> >>> optimization headroom. PPC is not so bad due to its many general purpose
> >>> registers, but maybe another architecture is better)
> >>>       
> >    
> >   
> >> - Verify difference is a breakage. :)
> >>     
> > That's a real issue. I need to think about how we could try to approach
> > this problem.
> >   
> I cc'ed you on the gcc list asking that person who said they had a patch 
> to improve the warnings.  Don't know if it will help us or not but it
> doesn't hurt to ask.
> 
> It might also be interesting to see if we can get a warning out of the code
> we know breaks. Should gcc have generated a warning?
Having a warning at each and every point where GCC knows it is exploiting
strict-aliasing would be great.
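
As a sketch of what such a warning would have to catch: GCC's existing
-Wstrict-aliasing option is aimed at casts like the one in the comment
below, but as far as I can tell it does not cover every place where the
optimizer uses the aliasing rules. The memcpy form is a well-defined
replacement. The function name float_bits is made up for this example,
and it assumes that float and uint32_t have the same size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical example. The cast form
 *
 *     uint32_t bits = *(uint32_t *)&f;
 *
 * reads a float object through an incompatible lvalue type and is
 * undefined behaviour, so it is only shown in this comment.
 */
uint32_t float_bits(float f)
{
    uint32_t bits;

    /* Assumption: both types occupy the same number of bytes. */
    assert(sizeof bits == sizeof f);

    /* Copying the object representation does not violate the
     * aliasing rules. */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```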

> > I could also add my list to CVS-HEAD, where it could be reformatted into
> > a table ...
> >
> >   
> The Wiki should be useful for this.  You want a table and everyone can 
> edit it whether or not they have CVS write access. 
The wiki isn't necessarily suitable for this purpose for me, because I
need it as a local file on disk, to be able to process it as part of
scripts.

Ralf




