Problem report: Struct aliasing problem causes Thread_Ready_Chain corruption in 4.6.99.3

Eric Norum norume at aps.anl.gov
Tue Nov 28 15:48:40 UTC 2006


On Nov 28, 2006, at 9:25 AM, Ralf Corsepius wrote:

> On Tue, 2006-11-28 at 09:12 -0600, Eric Norum wrote:
>> In the interests of not delaying 4.7 for another year I suggest that
>> we simply add -fno-strict-aliasing to all gcc invocations.  I don't
>> see anything wrong with this approach in the near term.  As has been
>> pointed out by others, many other kernel development projects have
>> resorted to this technique.
>>
>> I know that Ralf is opposed to this, but I have not heard a reason to
>> convince me.
>
> And I have not seen any bug in RTEMS having been fixed by
> -fno-strict-aliasing.
>
> However, I've seen a lot of people bogusly accusing strict-aliasing for
> code bugs, in general (Outside of RTEMS).
>
> Please folks, please provide cases, so we can go after this. So far this
> has not taken place, instead I've seen several "flare gun" approaches
> having been proposed.
>

This entire discussion was started by a report from Peer Stritzinger  
that code in chain.c/chain.inl was getting mangled by the aggressive  
strict-aliasing rules.  That message seemed to provide all the  
information needed to confirm the problem.
============================================

Description:

There is a problem with compiler optimization flags that manifested for
us in rtems-4.6.99.3 on the powerpc mpc8xx architecture, but I believe it
can cause problems on any architecture using gcc-4.

It was a very hard problem to track down. It manifested itself only when
_Thread_Reset_timeslice() is called on a Thread_Ready_chain element and
all of the following hold:

1. The executing thread is the only one that is able to run.

2. It is the last entry in the chain.

3. It is not the only entry in the chain.

What basically happens in _Thread_Reset_timeslice() is that the element
is extracted and then immediately appended again by:

_Chain_Extract_unprotected( &executing->Object.Node );
_Chain_Append_unprotected( ready, &executing->Object.Node );

Since it was at the end of the chain anyway, it is extracted and
appended at the same place.
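
The extract-then-append sequence can be sketched as follows. This is only
an illustration, not the RTEMS code: the names node_t, chain_t,
chain_extract() and chain_append() are hypothetical, and the sketch
deliberately uses two separate sentinel nodes instead of the RTEMS chain
header, so that it stays well defined regardless of aliasing settings. It
sets up the case described above (the node is last in the chain but not
the only entry) and checks that the chain is unchanged afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RTEMS chain types (illustrative names). */
typedef struct node {
    struct node *next;
    struct node *previous;
} node_t;

typedef struct {
    node_t head;  /* front sentinel: head.next is the first element    */
    node_t tail;  /* back sentinel: tail.previous is the last element  */
} chain_t;

static void chain_init(chain_t *c)
{
    c->head.next = &c->tail;
    c->head.previous = NULL;
    c->tail.next = NULL;
    c->tail.previous = &c->head;
}

/* Unlink n from the chain it is on; n must not be a sentinel. */
static void chain_extract(node_t *n)
{
    assert(n->next != NULL && n->previous != NULL);
    node_t *next = n->next;
    node_t *prev = n->previous;
    next->previous = prev;
    prev->next = next;
}

/* Link n in just before the back sentinel. */
static void chain_append(chain_t *c, node_t *n)
{
    node_t *old_last = c->tail.previous;
    n->next = &c->tail;
    n->previous = old_last;
    old_last->next = n;
    c->tail.previous = n;
}

/* Returns 1 if the chain holds exactly a followed by b. */
static int chain_is_a_then_b(const chain_t *c, const node_t *a, const node_t *b)
{
    return c->head.next == a && a->previous == &c->head &&
           a->next == b && b->previous == a &&
           b->next == &c->tail && c->tail.previous == b;
}

/* Extract and re-append the last node; returns 1 if the chain is intact. */
static int reset_last_node_demo(void)
{
    chain_t ready;
    node_t other, executing;

    chain_init(&ready);
    chain_append(&ready, &other);
    chain_append(&ready, &executing);   /* executing is last, not alone */
    if (!chain_is_a_then_b(&ready, &other, &executing))
        return 0;

    chain_extract(&executing);
    chain_append(&ready, &executing);   /* lands in the same place */
    return chain_is_a_then_b(&ready, &other, &executing);
}
```

Calling reset_last_node_demo() from a test program is expected to return
1, since a correct extract followed by an append of the last node leaves
every link as it was.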

The functions _Chain_Has_only_one_node(), _Chain_Extract_unprotected()
and _Chain_Append_unprotected() are all inlined, and the gcc optimizer
keeps all pointers in the chain data structures in registers.

Since gcc assumes a default of -fstrict-aliasing (already present and
set like this in gcc-3), this causes problems with the chain data
structures and routines, which violate the strict-aliasing assumption.
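
As a rough illustration of what the strict-aliasing assumption allows
(this is a generic sketch, not RTEMS code, and store_both() is a
hypothetical name): when two pointers have incompatible types, the
compiler may assume they never refer to the same memory, so a value
written through one may be kept in a register across a store through the
other.

```c
#include <assert.h>

/*
 * Hypothetical example function. Because int and float are incompatible
 * types, a compiler using strict aliasing may assume that the store to
 * *b cannot change *a, and may return the constant 1 without reloading
 * *a from memory.
 */
static int store_both(int *a, float *b)
{
    *a = 1;
    *b = 2.0f;
    return *a;
}

/* Well-defined use: the two pointers refer to distinct objects. */
static int demo_distinct_objects(void)
{
    int i = 0;
    float f = 0.0f;
    return store_both(&i, &f);
}
```

With distinct objects, as here, the assumption is harmless and the result
is 1 either way. The trouble starts only when code makes such pointers
overlap through casts, which is the situation with the chain header
described in this report.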

The problem is caused by the coding trick in which the chain header
doubles as overlapping front and back sentinels:

rtems-4.6.99.3/cpukit/score/include/rtems/score/chain.h:

struct Chain_Node_struct {
  Chain_Node *next;
  Chain_Node *previous;
};

typedef struct {
  Chain_Node *first;
  Chain_Node *permanent_null;
  Chain_Node *last;
} Chain_Control;

In order to make this work, the Chain_Control struct is cast in two ways
to a Chain_Node_struct: sometimes aligned to the start of the struct
(next mapped on first and previous mapped on permanent_null) and
sometimes like this:

rtems-4.6.99.3/cpukit/score/inline/rtems/score/chain.inl:

RTEMS_INLINE_ROUTINE Chain_Node *_Chain_Tail(
  Chain_Control *the_chain
)
{
  return (Chain_Node *) &the_chain->permanent_null;
}

where next is mapped on permanent_null and previous on last.
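
The two overlapping views can be checked with offsetof. The sketch below
repeats the two struct definitions from chain.h; the forward typedef and
the helper names head_view_overlaps() and tail_view_overlaps() are
assumptions added for illustration. It only compares field offsets and
never dereferences a cast pointer, so it does not itself depend on the
aliasing behaviour under discussion.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Chain_Node_struct Chain_Node;

struct Chain_Node_struct {
    Chain_Node *next;
    Chain_Node *previous;
};

typedef struct {
    Chain_Node *first;
    Chain_Node *permanent_null;
    Chain_Node *last;
} Chain_Control;

/* Head view: a Chain_Node laid over the start of Chain_Control. */
static int head_view_overlaps(void)
{
    return offsetof(Chain_Control, first) == offsetof(Chain_Node, next) &&
           offsetof(Chain_Control, permanent_null) ==
               offsetof(Chain_Node, previous);
}

/* Tail view: a Chain_Node laid over Chain_Control at permanent_null. */
static int tail_view_overlaps(void)
{
    size_t base = offsetof(Chain_Control, permanent_null);

    return offsetof(Chain_Control, permanent_null) - base ==
               offsetof(Chain_Node, next) &&
           offsetof(Chain_Control, last) - base ==
               offsetof(Chain_Node, previous);
}
```

Both helpers are expected to return 1 on the usual targets, where three
pointers of the same type are laid out without padding. That means
permanent_null serves both as previous of the front sentinel and as next
of the back sentinel.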

In the case mentioned above, this corrupts the chain data structure,
which much later leads to a NULL pointer dereference.

This occurred only very rarely on a very busy system, since the
preconditions don't happen often.

I verified that the generated powerpc assembler code is in error and
causes the problem. Unfortunately the analysis is mostly on paper, but
feel free to ask if you need more details.

When rtems is built with the -fno-strict-aliasing compiler option:

1. Our system runs happily ever after

2. The powerpc assembler code of _Thread_Reset_timeslice() has been
verified as correct by me.

How-To-Repeat:

Very hard.  The best way to verify is to inspect the generated powerpc
assembler of _Thread_Reset_timeslice().

Fix:

Add '-fno-strict-aliasing' to cflags when building RTEMS.

Regards,
Peer Stritzinger

-- 
Eric Norum <norume at aps.anl.gov>
Advanced Photon Source
Argonne National Laboratory
(630) 252-4793




