Fwd: Problem report: Struct aliasing problem causes Thread_Ready_Chain corruption in 4.6.99.3

Peer Stritzinger peerst at gmail.com
Tue Nov 21 11:09:21 UTC 2006


The last post did not go through, so let's retry:

---------- Forwarded message ----------
From: Peer Stritzinger <peerst at gmail.com>
Date: Nov 17, 2006 6:41 PM
Subject: Problem report: Struct aliasing problem causes
Thread_Ready_Chain corruption in 4.6.99.3
To: rtems-users at rtems.com


Hi,

I know this belongs in the GNATS database, but when I tried to
submit it there I got:

"Error, Unparseable reply from gnatsd"

So I suppose the problem report has not been received, and I repeat
it here:

Category: RTEMS Core (or make_build depending on perspective)

Synopsis: Struct aliasing problem that causes Thread_Ready_chain to be corrupted

Severity: serious

Priority: medium

Class: sw-bug

Originator: Peer Stritzinger

Release: 4.6.99.3

Organization: Dipl.Phys. Peer Stritzinger GmbH

Environment:

Host:   i386, FreeBSD 6.1
Target: MPC850, no BSP with application (hacked to get a bare-bsp lookalike)

bsp_cflg='-mcpu=860 -Dmpc860 -Dmbx860_001b -O4 -fno-keep-inline-functions'
our_cflg='-ggdb3 -D_OLD_EXCEPTIONS'

$base/src/$rtems/configure \
    --enable-rtemsbsp=mbx860_001b \
    --target=powerpc-rtems4.7 \
    --prefix=$base \
    target_alias=powerpc-rtems4.7 \
    --enable-maintainer-mode \
    CFLAGS_FOR_TARGET="$bsp_cflg $our_cflg"

Description:

There is a problem with compiler optimization flags that showed up for us in
rtems-4.6.99.3 on the powerpc mpc8xx architecture, but I believe it can
cause problems on any architecture using gcc-4.

It was a very hard problem to track down. It manifests only when
_Thread_Reset_timeslice() is called on a Thread_Ready_chain element and
all of the following hold:

1. The executing thread is the only one that is able to run.

2. It is the last entry in the chain.

3. It is not the only entry in the chain.

What basically happens in _Thread_Reset_timeslice() is that the
element is extracted and then immediately appended again by:

_Chain_Extract_unprotected( &executing->Object.Node );
_Chain_Append_unprotected( ready, &executing->Object.Node );

Since it was at the end of the chain anyway, it is extracted and
appended at the same place.
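
As a rough illustration of what this extract/append sequence is supposed
to do, here is a minimal sketch. It deliberately uses two separate
sentinel nodes instead of the overlapping RTEMS layout, so the example
itself does not depend on any aliasing tricks. All names (Node, Chain,
chain_init, chain_extract, chain_append, requeue_last_keeps_order) are
hypothetical simplifications, not the RTEMS definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified types; not the RTEMS definitions. */
typedef struct Node {
    struct Node *next;
    struct Node *previous;
} Node;

typedef struct {
    Node head;  /* front sentinel: head.next is the first real node */
    Node tail;  /* back sentinel: tail.previous is the last real node */
} Chain;

/* Set up an empty chain: the two sentinels point at each other. */
void chain_init(Chain *c)
{
    c->head.next = &c->tail;
    c->head.previous = NULL;
    c->tail.next = NULL;
    c->tail.previous = &c->head;
}

/* Unlink a node by joining its two neighbours. */
void chain_extract(Node *n)
{
    Node *next = n->next;
    Node *prev = n->previous;
    next->previous = prev;
    prev->next = next;
}

/* Link a node in just before the tail sentinel. */
void chain_append(Chain *c, Node *n)
{
    Node *old_last = c->tail.previous;
    n->next = &c->tail;
    n->previous = old_last;
    old_last->next = n;
    c->tail.previous = n;
}

/* Returns 1 if requeueing the last of two nodes leaves the order a, b intact. */
int requeue_last_keeps_order(void)
{
    Chain ready;
    Node a, b;

    chain_init(&ready);
    chain_append(&ready, &a);
    chain_append(&ready, &b);

    /* Same sequence as in _Thread_Reset_timeslice(), applied to the last node. */
    chain_extract(&b);
    chain_append(&ready, &b);

    return ready.head.next == &a
        && a.next == &b
        && b.next == &ready.tail
        && ready.tail.previous == &b
        && b.previous == &a
        && a.previous == &ready.head;
}
```

With correctly generated code, every link in the chain is the same after
the sequence as before it; the corruption described in this report is a
case where that invariant is broken.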

The functions _Chain_Has_only_one_node(), _Chain_Extract_unprotected()
and _Chain_Append_unprotected() are all inlined, and the gcc optimizer
keeps all pointers into the chain data structures in registers.

Since gcc assumes -fstrict-aliasing by default (already present and
set like this in gcc-3), this causes problems with the chain data
structures and routines, which violate the strict-aliasing assumption.

The problem is caused by the coding trick of having the chain header
double as overlapping front and back sentinels:

rtems-4.6.99.3/cpukit/score/include/rtems/score/chain.h:

struct Chain_Node_struct {
  Chain_Node *next;
  Chain_Node *previous;
};

typedef struct {
  Chain_Node *first;
  Chain_Node *permanent_null;
  Chain_Node *last;
} Chain_Control;

In order to make this work, the Chain_Control struct is cast to a
Chain_Node_struct in two ways: sometimes aligned with the start of the
struct (next mapped on first and previous mapped on permanent_null),
and sometimes like this:

rtems-4.6.99.3/cpukit/score/inline/rtems/score/chain.inl:

RTEMS_INLINE_ROUTINE Chain_Node *_Chain_Tail(
  Chain_Control *the_chain
)
{
  return (Chain_Node *) &the_chain->permanent_null;
}

where next is mapped on permanent_null and previous on last.
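
The two overlapping views can be checked without dereferencing anything
through a cast pointer. The sketch below repeats the two struct
definitions quoted above and compares field offsets only; the macro and
function names (HEAD_VIEW_OFFSET, TAIL_VIEW_OFFSET,
views_overlap_as_described) are made up for this illustration, and the
result assumes a typical ABI where the pointer fields are laid out
without padding.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Chain_Node_struct Chain_Node;

struct Chain_Node_struct {
    Chain_Node *next;
    Chain_Node *previous;
};

typedef struct {
    Chain_Node *first;
    Chain_Node *permanent_null;
    Chain_Node *last;
} Chain_Control;

/* Byte offset inside Chain_Control where the "head" node view starts. */
#define HEAD_VIEW_OFFSET offsetof(Chain_Control, first)

/* Byte offset where the "tail" node view starts (what _Chain_Tail returns). */
#define TAIL_VIEW_OFFSET offsetof(Chain_Control, permanent_null)

/*
 * Returns 1 if the field overlap matches the description:
 *   head view: next -> first,          previous -> permanent_null
 *   tail view: next -> permanent_null, previous -> last
 * Only offsets are compared; no memory is accessed through a cast.
 */
int views_overlap_as_described(void)
{
    return HEAD_VIEW_OFFSET + offsetof(Chain_Node, next)
               == offsetof(Chain_Control, first)
        && HEAD_VIEW_OFFSET + offsetof(Chain_Node, previous)
               == offsetof(Chain_Control, permanent_null)
        && TAIL_VIEW_OFFSET + offsetof(Chain_Node, next)
               == offsetof(Chain_Control, permanent_null)
        && TAIL_VIEW_OFFSET + offsetof(Chain_Node, previous)
               == offsetof(Chain_Control, last);
}
```

So permanent_null serves both as the previous field of the head view and
as the next field of the tail view, which is why the same memory ends up
being accessed through two different struct types.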

In the case mentioned above this corrupts the chain data structure,
which much later leads to a NULL pointer dereference.

This occurred only very rarely, even on a very busy system, since the
preconditions don't happen often.

I verified that the generated powerpc assembler code is in error and
causes the problem.
Unfortunately the analysis is mostly on paper, but feel free to ask
if you need more details.

When rtems is built with the -fno-strict-aliasing compiler option:

1. Our system runs happily ever after

2. The powerpc assembler code of _Thread_Reset_timeslice() has been
verified as correct by me.

How-To-Repeat:

Very hard.  The best way to verify is to inspect the generated powerpc
assembler of _Thread_Reset_timeslice().

Fix:

Add '-fno-strict-aliasing' to cflags when building RTEMS

Regards,
Peer Stritzinger


