Problem report: Struct aliasing problem causes Thread_Ready_Chain corruption in 4.6.99.3

Eric Norum norume at aps.anl.gov
Tue Nov 28 17:44:56 UTC 2006


I didn't see this in your message, but I presume that the unpatched  
version generates correct code if the compiler is invoked with -fno- 
strict-aliasing, right?
On Nov 28, 2006, at 11:35 AM, Thomas Doerfler wrote:

> Ralf,
>
> I have compiled RTEMS-4.7.branch fpr the MBX821_001b BSP with and
> without the patch I posted some days ago. I have analyzed the assembly
> code for the function "threadresettimeslice.c" and (Ralf this  
> should be
> interesting for you) found that the unpatched version generates  
> WRONG code.
>
> This is going to get lengthy, but it is worth writing:
>
> Here is the function body of thread_reset_timeslice:
>
> =======================================
> void _Thread_Reset_timeslice( void )
> {
>   ISR_Level       level;
>   Thread_Control *executing;
>   Chain_Control  *ready;
>
>   executing = _Thread_Executing;
>   ready     = executing->ready;
>   _ISR_Disable( level );
>     if ( _Chain_Has_only_one_node( ready ) ) {
>       _ISR_Enable( level );
>       return;
>     }
> /**** COMMENT: these two lines generate nonfunctional code *****/
>     _Chain_Extract_unprotected( &executing->Object.Node );
>     _Chain_Append_unprotected( ready, &executing->Object.Node );
> /**** COMMENT: end of critical lines *****/
>
>   _ISR_Flash( level );
>
>     if ( _Thread_Is_heir( executing ) )
>       _Thread_Heir = (Thread_Control *) ready->first;
>
>     _Context_Switch_necessary = TRUE;
>
>   _ISR_Enable( level );
> }
> =======================================
>
> The inlined functions "_Chain_Extract_unprotected" and
> "_Chain_Append_unprotected" look like this:
>
> =======================================
> RTEMS_INLINE_ROUTINE void _Chain_Extract_unprotected(
>   Chain_Node *the_node
> )
> {
>   Chain_Node *next;
>   Chain_Node *previous;
>
>   next           = the_node->next;
>   previous       = the_node->previous;
>   next->previous = previous;
>   previous->next = next;
> }
>
> RTEMS_INLINE_ROUTINE void _Chain_Append_unprotected(
>   Chain_Control *the_chain,
>   Chain_Node    *the_node
> )
> {
>   Chain_Node *old_last_node;
>
>   the_node->next      = _Chain_Tail(the_chain);
>   old_last_node       = the_chain->last;
>   the_chain->last     = the_node;
>   old_last_node->next = the_node;
>   the_node->previous  = old_last_node;
> }
> =======================================
>
> So they are basic chain/unchain operations.
>
> Here is, what the compiler created from this source code and my  
> comments
> to it:
>
> ============== start of non-functional version ==================
>  	lwz     r6,8(r5);  r6 = ready->last (=old_last_node);
>  	lwz     r0,0(r5);  r0 = ready->first;
>  	addi    r4,r5,4 ;  r4 = &(ready->premanent_null)
>  	cmpw    cr7,r0,r6
>  	bne-    cr7,48 <_Thread_Reset_timeslice+0x48>
>  	mtmsr   r7
>  	blr                ; r8==executing, r5==ready
> ; extract starts here
>  	lwz     r11,0(r8)  ; r11=executing->next
>  	lwz     r9,4(r8)   ; r9 =executing->previous
>  	stw     r11,0(r9)  ; executing->prev->next=executing->next
>  	stw     r9,4(r11)  ; executing->next->prev=executing->prev
> ; append starts here
>  	stw     r4,0(r8)   ; executing->next=&(ready->premanent_null)
>  	stw     r8,8(r5)   ; ready->last = executing
>  	stw     r6,4(r8)   ; executing->prev=old_last_node(taken LONG ago)
>  	stw     r8,0(r6)   ; old_last_node->next=executing
>  	mtmsr   r7
>  	andc    r10,r7,r10
>  	mtmsr   r10
> ============== end of non-functional version ==================
>
> Please note: the above code reads "ready->last" to "old_last_node 
> (=r6)"
> once at the top of this snipplet, and the compiler has no idea,  
> that the
> node chaining operations below might modify "read->last", because  
> *ready
> is not of type "Chain_Node".
>
> What might happen:
> executing is the last node in the chain, so executing-next points to
> ready and the line commented as "executing->next->prev=executing- 
> >prev"
> modifies in fact "ready->last", but later on, the code still uses the
> (stale) copy of ready->last located in register r6. And THIS IS WRONG.
>
> Here is the code generated with my patches (and still with
> strict-aliasing ON)
>
> ============== start of fixed version ==================
>  	lwz     r9,0(r5);  r9 = ready->first;
>  	lwz     r0,8(r5);  r0 = ready->last (=old_last_node);
>  	addi    r4,r5,4
>  	cmpw    cr7,r9,r0
>  	bne-    cr7,48 <_Thread_Reset_timeslice+0x48>
>  	mtmsr   r6
>  	blr
> ; extract starts here
>  	lwz     r11,0(r8)  ; r11=executing->next
>  	lwz     r9,4(r8)   ; r9 =executing->previous
>  	stw     r9,4(r11)  ; executing->next->prev=executing->prev
>  	stw     r11,0(r9)  ; executing->prev->next=executing->next
> ; append starts here
>  	lwz     r10,8(r5)  ; old_last_node=ready->last
>  	stw     r4,0(r8)   ; executing->next=&(ready->premanent_null)
>  	stw     r8,8(r5)   ; ready->last = executing
>  	stw     r8,0(r10)  ; old_last_node->next=executing
>  	stw     r10,4(r8)  ; executing->prev=old_last_node
>  	mtmsr   r6
>  	andc    r7,r6,r7
>  	mtmsr   r7
> ============== end of fixed version ==================
>
> Here you can see, that "old_last_node" is taken at the proper time.
>
> So, I think I could prove that the current GCC miss-interprets and
> mis-optimizes RTEMS chain code in a rather subtile way.
> The current code/compiler adaptation is broken. And we have just
> analyzed ONE function.
>
> by the way: my patch (overlaying Chain_Control with two Chain_node
> structures and accessing Chain_Control through these structures) is  
> not
> yet clean enough to be integrated into RTEMS, because, as Pavel has
> pointed out, it will have alignment problems on some machines.
>
>
> wkr,
> Thomas.
>
>
> Ralf Corsepius schrieb:
>> On Tue, 2006-11-28 at 09:12 -0600, Eric Norum wrote:
>>> In the interests of not delaying 4.7 for another year I suggest that
>>> we simply add -fno-strict-aliasing to all gcc invocations.  I don't
>>> see anything wrong with this approach in the near term.  As has been
>>> pointed out by others, many other kernel development projects have
>>> resorted to this technique.
>>>
>>> I know that Ralf is opposed to this, but I have not heard a  
>>> reason to
>>> convince me.
>>
>> And I have not seen any bug in RTEMS having been fixed by
>> -fno-strict-aliasing.
>>
>> However, I've seen a lot of people bogusly accusing strict- 
>> aliasing for
>> code bugs, in general (Outside of RTEMS).
>>
>> Please folks, please provide cases, so we can go after this. So  
>> far this
>> has not taken place, instead I've seen several "flare gun" approaches
>> having been proposed.
>>
>> Ralf
>>
>>
>> _______________________________________________
>> rtems-users mailing list
>> rtems-users at rtems.com
>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>
>
> -- 
> --------------------------------------------
> embedded brains GmbH
> Thomas Doerfler           Obere Lagerstr. 30
> D-82178 Puchheim          Germany
> Tel. : +49-89-18 90 80 79-2
> Fax  : +49-89-18 90 80 79-9
> email: Thomas.Doerfler at embedded-brains.de
> PGP public key available on request

-- 
Eric Norum <norume at aps.anl.gov>
Advanced Photon Source
Argonne National Laboratory
(630) 252-4793





More information about the users mailing list