GNATS-1110: in_cksum_hdr error in PC386 BSP

Joel Sherrill joel.sherrill at oarcorp.com
Wed Jul 5 20:12:52 UTC 2006


Till Straumann wrote:

> Eric Norum wrote:
>
>> I'm beginning to think that we really ought to just switch over to 
>> the C version which was posted with the original problem report.
>
>
> I believe that in this particular case, fixing the ASM should be OK.
> We have now looked at this in detail and the proposed fixes
> (adding 'm' input and concatenating everyting into a single asm)
> should solve the problem. (And: yes, freebsd has updated versions
> [at least for i386]).
> This code is executed often and problems should surface immediately.
>
Can you commit the fix since you seem to be sure about the correct patch?

> I'm more concerned about other areas involving rare race conditions.
>
Yeah.. those are the nasty ones.

> T.
>
>>
>> On Jul 5, 2006, at 1:38 PM, Joel Sherrill wrote:
>>
>>> Till Straumann wrote:
>>>
>>>> I just found another problem with
>>>> in_cksum_hdr (freebsd mailing list).
>>>>
>>>> Currently, (i386, ppc) the algorithm is split into several
>>>> 'asm' constructs and we assume that the carry bit
>>>> is preserved across these. This is not a valid assumption,
>>>> however.
>>>>
>>> Does their code base have a fixed version?
>>>
>>>> We definitively should try harder to eliminate 'asm'
>>>>
>>> Unfortunately, it is a necessary evel.
>>>
>>>> T.
>>>>
>>>> Joel Sherrill wrote:
>>>>
>>>>> I am going to rely on you guys to make a definitive statement when 
>>>>> you have a
>>>>> fix.  This has been a long thread and I have been away for 10 
>>>>> days.  Eric,
>>>>> please commit the fix you all are happy with.
>>>>>
>>>>> --joel
>>>>>
>>>>> Till Straumann wrote:
>>>>>
>>>>>> Eric Norum wrote:
>>>>>>
>>>>>>>
>>>>>>> On Jul 1, 2006, at 3:40 AM, Till Straumann wrote:
>>>>>>>
>>>>>>>> I believe I caught this one.
>>>>>>>> Thanks to Danilliu for reporting this and his/(her?)
>>>>>>>> perseverance.
>>>>>>>>
>>>>>>>> It is definitely a toolchain/optimization issue:
>>>>>>>>
>>>>>>>> The BSD code does
>>>>>>>>
>>>>>>>> header->checksum = 0;
>>>>>>>> header->checksum = ip_cksum_hdr(&header);
>>>>>>>>
>>>>>>>> The inline ip_cksum_hdr() routine has inline assembly
>>>>>>>> and gcc doesn't consider that the assembly actually
>>>>>>>> could read 'header->checksum' (which it does) so it thinks the
>>>>>>>> header->checksum = 0 assignment is unnecessary
>>>>>>>> and optimizes it away :-(
>>>>>>>>
>>>>>>>> Confirmed by disassembly of Danilliu's broken
>>>>>>>> and a working binary.
>>>>>>>>
>>>>>>>> [ As soon as any 'real' subroutine (like a printf)
>>>>>>>> is called from ip_cksum_hdr() gcc realizes that
>>>>>>>> the header->checksum=0 assigment could have
>>>>>>>> side-effects and the problem goes away].
>>>>>>>>
>>>>>>>> A memory barrier could resolve this.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Here's what the GCC manual has to say on this issue:
>>>>>>>  If your assembler instructions access memory in an unpredictable
>>>>>>> fashion, add `memory' to the list of clobbered registers.  This 
>>>>>>> will
>>>>>>> cause GCC to not keep memory values cached in registers across the
>>>>>>> assembler instruction and not optimize stores or loads to that 
>>>>>>> memory.
>>>>>>> You will also want to add the `volatile' keyword if the memory 
>>>>>>> affected
>>>>>>> is not listed in the inputs or outputs of the `asm', as the 
>>>>>>> `memory'
>>>>>>> clobber does not count as a side-effect of the `asm'.  If you 
>>>>>>> know how
>>>>>>> large the accessed memory is, you can add it as input or output 
>>>>>>> but if
>>>>>>> this is not known, you should add `memory'.  As an example, if you
>>>>>>> access ten bytes of a string, you can use a memory input like:
>>>>>>>
>>>>>>>      {"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )}.
>>>>>>>
>>>>>>> Note that in the following example the memory input is necessary,
>>>>>>> otherwise GCC might optimize the store to `x' away:
>>>>>>>      int foo ()
>>>>>>>      {
>>>>>>>        int x = 42;
>>>>>>>        int *y = &x;
>>>>>>>        int result;
>>>>>>>        asm ("magic stuff accessing an 'int' pointed to by '%1'"
>>>>>>>              "=&d" (r) : "a" (y), "m" (*y));
>>>>>>>        return result;
>>>>>>>      }
>>>>>>> ========================================================================== 
>>>>>>>
>>>>>>>
>>>>>>> The example looks like it describes the in_cksum_hdr() problem.  
>>>>>>> I made the following change:
>>>>>>> diff -u -r1.5 in_cksum.h
>>>>>>> --- machine/in_cksum.h  2 Sep 2003 21:31:16 -0000       1.5
>>>>>>> +++ machine/in_cksum.h  3 Jul 2006 14:06:25 -0000
>>>>>>> @@ -59,10 +59,11 @@
>>>>>>> {
>>>>>>>         register u_int sum = 0;
>>>>>>>                    +
>>>>>>> #define ADD(n) \
>>>>>>> -    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>>>>> (sum), "r" (ip))
>>>>>>> +    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>>>>> (sum), "r" (ip), "m" (*ip))
>>>>>>> #define ADDC(n)        \
>>>>>>> -    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>>>>> (sum), "r" (ip))
>>>>>>> +    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>>>>> (sum), "r" (ip), "m" (*ip))
>>>>>>> #define MOP    \
>>>>>>>      __asm__ volatile ("adcl         $0, %0" : "=r" (sum) : "0" 
>>>>>>> (sum))
>>>>>>>
>>>>>>>
>>>>>>> To my surprise the compiler emits the same code before and after 
>>>>>>> I made the change. 
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I can't confirm this. Using the construct *does* make the 
>>>>>> difference:
>>>>>>
>>>>>> Compiling the snippet
>>>>>>
>>>>>> void test(int *pi)
>>>>>> {
>>>>>>        *pi = 0;
>>>>>> #ifdef USE_MEMB
>>>>>>        asm volatile(""::"m"(*pi));
>>>>>> #endif
>>>>>>        *pi = 3;
>>>>>> }
>>>>>>
>>>>>> powerpc-rtems-gcc -O -c test.c; powerpc-rtems-objdump -d test.o
>>>>>>
>>>>>> produces:
>>>>>>
>>>>>>   0:   38 00 00 03     li      r0,3
>>>>>>   4:   90 03 00 00     stw     r0,0(r3)
>>>>>>   8:   4e 80 00 20     blr
>>>>>>
>>>>>> whereas
>>>>>>
>>>>>> powerpc-rtems-gcc -O -c -DUSE_MEMB test.c; powerpc-rtems-objdump 
>>>>>> -d test.o
>>>>>>
>>>>>> does what we want:
>>>>>>
>>>>>>   0:   38 00 00 00     li      r0,0
>>>>>>   4:   90 03 00 00     stw     r0,0(r3)
>>>>>>   8:   38 00 00 03     li      r0,3
>>>>>>   c:   90 03 00 00     stw     r0,0(r3)
>>>>>>  10:   4e 80 00 20     blr
>>>>>>
>>>>>> I also tested i386-rtems-gcc with the same result. Same applies 
>>>>>> to native gcc-3.2.3
>>>>>> I have on my linux box.
>>>>>>
>>>>>> What compiler are you using? What target did you check? Note that 
>>>>>> PPC people
>>>>>> wont' experience PR1110 (look at the ppc implementation of 
>>>>>> in_cksum_hdr(),
>>>>>> the reading of the relevant data is done be interspersed C-code).
>>>>>>
>>>>>> If you are testing the modified <in_cksum.h> beware of outdated 
>>>>>> versions
>>>>>> that might get picked up...
>>>>>>
>>>>>>
>>>>>> Cheers
>>>>>> -- T.
>>>>>>
>>>>>>> Am I reading the GCC manual wrong?  Does making this change fix 
>>>>>>> things for you?
>>>>>>> -- 
>>>>>>> Eric Norum <norume at aps.anl.gov <mailto:norume at aps.anl.gov>>
>>>>>>> Advanced Photon Source
>>>>>>> Argonne National Laboratory
>>>>>>> (630) 252-4793
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>> -- 
>> Eric Norum <norume at aps.anl.gov <mailto:norume at aps.anl.gov>>
>> Advanced Photon Source
>> Argonne National Laboratory
>> (630) 252-4793
>>
>>
>




More information about the users mailing list