GNATS-1110: in_cksum_hdr error in PC386 BSP

Till Straumann strauman at slac.stanford.edu
Wed Jul 5 18:35:15 UTC 2006


I just found another problem with
in_cksum_hdr (freebsd mailing list).

Currently, (i386, ppc) the algorithm is split into several
'asm' constructs and we assume that the carry bit
is preserved across these. This is not a valid assumption,
however.

We definitively should try harder to eliminate 'asm'

T.

Joel Sherrill wrote:

> I am going to rely on you guys to make a definitive statement when you 
> have a
> fix.  This has been a long thread and I have been away for 10 days.  
> Eric,
> please commit the fix you all are happy with.
>
> --joel
>
> Till Straumann wrote:
>
>> Eric Norum wrote:
>>
>>>
>>> On Jul 1, 2006, at 3:40 AM, Till Straumann wrote:
>>>
>>>> I believe I caught this one.
>>>> Thanks to Danilliu for reporting this and his/(her?)
>>>> perseverance.
>>>>
>>>> It is definitely a toolchain/optimization issue:
>>>>
>>>> The BSD code does
>>>>
>>>> header->checksum = 0;
>>>> header->checksum = ip_cksum_hdr(&header);
>>>>
>>>> The inline ip_cksum_hdr() routine has inline assembly
>>>> and gcc doesn't consider that the assembly actually
>>>> could read 'header->checksum' (which it does) so it thinks the
>>>> header->checksum = 0 assignment is unnecessary
>>>> and optimizes it away :-(
>>>>
>>>> Confirmed by disassembly of Danilliu's broken
>>>> and a working binary.
>>>>
>>>> [ As soon as any 'real' subroutine (like a printf)
>>>> is called from ip_cksum_hdr() gcc realizes that
>>>> the header->checksum=0 assigment could have
>>>> side-effects and the problem goes away].
>>>>
>>>> A memory barrier could resolve this.
>>>
>>>
>>>
>>>
>>> Here's what the GCC manual has to say on this issue:
>>>  If your assembler instructions access memory in an unpredictable
>>> fashion, add `memory' to the list of clobbered registers.  This will
>>> cause GCC to not keep memory values cached in registers across the
>>> assembler instruction and not optimize stores or loads to that memory.
>>> You will also want to add the `volatile' keyword if the memory affected
>>> is not listed in the inputs or outputs of the `asm', as the `memory'
>>> clobber does not count as a side-effect of the `asm'.  If you know how
>>> large the accessed memory is, you can add it as input or output but if
>>> this is not known, you should add `memory'.  As an example, if you
>>> access ten bytes of a string, you can use a memory input like:
>>>
>>>      {"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )}.
>>>
>>> Note that in the following example the memory input is necessary,
>>> otherwise GCC might optimize the store to `x' away:
>>>      int foo ()
>>>      {
>>>        int x = 42;
>>>        int *y = &x;
>>>        int result;
>>>        asm ("magic stuff accessing an 'int' pointed to by '%1'"
>>>              "=&d" (r) : "a" (y), "m" (*y));
>>>        return result;
>>>      }
>>> ========================================================================== 
>>>
>>>
>>>
>>> The example looks like it describes the in_cksum_hdr() problem.  I 
>>> made the following change:
>>> diff -u -r1.5 in_cksum.h
>>> --- machine/in_cksum.h  2 Sep 2003 21:31:16 -0000       1.5
>>> +++ machine/in_cksum.h  3 Jul 2006 14:06:25 -0000
>>> @@ -59,10 +59,11 @@
>>> {
>>>         register u_int sum = 0;
>>>                    +
>>> #define ADD(n) \
>>> -    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" 
>>> (sum), "r" (ip))
>>> +    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" 
>>> (sum), "r" (ip), "m" (*ip))
>>> #define ADDC(n)        \
>>> -    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" 
>>> (sum), "r" (ip))
>>> +    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" 
>>> (sum), "r" (ip), "m" (*ip))
>>> #define MOP    \
>>>      __asm__ volatile ("adcl         $0, %0" : "=r" (sum) : "0" (sum))
>>>
>>>
>>> To my surprise the compiler emits the same code before and after I 
>>> made the change. 
>>
>>
>>
>> I can't confirm this. Using the construct *does* make the difference:
>>
>> Compiling the snippet
>>
>> void test(int *pi)
>> {
>>        *pi = 0;
>> #ifdef USE_MEMB
>>        asm volatile(""::"m"(*pi));
>> #endif
>>        *pi = 3;
>> }
>>
>> powerpc-rtems-gcc -O -c test.c; powerpc-rtems-objdump -d test.o
>>
>> produces:
>>
>>   0:   38 00 00 03     li      r0,3
>>   4:   90 03 00 00     stw     r0,0(r3)
>>   8:   4e 80 00 20     blr
>>
>> whereas
>>
>> powerpc-rtems-gcc -O -c -DUSE_MEMB test.c; powerpc-rtems-objdump -d 
>> test.o
>>
>> does what we want:
>>
>>   0:   38 00 00 00     li      r0,0
>>   4:   90 03 00 00     stw     r0,0(r3)
>>   8:   38 00 00 03     li      r0,3
>>   c:   90 03 00 00     stw     r0,0(r3)
>>  10:   4e 80 00 20     blr
>>
>> I also tested i386-rtems-gcc with the same result. Same applies to 
>> native gcc-3.2.3
>> I have on my linux box.
>>
>> What compiler are you using? What target did you check? Note that PPC 
>> people
>> wont' experience PR1110 (look at the ppc implementation of 
>> in_cksum_hdr(),
>> the reading of the relevant data is done be interspersed C-code).
>>
>> If you are testing the modified <in_cksum.h> beware of outdated versions
>> that might get picked up...
>>
>>
>> Cheers
>> -- T.
>>
>>> Am I reading the GCC manual wrong?  Does making this change fix 
>>> things for you?
>>> -- 
>>> Eric Norum <norume at aps.anl.gov <mailto:norume at aps.anl.gov>>
>>> Advanced Photon Source
>>> Argonne National Laboratory
>>> (630) 252-4793
>>>
>>>
>>
>




More information about the users mailing list