GNATS-1110: in_cksum_hdr error in PC386 BSP

Joel Sherrill joel.sherrill at oarcorp.com
Wed Jul 5 18:38:20 UTC 2006


Till Straumann wrote:

> I just found another problem with
> in_cksum_hdr (freebsd mailing list).
>
> Currently, (i386, ppc) the algorithm is split into several
> 'asm' constructs and we assume that the carry bit
> is preserved across these. This is not a valid assumption,
> however.
>
Does their code base have a fixed version?

> We definitively should try harder to eliminate 'asm'
>
Unfortunately, it is a necessary evel.

> T.
>
> Joel Sherrill wrote:
>
>> I am going to rely on you guys to make a definitive statement when 
>> you have a
>> fix.  This has been a long thread and I have been away for 10 days.  
>> Eric,
>> please commit the fix you all are happy with.
>>
>> --joel
>>
>> Till Straumann wrote:
>>
>>> Eric Norum wrote:
>>>
>>>>
>>>> On Jul 1, 2006, at 3:40 AM, Till Straumann wrote:
>>>>
>>>>> I believe I caught this one.
>>>>> Thanks to Danilliu for reporting this and his/(her?)
>>>>> perseverance.
>>>>>
>>>>> It is definitely a toolchain/optimization issue:
>>>>>
>>>>> The BSD code does
>>>>>
>>>>> header->checksum = 0;
>>>>> header->checksum = ip_cksum_hdr(&header);
>>>>>
>>>>> The inline ip_cksum_hdr() routine has inline assembly
>>>>> and gcc doesn't consider that the assembly actually
>>>>> could read 'header->checksum' (which it does) so it thinks the
>>>>> header->checksum = 0 assignment is unnecessary
>>>>> and optimizes it away :-(
>>>>>
>>>>> Confirmed by disassembly of Danilliu's broken
>>>>> and a working binary.
>>>>>
>>>>> [ As soon as any 'real' subroutine (like a printf)
>>>>> is called from ip_cksum_hdr() gcc realizes that
>>>>> the header->checksum=0 assigment could have
>>>>> side-effects and the problem goes away].
>>>>>
>>>>> A memory barrier could resolve this.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Here's what the GCC manual has to say on this issue:
>>>>  If your assembler instructions access memory in an unpredictable
>>>> fashion, add `memory' to the list of clobbered registers.  This will
>>>> cause GCC to not keep memory values cached in registers across the
>>>> assembler instruction and not optimize stores or loads to that memory.
>>>> You will also want to add the `volatile' keyword if the memory 
>>>> affected
>>>> is not listed in the inputs or outputs of the `asm', as the `memory'
>>>> clobber does not count as a side-effect of the `asm'.  If you know how
>>>> large the accessed memory is, you can add it as input or output but if
>>>> this is not known, you should add `memory'.  As an example, if you
>>>> access ten bytes of a string, you can use a memory input like:
>>>>
>>>>      {"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )}.
>>>>
>>>> Note that in the following example the memory input is necessary,
>>>> otherwise GCC might optimize the store to `x' away:
>>>>      int foo ()
>>>>      {
>>>>        int x = 42;
>>>>        int *y = &x;
>>>>        int result;
>>>>        asm ("magic stuff accessing an 'int' pointed to by '%1'"
>>>>              "=&d" (r) : "a" (y), "m" (*y));
>>>>        return result;
>>>>      }
>>>> ========================================================================== 
>>>>
>>>>
>>>>
>>>> The example looks like it describes the in_cksum_hdr() problem.  I 
>>>> made the following change:
>>>> diff -u -r1.5 in_cksum.h
>>>> --- machine/in_cksum.h  2 Sep 2003 21:31:16 -0000       1.5
>>>> +++ machine/in_cksum.h  3 Jul 2006 14:06:25 -0000
>>>> @@ -59,10 +59,11 @@
>>>> {
>>>>         register u_int sum = 0;
>>>>                    +
>>>> #define ADD(n) \
>>>> -    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>> (sum), "r" (ip))
>>>> +    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>> (sum), "r" (ip), "m" (*ip))
>>>> #define ADDC(n)        \
>>>> -    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>> (sum), "r" (ip))
>>>> +    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>> (sum), "r" (ip), "m" (*ip))
>>>> #define MOP    \
>>>>      __asm__ volatile ("adcl         $0, %0" : "=r" (sum) : "0" (sum))
>>>>
>>>>
>>>> To my surprise the compiler emits the same code before and after I 
>>>> made the change. 
>>>
>>>
>>>
>>>
>>> I can't confirm this. Using the construct *does* make the difference:
>>>
>>> Compiling the snippet
>>>
>>> void test(int *pi)
>>> {
>>>        *pi = 0;
>>> #ifdef USE_MEMB
>>>        asm volatile(""::"m"(*pi));
>>> #endif
>>>        *pi = 3;
>>> }
>>>
>>> powerpc-rtems-gcc -O -c test.c; powerpc-rtems-objdump -d test.o
>>>
>>> produces:
>>>
>>>   0:   38 00 00 03     li      r0,3
>>>   4:   90 03 00 00     stw     r0,0(r3)
>>>   8:   4e 80 00 20     blr
>>>
>>> whereas
>>>
>>> powerpc-rtems-gcc -O -c -DUSE_MEMB test.c; powerpc-rtems-objdump -d 
>>> test.o
>>>
>>> does what we want:
>>>
>>>   0:   38 00 00 00     li      r0,0
>>>   4:   90 03 00 00     stw     r0,0(r3)
>>>   8:   38 00 00 03     li      r0,3
>>>   c:   90 03 00 00     stw     r0,0(r3)
>>>  10:   4e 80 00 20     blr
>>>
>>> I also tested i386-rtems-gcc with the same result. Same applies to 
>>> native gcc-3.2.3
>>> I have on my linux box.
>>>
>>> What compiler are you using? What target did you check? Note that 
>>> PPC people
>>> wont' experience PR1110 (look at the ppc implementation of 
>>> in_cksum_hdr(),
>>> the reading of the relevant data is done be interspersed C-code).
>>>
>>> If you are testing the modified <in_cksum.h> beware of outdated 
>>> versions
>>> that might get picked up...
>>>
>>>
>>> Cheers
>>> -- T.
>>>
>>>> Am I reading the GCC manual wrong?  Does making this change fix 
>>>> things for you?
>>>> -- 
>>>> Eric Norum <norume at aps.anl.gov <mailto:norume at aps.anl.gov>>
>>>> Advanced Photon Source
>>>> Argonne National Laboratory
>>>> (630) 252-4793
>>>>
>>>>
>>>
>>
>




More information about the users mailing list