GNATS-1110: in_cksum_hdr error in PC386 BSP

Joel Sherrill joel.sherrill at oarcorp.com
Tue Jul 4 14:52:34 UTC 2006


I am going to rely on you guys to make a definitive statement when you 
have a
fix.  This has been a long thread and I have been away for 10 days.  Eric,
please commit the fix you all are happy with.

--joel

Till Straumann wrote:

> Eric Norum wrote:
>
>>
>> On Jul 1, 2006, at 3:40 AM, Till Straumann wrote:
>>
>>> I believe I caught this one.
>>> Thanks to Danilliu for reporting this and his/(her?)
>>> perseverance.
>>>
>>> It is definitely a toolchain/optimization issue:
>>>
>>> The BSD code does
>>>
>>> header->checksum = 0;
>>> header->checksum = ip_cksum_hdr(&header);
>>>
>>> The inline ip_cksum_hdr() routine has inline assembly
>>> and gcc doesn't consider that the assembly actually
>>> could read 'header->checksum' (which it does) so it thinks the
>>> header->checksum = 0 assignment is unnecessary
>>> and optimizes it away :-(
>>>
>>> Confirmed by disassembly of Danilliu's broken
>>> and a working binary.
>>>
>>> [ As soon as any 'real' subroutine (like a printf)
>>> is called from ip_cksum_hdr() gcc realizes that
>>> the header->checksum=0 assigment could have
>>> side-effects and the problem goes away].
>>>
>>> A memory barrier could resolve this.
>>
>>
>>
>> Here's what the GCC manual has to say on this issue:
>>  If your assembler instructions access memory in an unpredictable
>> fashion, add `memory' to the list of clobbered registers.  This will
>> cause GCC to not keep memory values cached in registers across the
>> assembler instruction and not optimize stores or loads to that memory.
>> You will also want to add the `volatile' keyword if the memory affected
>> is not listed in the inputs or outputs of the `asm', as the `memory'
>> clobber does not count as a side-effect of the `asm'.  If you know how
>> large the accessed memory is, you can add it as input or output but if
>> this is not known, you should add `memory'.  As an example, if you
>> access ten bytes of a string, you can use a memory input like:
>>
>>      {"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )}.
>>
>> Note that in the following example the memory input is necessary,
>> otherwise GCC might optimize the store to `x' away:
>>      int foo ()
>>      {
>>        int x = 42;
>>        int *y = &x;
>>        int result;
>>        asm ("magic stuff accessing an 'int' pointed to by '%1'"
>>              "=&d" (r) : "a" (y), "m" (*y));
>>        return result;
>>      }
>> ========================================================================== 
>>
>>
>>
>> The example looks like it describes the in_cksum_hdr() problem.  I 
>> made the following change:
>> diff -u -r1.5 in_cksum.h
>> --- machine/in_cksum.h  2 Sep 2003 21:31:16 -0000       1.5
>> +++ machine/in_cksum.h  3 Jul 2006 14:06:25 -0000
>> @@ -59,10 +59,11 @@
>> {
>>         register u_int sum = 0;
>>                    +
>> #define ADD(n) \
>> -    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" 
>> (sum), "r" (ip))
>> +    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" 
>> (sum), "r" (ip), "m" (*ip))
>> #define ADDC(n)        \
>> -    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" 
>> (sum), "r" (ip))
>> +    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" 
>> (sum), "r" (ip), "m" (*ip))
>> #define MOP    \
>>      __asm__ volatile ("adcl         $0, %0" : "=r" (sum) : "0" (sum))
>>
>>
>> To my surprise the compiler emits the same code before and after I 
>> made the change. 
>
>
> I can't confirm this. Using the construct *does* make the difference:
>
> Compiling the snippet
>
> void test(int *pi)
> {
>        *pi = 0;
> #ifdef USE_MEMB
>        asm volatile(""::"m"(*pi));
> #endif
>        *pi = 3;
> }
>
> powerpc-rtems-gcc -O -c test.c; powerpc-rtems-objdump -d test.o
>
> produces:
>
>   0:   38 00 00 03     li      r0,3
>   4:   90 03 00 00     stw     r0,0(r3)
>   8:   4e 80 00 20     blr
>
> whereas
>
> powerpc-rtems-gcc -O -c -DUSE_MEMB test.c; powerpc-rtems-objdump -d 
> test.o
>
> does what we want:
>
>   0:   38 00 00 00     li      r0,0
>   4:   90 03 00 00     stw     r0,0(r3)
>   8:   38 00 00 03     li      r0,3
>   c:   90 03 00 00     stw     r0,0(r3)
>  10:   4e 80 00 20     blr
>
> I also tested i386-rtems-gcc with the same result. Same applies to 
> native gcc-3.2.3
> I have on my linux box.
>
> What compiler are you using? What target did you check? Note that PPC 
> people
> wont' experience PR1110 (look at the ppc implementation of 
> in_cksum_hdr(),
> the reading of the relevant data is done be interspersed C-code).
>
> If you are testing the modified <in_cksum.h> beware of outdated versions
> that might get picked up...
>
>
> Cheers
> -- T.
>
>> Am I reading the GCC manual wrong?  Does making this change fix 
>> things for you?
>> -- 
>> Eric Norum <norume at aps.anl.gov <mailto:norume at aps.anl.gov>>
>> Advanced Photon Source
>> Argonne National Laboratory
>> (630) 252-4793
>>
>>
>




More information about the users mailing list