GNATS-1110: in_cksum_hdr error in PC386 BSP

Till Straumann strauman at slac.stanford.edu
Mon Jul 3 21:54:15 UTC 2006


Eric Norum wrote:

>
> On Jul 1, 2006, at 3:40 AM, Till Straumann wrote:
>
>> I believe I caught this one.
>> Thanks to Danilliu for reporting this and his/(her?)
>> perseverance.
>>
>> It is definitely a toolchain/optimization issue:
>>
>> The BSD code does
>>
>> header->checksum = 0;
>> header->checksum = ip_cksum_hdr(&header);
>>
>> The inline ip_cksum_hdr() routine has inline assembly
>> and gcc doesn't consider that the assembly actually
>> could read 'header->checksum' (which it does) so it thinks the
>> header->checksum = 0 assignment is unnecessary
>> and optimizes it away :-(
>>
>> Confirmed by disassembly of Danilliu's broken
>> and a working binary.
>>
>> [ As soon as any 'real' subroutine (like a printf)
>> is called from ip_cksum_hdr() gcc realizes that
>> the header->checksum=0 assigment could have
>> side-effects and the problem goes away].
>>
>> A memory barrier could resolve this.
>
>
> Here's what the GCC manual has to say on this issue:
>  If your assembler instructions access memory in an unpredictable
> fashion, add `memory' to the list of clobbered registers.  This will
> cause GCC to not keep memory values cached in registers across the
> assembler instruction and not optimize stores or loads to that memory.
> You will also want to add the `volatile' keyword if the memory affected
> is not listed in the inputs or outputs of the `asm', as the `memory'
> clobber does not count as a side-effect of the `asm'.  If you know how
> large the accessed memory is, you can add it as input or output but if
> this is not known, you should add `memory'.  As an example, if you
> access ten bytes of a string, you can use a memory input like:
>
>      {"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )}.
>
> Note that in the following example the memory input is necessary,
> otherwise GCC might optimize the store to `x' away:
>      int foo ()
>      {
>        int x = 42;
>        int *y = &x;
>        int result;
>        asm ("magic stuff accessing an 'int' pointed to by '%1'"
>              "=&d" (r) : "a" (y), "m" (*y));
>        return result;
>      }
> ==========================================================================
>
>
> The example looks like it describes the in_cksum_hdr() problem.  I 
> made the following change:
> diff -u -r1.5 in_cksum.h
> --- machine/in_cksum.h  2 Sep 2003 21:31:16 -0000       1.5
> +++ machine/in_cksum.h  3 Jul 2006 14:06:25 -0000
> @@ -59,10 +59,11 @@
> {
>         register u_int sum = 0;
>                    
> +
> #define ADD(n) \
> -    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" (sum), 
> "r" (ip))
> +    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" (sum), 
> "r" (ip), "m" (*ip))
> #define ADDC(n)        \
> -    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" (sum), 
> "r" (ip))
> +    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" (sum), 
> "r" (ip), "m" (*ip))
> #define MOP    \
>      __asm__ volatile ("adcl         $0, %0" : "=r" (sum) : "0" (sum))
>
>
> To my surprise the compiler emits the same code before and after I 
> made the change. 

I can't confirm this. Using the construct *does* make the difference:

Compiling the snippet

void test(int *pi)
{
        *pi = 0;
#ifdef USE_MEMB
        asm volatile(""::"m"(*pi));
#endif
        *pi = 3;
}

powerpc-rtems-gcc -O -c test.c; powerpc-rtems-objdump -d test.o

produces:

   0:   38 00 00 03     li      r0,3
   4:   90 03 00 00     stw     r0,0(r3)
   8:   4e 80 00 20     blr

whereas

powerpc-rtems-gcc -O -c -DUSE_MEMB test.c; powerpc-rtems-objdump -d test.o

does what we want:

   0:   38 00 00 00     li      r0,0
   4:   90 03 00 00     stw     r0,0(r3)
   8:   38 00 00 03     li      r0,3
   c:   90 03 00 00     stw     r0,0(r3)
  10:   4e 80 00 20     blr

I also tested i386-rtems-gcc with the same result. Same applies to 
native gcc-3.2.3
I have on my linux box.

What compiler are you using? What target did you check? Note that PPC people
wont' experience PR1110 (look at the ppc implementation of in_cksum_hdr(),
the reading of the relevant data is done be interspersed C-code).

If you are testing the modified <in_cksum.h> beware of outdated versions
that might get picked up...


Cheers
-- T.

> Am I reading the GCC manual wrong?  Does making this change fix things 
> for you?
> -- 
> Eric Norum <norume at aps.anl.gov <mailto:norume at aps.anl.gov>>
> Advanced Photon Source
> Argonne National Laboratory
> (630) 252-4793
>
>




More information about the users mailing list