GNATS-1110: in_cksum_hdr error in PC386 BSP

Joel Sherrill joel.sherrill at oarcorp.com
Wed Jul 5 19:27:19 UTC 2006


Eric Norum wrote:

> I'm beginning to think that we really ought to just switch over to the 
> C version which was posted with the original problem report.

How would this differ from disabling the CPU specific version for the 
i386 and using the
generic C version?

And ..I know you worry about this.. how does this impact performance?

--joel

>
> On Jul 5, 2006, at 1:38 PM, Joel Sherrill wrote:
>
>> Till Straumann wrote:
>>
>>> I just found another problem with
>>> in_cksum_hdr (freebsd mailing list).
>>>
>>> Currently, (i386, ppc) the algorithm is split into several
>>> 'asm' constructs and we assume that the carry bit
>>> is preserved across these. This is not a valid assumption,
>>> however.
>>>
>> Does their code base have a fixed version?
>>
>>> We definitively should try harder to eliminate 'asm'
>>>
>> Unfortunately, it is a necessary evel.
>>
>>> T.
>>>
>>> Joel Sherrill wrote:
>>>
>>>> I am going to rely on you guys to make a definitive statement when 
>>>> you have a
>>>> fix.  This has been a long thread and I have been away for 10 
>>>> days.  Eric,
>>>> please commit the fix you all are happy with.
>>>>
>>>> --joel
>>>>
>>>> Till Straumann wrote:
>>>>
>>>>> Eric Norum wrote:
>>>>>
>>>>>>
>>>>>> On Jul 1, 2006, at 3:40 AM, Till Straumann wrote:
>>>>>>
>>>>>>> I believe I caught this one.
>>>>>>> Thanks to Danilliu for reporting this and his/(her?)
>>>>>>> perseverance.
>>>>>>>
>>>>>>> It is definitely a toolchain/optimization issue:
>>>>>>>
>>>>>>> The BSD code does
>>>>>>>
>>>>>>> header->checksum = 0;
>>>>>>> header->checksum = ip_cksum_hdr(&header);
>>>>>>>
>>>>>>> The inline ip_cksum_hdr() routine has inline assembly
>>>>>>> and gcc doesn't consider that the assembly actually
>>>>>>> could read 'header->checksum' (which it does) so it thinks the
>>>>>>> header->checksum = 0 assignment is unnecessary
>>>>>>> and optimizes it away :-(
>>>>>>>
>>>>>>> Confirmed by disassembly of Danilliu's broken
>>>>>>> and a working binary.
>>>>>>>
>>>>>>> [ As soon as any 'real' subroutine (like a printf)
>>>>>>> is called from ip_cksum_hdr() gcc realizes that
>>>>>>> the header->checksum=0 assigment could have
>>>>>>> side-effects and the problem goes away].
>>>>>>>
>>>>>>> A memory barrier could resolve this.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Here's what the GCC manual has to say on this issue:
>>>>>>  If your assembler instructions access memory in an unpredictable
>>>>>> fashion, add `memory' to the list of clobbered registers.  This will
>>>>>> cause GCC to not keep memory values cached in registers across the
>>>>>> assembler instruction and not optimize stores or loads to that 
>>>>>> memory.
>>>>>> You will also want to add the `volatile' keyword if the memory 
>>>>>> affected
>>>>>> is not listed in the inputs or outputs of the `asm', as the `memory'
>>>>>> clobber does not count as a side-effect of the `asm'.  If you 
>>>>>> know how
>>>>>> large the accessed memory is, you can add it as input or output 
>>>>>> but if
>>>>>> this is not known, you should add `memory'.  As an example, if you
>>>>>> access ten bytes of a string, you can use a memory input like:
>>>>>>
>>>>>>      {"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )}.
>>>>>>
>>>>>> Note that in the following example the memory input is necessary,
>>>>>> otherwise GCC might optimize the store to `x' away:
>>>>>>      int foo ()
>>>>>>      {
>>>>>>        int x = 42;
>>>>>>        int *y = &x;
>>>>>>        int result;
>>>>>>        asm ("magic stuff accessing an 'int' pointed to by '%1'"
>>>>>>              "=&d" (r) : "a" (y), "m" (*y));
>>>>>>        return result;
>>>>>>      }
>>>>>> ========================================================================== 
>>>>>>
>>>>>>
>>>>>> The example looks like it describes the in_cksum_hdr() problem.  
>>>>>> I made the following change:
>>>>>> diff -u -r1.5 in_cksum.h
>>>>>> --- machine/in_cksum.h  2 Sep 2003 21:31:16 -0000       1.5
>>>>>> +++ machine/in_cksum.h  3 Jul 2006 14:06:25 -0000
>>>>>> @@ -59,10 +59,11 @@
>>>>>> {
>>>>>>         register u_int sum = 0;
>>>>>>                    +
>>>>>> #define ADD(n) \
>>>>>> -    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>>>> (sum), "r" (ip))
>>>>>> +    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>>>> (sum), "r" (ip), "m" (*ip))
>>>>>> #define ADDC(n)        \
>>>>>> -    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>>>> (sum), "r" (ip))
>>>>>> +    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>>>> (sum), "r" (ip), "m" (*ip))
>>>>>> #define MOP    \
>>>>>>      __asm__ volatile ("adcl         $0, %0" : "=r" (sum) : "0" 
>>>>>> (sum))
>>>>>>
>>>>>>
>>>>>> To my surprise the compiler emits the same code before and after 
>>>>>> I made the change. 
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I can't confirm this. Using the construct *does* make the difference:
>>>>>
>>>>> Compiling the snippet
>>>>>
>>>>> void test(int *pi)
>>>>> {
>>>>>        *pi = 0;
>>>>> #ifdef USE_MEMB
>>>>>        asm volatile(""::"m"(*pi));
>>>>> #endif
>>>>>        *pi = 3;
>>>>> }
>>>>>
>>>>> powerpc-rtems-gcc -O -c test.c; powerpc-rtems-objdump -d test.o
>>>>>
>>>>> produces:
>>>>>
>>>>>   0:   38 00 00 03     li      r0,3
>>>>>   4:   90 03 00 00     stw     r0,0(r3)
>>>>>   8:   4e 80 00 20     blr
>>>>>
>>>>> whereas
>>>>>
>>>>> powerpc-rtems-gcc -O -c -DUSE_MEMB test.c; powerpc-rtems-objdump 
>>>>> -d test.o
>>>>>
>>>>> does what we want:
>>>>>
>>>>>   0:   38 00 00 00     li      r0,0
>>>>>   4:   90 03 00 00     stw     r0,0(r3)
>>>>>   8:   38 00 00 03     li      r0,3
>>>>>   c:   90 03 00 00     stw     r0,0(r3)
>>>>>  10:   4e 80 00 20     blr
>>>>>
>>>>> I also tested i386-rtems-gcc with the same result. Same applies to 
>>>>> native gcc-3.2.3
>>>>> I have on my linux box.
>>>>>
>>>>> What compiler are you using? What target did you check? Note that 
>>>>> PPC people
>>>>> wont' experience PR1110 (look at the ppc implementation of 
>>>>> in_cksum_hdr(),
>>>>> the reading of the relevant data is done be interspersed C-code).
>>>>>
>>>>> If you are testing the modified <in_cksum.h> beware of outdated 
>>>>> versions
>>>>> that might get picked up...
>>>>>
>>>>>
>>>>> Cheers
>>>>> -- T.
>>>>>
>>>>>> Am I reading the GCC manual wrong?  Does making this change fix 
>>>>>> things for you?
>>>>>> -- 
>>>>>> Eric Norum <norume at aps.anl.gov <mailto:norume at aps.anl.gov>>
>>>>>> Advanced Photon Source
>>>>>> Argonne National Laboratory
>>>>>> (630) 252-4793
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
> -- 
> Eric Norum <norume at aps.anl.gov <mailto:norume at aps.anl.gov>>
> Advanced Photon Source
> Argonne National Laboratory
> (630) 252-4793
>
>




More information about the users mailing list