GNATS-1110: in_cksum_hdr error in PC386 BSP

Till Straumann strauman at slac.stanford.edu
Wed Jul 5 20:09:03 UTC 2006


Eric Norum wrote:

> I'm beginning to think that we really ought to just switch over to the 
> C version which was posted with the original problem report.

I believe that in this particular case, fixing the ASM should be OK.
We have now looked at this in detail and the proposed fixes
(adding 'm' input and concatenating everyting into a single asm)
should solve the problem. (And: yes, freebsd has updated versions
[at least for i386]).
This code is executed often and problems should surface immediately.

I'm more concerned about other areas involving rare race conditions.

T.

>
> On Jul 5, 2006, at 1:38 PM, Joel Sherrill wrote:
>
>> Till Straumann wrote:
>>
>>> I just found another problem with
>>> in_cksum_hdr (freebsd mailing list).
>>>
>>> Currently, (i386, ppc) the algorithm is split into several
>>> 'asm' constructs and we assume that the carry bit
>>> is preserved across these. This is not a valid assumption,
>>> however.
>>>
>> Does their code base have a fixed version?
>>
>>> We definitively should try harder to eliminate 'asm'
>>>
>> Unfortunately, it is a necessary evel.
>>
>>> T.
>>>
>>> Joel Sherrill wrote:
>>>
>>>> I am going to rely on you guys to make a definitive statement when 
>>>> you have a
>>>> fix.  This has been a long thread and I have been away for 10 
>>>> days.  Eric,
>>>> please commit the fix you all are happy with.
>>>>
>>>> --joel
>>>>
>>>> Till Straumann wrote:
>>>>
>>>>> Eric Norum wrote:
>>>>>
>>>>>>
>>>>>> On Jul 1, 2006, at 3:40 AM, Till Straumann wrote:
>>>>>>
>>>>>>> I believe I caught this one.
>>>>>>> Thanks to Danilliu for reporting this and his/(her?)
>>>>>>> perseverance.
>>>>>>>
>>>>>>> It is definitely a toolchain/optimization issue:
>>>>>>>
>>>>>>> The BSD code does
>>>>>>>
>>>>>>> header->checksum = 0;
>>>>>>> header->checksum = ip_cksum_hdr(&header);
>>>>>>>
>>>>>>> The inline ip_cksum_hdr() routine has inline assembly
>>>>>>> and gcc doesn't consider that the assembly actually
>>>>>>> could read 'header->checksum' (which it does) so it thinks the
>>>>>>> header->checksum = 0 assignment is unnecessary
>>>>>>> and optimizes it away :-(
>>>>>>>
>>>>>>> Confirmed by disassembly of Danilliu's broken
>>>>>>> and a working binary.
>>>>>>>
>>>>>>> [ As soon as any 'real' subroutine (like a printf)
>>>>>>> is called from ip_cksum_hdr() gcc realizes that
>>>>>>> the header->checksum=0 assigment could have
>>>>>>> side-effects and the problem goes away].
>>>>>>>
>>>>>>> A memory barrier could resolve this.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Here's what the GCC manual has to say on this issue:
>>>>>>  If your assembler instructions access memory in an unpredictable
>>>>>> fashion, add `memory' to the list of clobbered registers.  This will
>>>>>> cause GCC to not keep memory values cached in registers across the
>>>>>> assembler instruction and not optimize stores or loads to that 
>>>>>> memory.
>>>>>> You will also want to add the `volatile' keyword if the memory 
>>>>>> affected
>>>>>> is not listed in the inputs or outputs of the `asm', as the `memory'
>>>>>> clobber does not count as a side-effect of the `asm'.  If you 
>>>>>> know how
>>>>>> large the accessed memory is, you can add it as input or output 
>>>>>> but if
>>>>>> this is not known, you should add `memory'.  As an example, if you
>>>>>> access ten bytes of a string, you can use a memory input like:
>>>>>>
>>>>>>      {"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )}.
>>>>>>
>>>>>> Note that in the following example the memory input is necessary,
>>>>>> otherwise GCC might optimize the store to `x' away:
>>>>>>      int foo ()
>>>>>>      {
>>>>>>        int x = 42;
>>>>>>        int *y = &x;
>>>>>>        int result;
>>>>>>        asm ("magic stuff accessing an 'int' pointed to by '%1'"
>>>>>>              "=&d" (r) : "a" (y), "m" (*y));
>>>>>>        return result;
>>>>>>      }
>>>>>> ========================================================================== 
>>>>>>
>>>>>>
>>>>>> The example looks like it describes the in_cksum_hdr() problem.  
>>>>>> I made the following change:
>>>>>> diff -u -r1.5 in_cksum.h
>>>>>> --- machine/in_cksum.h  2 Sep 2003 21:31:16 -0000       1.5
>>>>>> +++ machine/in_cksum.h  3 Jul 2006 14:06:25 -0000
>>>>>> @@ -59,10 +59,11 @@
>>>>>> {
>>>>>>         register u_int sum = 0;
>>>>>>                    +
>>>>>> #define ADD(n) \
>>>>>> -    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>>>> (sum), "r" (ip))
>>>>>> +    __asm__ volatile ("addl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>>>> (sum), "r" (ip), "m" (*ip))
>>>>>> #define ADDC(n)        \
>>>>>> -    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>>>> (sum), "r" (ip))
>>>>>> +    __asm__ volatile ("adcl " #n "(%2), %0" : "=r" (sum) : "0" 
>>>>>> (sum), "r" (ip), "m" (*ip))
>>>>>> #define MOP    \
>>>>>>      __asm__ volatile ("adcl         $0, %0" : "=r" (sum) : "0" 
>>>>>> (sum))
>>>>>>
>>>>>>
>>>>>> To my surprise the compiler emits the same code before and after 
>>>>>> I made the change. 
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I can't confirm this. Using the construct *does* make the difference:
>>>>>
>>>>> Compiling the snippet
>>>>>
>>>>> void test(int *pi)
>>>>> {
>>>>>        *pi = 0;
>>>>> #ifdef USE_MEMB
>>>>>        asm volatile(""::"m"(*pi));
>>>>> #endif
>>>>>        *pi = 3;
>>>>> }
>>>>>
>>>>> powerpc-rtems-gcc -O -c test.c; powerpc-rtems-objdump -d test.o
>>>>>
>>>>> produces:
>>>>>
>>>>>   0:   38 00 00 03     li      r0,3
>>>>>   4:   90 03 00 00     stw     r0,0(r3)
>>>>>   8:   4e 80 00 20     blr
>>>>>
>>>>> whereas
>>>>>
>>>>> powerpc-rtems-gcc -O -c -DUSE_MEMB test.c; powerpc-rtems-objdump 
>>>>> -d test.o
>>>>>
>>>>> does what we want:
>>>>>
>>>>>   0:   38 00 00 00     li      r0,0
>>>>>   4:   90 03 00 00     stw     r0,0(r3)
>>>>>   8:   38 00 00 03     li      r0,3
>>>>>   c:   90 03 00 00     stw     r0,0(r3)
>>>>>  10:   4e 80 00 20     blr
>>>>>
>>>>> I also tested i386-rtems-gcc with the same result. Same applies to 
>>>>> native gcc-3.2.3
>>>>> I have on my linux box.
>>>>>
>>>>> What compiler are you using? What target did you check? Note that 
>>>>> PPC people
>>>>> wont' experience PR1110 (look at the ppc implementation of 
>>>>> in_cksum_hdr(),
>>>>> the reading of the relevant data is done be interspersed C-code).
>>>>>
>>>>> If you are testing the modified <in_cksum.h> beware of outdated 
>>>>> versions
>>>>> that might get picked up...
>>>>>
>>>>>
>>>>> Cheers
>>>>> -- T.
>>>>>
>>>>>> Am I reading the GCC manual wrong?  Does making this change fix 
>>>>>> things for you?
>>>>>> -- 
>>>>>> Eric Norum <norume at aps.anl.gov <mailto:norume at aps.anl.gov>>
>>>>>> Advanced Photon Source
>>>>>> Argonne National Laboratory
>>>>>> (630) 252-4793
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
> -- 
> Eric Norum <norume at aps.anl.gov <mailto:norume at aps.anl.gov>>
> Advanced Photon Source
> Argonne National Laboratory
> (630) 252-4793
>
>




More information about the users mailing list