gcc 4.3.2 vectorizes access to volatile array
Peters, Kenneth J
kenneth.j.peters at jpl.nasa.gov
Mon Jun 22 16:15:35 UTC 2009

This seems like a bug. See this paper for some scary research on compiler volatile handling in general:
http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf
Ken
> ABSTRACT
> C's volatile qualifier is intended to provide a reliable link between
> operations at the source-code level and operations at the memory-system
> level. We tested thirteen production-quality C compilers
> and, for each, found situations in which the compiler generated
> incorrect code for accessing volatile variables. This result is disturbing
> because it implies that embedded software and operating
> systems - both typically coded in C, both being bases for many
> mission-critical and safety-critical applications, and both relying
> on the correct translation of volatiles - may be being miscompiled.
On 6/22/09 7:47 AM, "Till Straumann" <strauman at slac.stanford.edu> wrote:
gcc-4.3.2 seems to produce bad code when
accessing an array of small 'volatile'
objects -- it may try to access multiple
such objects in a 'parallel' fashion.
E.g., instead of reading two consecutive
'volatile short's sequentially it reads
a single 32-bit longword. This may crash
e.g., when accessing a memory-mapped device
which allows only 16-bit accesses.
If I compile this code fragment
void volarrcpy(short *d, volatile short *s, int n)
{
    int i;
    for (i=0; i<n; i++)
        d[i] = s[i];
}
with '-O3' (the critical option seems to be '-ftree-vectorize'),
gcc-4.3.2 produces quite complicated code,
but the essential section is (powerpc):
.L7:
    lhz 0,0(11)
    addi 11,11,2
    lwzx 0,4,9
    stwx 0,3,9
    addi 9,9,4
    bdnz .L7
or i386
.L7:
    movw    (%ecx), %ax
    movl    (%esi,%edx,4), %eax
    movl    %eax, (%ebx,%edx,4)
    incl    %edx
    addl    $2, %ecx
    cmpl    %edx, -20(%ebp)
    ja  .L7
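Decompiled back into C code, this reads roughly:
uint32_t *dst_l = (uint32_t*)d;
uint32_t *src_l = (uint32_t*)s;
for (i=0; i<n/2; i++) {
    (void)s[i];              /* 16-bit read; the loaded value is overwritten before use */
    dst_l[i] = src_l[i];     /* one 32-bit read covering two adjacent shorts */
}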
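
To make the access pattern of the decompiled loop concrete, here is a
minimal host-side sketch (not part of the original post) that only counts
how often each 16-bit source location is touched. The names touch,
model_expected and model_vectorized are hypothetical, MAX_N is an arbitrary
bound, and nothing here talks to real hardware.

```c
#include <stddef.h>
#include <string.h>

#define MAX_N 64  /* arbitrary bound for this sketch; callers must keep n <= MAX_N */

/* Hypothetical bookkeeping: number of reads that touch each 16-bit location. */
static unsigned touch[MAX_N];

/* What the source code asks for: exactly one 16-bit read per element. */
static void model_expected(size_t n)
{
    memset(touch, 0, sizeof touch);
    for (size_t i = 0; i < n; i++)
        touch[i]++;
}

/* What the decompiled loop does: for each i < n/2, one 16-bit read of s[i]
   plus one 32-bit read that covers s[2*i] and s[2*i+1]. */
static void model_vectorized(size_t n)
{
    memset(touch, 0, sizeof touch);
    for (size_t i = 0; i < n / 2; i++) {
        touch[i]++;          /* 16-bit read of s[i] */
        touch[2 * i]++;      /* first half of the 32-bit read */
        touch[2 * i + 1]++;  /* second half of the 32-bit read */
    }
}
```

Calling model_vectorized(8) leaves locations 0..3 touched twice and
locations 4..7 touched once, which is the "half of the locations read
twice" pattern, whereas model_expected(8) touches every location once.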
This code seems neither optimal nor correct.
Besides reading half of the locations twice,
which violates the semantics of volatile
objects, accessing such objects in a 'vectorized'
way (in this case, gcc emits a single 32-bit
read instead of reading two adjacent short
addresses) seems illegal to me.
Similar behavior seems to be present in 4.3.3.
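
A possible workaround, sketched here under assumptions and not verified
against gcc 4.3.2 or 4.3.3: since '-ftree-vectorize' appears to be the
critical option, building the affected file with '-fno-tree-vectorize'
should avoid the transformation. Alternatively, each element can be read
through a small helper that is followed by a GCC-style compiler barrier,
which is expected to keep the loop from being vectorized. The names
read_vol16 and volarrcpy_safe are hypothetical.

```c
/* Hypothetical helper: perform exactly one 16-bit read through a volatile
   pointer, then issue a compiler-level barrier (GCC inline asm syntax).
   Assumption: the asm statement keeps the compiler from combining
   adjacent reads; this is not a hardware memory barrier. */
static inline short read_vol16(const volatile short *p)
{
    short v = *p;
    __asm__ __volatile__("" ::: "memory");
    return v;
}

/* Same interface as volarrcpy in the post, but every source element
   goes through read_vol16. */
void volarrcpy_safe(short *d, volatile short *s, int n)
{
    int i;
    for (i = 0; i < n; i++)
        d[i] = read_vol16(&s[i]);
}
```

Whichever variant is used, it is worth inspecting the generated assembly
for the loop to confirm that only 16-bit loads from the source remain.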
Does anybody have some insight? Should I file
a bug report?
Regards
-- Till
PS: I'm not subscribed to the gcc mailing list;
please CC me on any replies, thanks.
_______________________________________________
rtems-users mailing list
rtems-users at rtems.org
http://www.rtems.org/mailman/listinfo/rtems-users