RFH: aliasing problems

Steven Johnson sjohnson at sakuraindustries.com
Tue Dec 12 04:04:18 UTC 2006


Sergei Organov wrote:
> Steven Johnson
> <sjohnson at sakuraindustries.com> writes:
>   
>> Hi,
>>
>> Can anyone explain how -fstrict-aliasing is compatible with C99 Section
>> 6.3.2.3(7).
>>
>> because it says:
>>
>> "A pointer to an object ... may be converted to a pointer to a different
>> object ... when converted back again the results shall compare equal to
>> the original pointer."
>>
>> If this is not the definition of type punning with pointers, I do not
>> know what is.
>>     
>
> This tells nothing about semantics of accessing corresponding object
> by dereferencing the pointer, while aliasing is all about accessing actual
> objects using pointers of different types.
>   
The full quotation is:

*6.3.2.3(7):
"A pointer to an object or incomplete type may be converted to a pointer 
to a different object or incomplete type."
*
So the following is valid, according to this statement:

void f(void)
{
  int *ip;
  float *fp;
  int i = 12345;

  ip = &i;
  fp = (float*)ip;
}

*"If the resulting pointer is not correctly aligned for the 
pointed-to type, the behavior is undefined. (In general, the concept 
''correctly aligned'' is transitive: if a pointer to type A is correctly 
aligned for a pointer to type B, which in turn is correctly aligned for 
a pointer to type C, then a pointer to type A is correctly aligned for a 
pointer to type C.)"
*
/So, this obviously contemplates accessing the data, otherwise alignment 
is not an issue.
/
*"Otherwise, when converted back again, the result shall compare equal 
to the original pointer."
*/
Whenever the pointer is converted back to its original type, the 
address it holds must be the same as before conversion.  This cannot 
be done if the pointer is changed, because no information travels with 
the pointer telling what its original value or original type was.  So 
the data of the object pointed to MUST be equal for the original 
pointer and the converted pointer.  If accessing the converted pointer 
is restricted then the whole paragraph is nonsense and redundant.  What 
does it specify if you are forbidden from accessing the contents of the 
converted pointer?  What is its purpose?
/
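As a minimal sketch of the round-trip guarantee on its own, the 
following converts an int pointer to void* and to unsigned char* and 
back again.  The function name round_trip_value is made up for 
illustration, and the sketch only reads the object through its original 
type, so it does not settle whether the converted pointer itself may be 
dereferenced.

```c
#include <assert.h>

/* Hypothetical helper: converts a pointer away from its original type
 * and back, checks the pointers compare equal, and returns the value
 * read through the restored pointer. */
int round_trip_value(void)
{
    int i = 12345;
    int *ip = &i;

    void *vp = ip;                            /* implicit conversion to void* */
    unsigned char *cp = (unsigned char *)ip;  /* character pointers have no alignment requirement */

    int *back_from_void = vp;                 /* implicit conversion back */
    int *back_from_char = (int *)cp;          /* explicit cast back */

    /* 6.3.2.3(7): converted back again, the result compares equal. */
    assert(back_from_void == ip);
    assert(back_from_char == ip);

    /* The read uses an int lvalue on an int object. */
    return *back_from_char;
}
```

Calling round_trip_value() returns the stored 12345.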
*"When a pointer to an object is converted to a pointer to a character 
type, the result points to the lowest addressed byte of the object. 
Successive increments of the result, up to the size of the object, yield 
pointers to the remaining bytes of the object."

*Again, this certainly contemplates accessing the type-cast pointer, 
and describes what happens when incrementing it for the purpose of 
accessing the object (using char pointers).
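A small sketch of that character-pointer access, assuming nothing 
beyond the quoted paragraph: a pointer to unsigned char is stepped 
across a float object one byte at a time.  The helper names 
copy_object_bytes and bytes_match_object are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the bytes of an object into out by
 * indexing a character pointer from the lowest addressed byte up to
 * the size of the object.  Returns the number of bytes copied. */
size_t copy_object_bytes(const void *obj, size_t size, unsigned char *out)
{
    const unsigned char *p = (const unsigned char *)obj;
    size_t n;

    for (n = 0; n < size; n++)
        out[n] = p[n];
    return n;
}

/* Returns 1 if a float rebuilt from the copied bytes equals the original. */
int bytes_match_object(void)
{
    float f = 1.5f;
    unsigned char bytes[sizeof f];
    float rebuilt;

    size_t copied = copy_object_bytes(&f, sizeof f, bytes);
    assert(copied == sizeof f);

    /* memcpy puts the same bytes back into a float object. */
    memcpy(&rebuilt, bytes, sizeof rebuilt);
    return rebuilt == f;
}
```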

The C99 rationale on page 49 says:
*   Consequences of the treatment of pointer types in the Standard include:
      . A pointer to void may be converted to a pointer to an object of
        any type.
      . A pointer to any object of any type may be converted to a
        pointer to void.
      . If a pointer to an object is converted to a pointer to void and
        back again to the original pointer type, the result compares
        equal to original pointer.
      . It is invalid to convert a pointer to an object of any type to a
        pointer to an object of a different type without an explicit
        cast.
      . Even with an explicit cast, it is invalid to convert a function
        pointer to an object pointer or a pointer to void, or vice
        versa.
      . It is invalid to convert a pointer to a function of one type to
        a pointer to a function of a different type without a cast.
      . Pointers to functions that have different parameter-type
        information (including the "old-style" absence of
        parameter-type information) are different types.
*
Note:
*      . It is invalid to convert a pointer to an object of any type to
         a pointer to an object of a different type without an explicit
         cast.
*The corollary of this is "It is valid to convert a pointer to an object 
of any type to a pointer to an object of a different type with an 
explicit cast".  Otherwise this statement would be, "it is invalid to 
convert a pointer to an object of any type to a pointer of a different 
type FULLSTOP".

Again, if you can't use the converted pointer there is no point in 
converting it, so I say this says everything about type punning using 
pointers, and it is allowed by C99.
The Rationale says the spirit of the specification of the C language is:

*   Keep the spirit of C. The C89 Committee kept as a major goal to
    preserve the traditional spirit of C. There are many facets of the
    spirit of C, but the essence is a community sentiment of the
    underlying principles upon which the C language is based. Some of
    the facets of the spirit of C can be summarized in phrases like:
       . Trust the programmer.
       . Don't prevent the programmer from doing what needs to be done.
       . Keep the language small and simple.
       . Provide only one way to do an operation.
       . Make it fast, even if it is not guaranteed to be portable.
*
Strict aliasing is against this spirit.  Nowhere does the Rationale 
provide a rationale for such a radical and spirit-changing alteration 
to C as strict aliasing makes.

The paragraph everyone is hung up on is (C99 6.5, paragraph 7):*
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types: (The intent of this
list is to specify those circumstances in which an object may or may
not be aliased.)
--- a type compatible with the effective type of the object,
--- a qualified version of a type compatible with the effective type of
    the object,
--- a type that is the signed or unsigned type corresponding to the
    effective type of the object,
--- a type that is the signed or unsigned type corresponding to a
    qualified version of the effective type of the object,
--- an aggregate or union type that includes one of the aforementioned
    types among its members (including, recursively, a member of a
    subaggregate or contained union), or
--- a character type.
*
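To make the list concrete, here is a minimal sketch of two accesses it 
explicitly permits: an int object read through a const-qualified int 
lvalue and through an unsigned int lvalue.  The function names are 
hypothetical, and a non-negative value is used because such values have 
the same representation in the signed and unsigned type.

```c
#include <assert.h>

/* Hypothetical helper: reads an int object through an unsigned int
 * lvalue, i.e. "the signed or unsigned type corresponding to the
 * effective type of the object". */
unsigned int read_as_unsigned(int *ip)
{
    unsigned int *up = (unsigned int *)ip;
    return *up;
}

/* Returns 1 if both permitted views agree with the stored value. */
int listed_types_example(void)
{
    int i = 12345;
    const int *cip = &i;   /* qualified version of a compatible type */

    assert(*cip == 12345);
    return read_as_unsigned(&i) == 12345u;
}
```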
The "effective type" of an object is a type which can be changed by the 
lvalue expression used to access it; this is supported by the C99 
Rationale, page 11:

*  The definition of object does not employ the notion of type. Thus an
   object has no type in and of itself. However, since an object may
   only be designated by an lvalue (see §6.3.2.1), the phrase "the type
   of an object" is taken to mean, here and in the Standard, "the type
   of the lvalue designating this object," and "the value of an object"
   means "the contents of the object interpreted as a value of the type
   of the lvalue designating the object."
*
An lvalue is (6.3.2.1):

*  An lvalue is an expression with an object type or an incomplete type
   other than void; ...
   When an object is said to have a particular type, the type is
   specified by the lvalue used to designate the object.
*
and the Rationale, on page 48, says:

*   A difference of opinion within the C community centered around the
    meaning of lvalue, one group considering an lvalue to be any kind of
    object locator, another group holding that an lvalue is meaningful
    on the left side of an assigning operator. The C89 Committee adopted
    the definition of lvalue as an object locator. The term modifiable
    lvalue is used for the second of the above concepts.
*
and casting is an expression according to the formal language 
specification.

lvalues are expressions, expressions encompass casting, and the 
language specification clearly says any object pointer can be cast to 
any other object pointer.

Breaking down the problematic statement from the standard one gets 
(after properly evaluating all of the words in it):
*An object shall have its stored value accessed only by an lvalue
expression that has one of the following types: (The intent of this
list is to specify those circumstances in which an object may or may
not be aliased.)
*An access can be a read or a write.

*--- a type compatible with the effective type of the object,*
On a read, the object the value is read into must be compatible with 
the lvalue expression (including any casting) of the pointer being 
dereferenced.
On a write, the value being written into the object must be compatible 
with the lvalue expression (including any casting) of the pointer being 
dereferenced.

The others don't really apply to this discussion after all.  I cannot 
find any definitive reference that says if you want to type pun you 
must use unions.  It just isn't there.  The spec doesn't say what 
everyone is saying it says, at all, and in fact it says a whole lot 
about the exact opposite.
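For comparison, this is roughly what the union form that people keep 
recommending looks like.  It is only a sketch: the union float_bits and 
both helper names are made up for illustration, and it assumes float 
and unsigned int have the same size, which the code checks.  The GCC 
manual documents reading a union member other than the one last written 
as supported even with -fstrict-aliasing, provided the access goes 
through the union type.

```c
#include <assert.h>

/* Hypothetical union for illustration only. */
union float_bits {
    float f;
    unsigned int u;
};

/* Returns the object representation of value as an unsigned int. */
unsigned int float_to_bits(float value)
{
    union float_bits pun;

    /* Assumption of this sketch: both members occupy the same bytes. */
    assert(sizeof(unsigned int) == sizeof(float));
    pun.f = value;
    return pun.u;   /* read the member that was not last written */
}

/* Inverse of float_to_bits under the same size assumption. */
float bits_to_float(unsigned int bits)
{
    union float_bits pun;

    pun.u = bits;
    return pun.f;
}
```

Converting a float to bits and back with these two helpers gives back 
the original value.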

Page 59/60 of the C99 Rationale says:

*In practice, aliasing arises with the use of pointers. A contrived
example to illustrate the issues is
           int a;
           void f(int * b)
           {
                a = 1;
                *b = 2;
                g(a);
           }
   It is tempting to generate the call to g as if the source expression
   were g(1), but b might point to a, so this optimization is not safe.
*
/So the compiler cannot treat a as the constant 1 (for example by 
common subexpression elimination); it must fetch the value of a for 
the call to g, as it might be 2./

*On the other hand, consider
           int a;
           void f( double * b )
           {
                a = 1;
                *b = 2.0;
                g(a);
           }
   Again the optimization is incorrect only if b points to a. However,
   this would only have come about if the address of a were somewhere
   cast to double*. The C89 Committee has decided that such dubious
   possibilities need not be allowed for.

*/So, since the compiler cannot determine whether double *b points to 
a, it is allowed, as an optimisation, to assume that it does not and 
call g(a) with the constant 1, rather than look up the value./

* In principle, then, aliasing only need be allowed for when the lvalues 
all have the same type.
*
/But, following the rule, the following should be OK, and should be 
analogous to the first example:
           int a;
           void f( double * b )
           {
                int *c;
                a = 1;
                c = (int*)b;
                *c = 2;
                g(a);
           }
Because the "effective type" of *c is an int for the assignment of 2, 
the compiler has the same problem as in the first example: it cannot 
assume that c, and hence b, does not point to an int, because the 
lvalue of the expression in the assignment was an int, and the spec 
says if you convert a pointer back to its original type it will have 
the original address.  So the optimiser cannot assume that it isn't 
pointing to a in this case.

/If this is not the case, then again, type casting pointers has no 
utility, so why is it still in the standard?
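A self-contained sketch of that third example follows.  It is not the 
attached test.c.  The recording g(), the variable last_seen and the 
driver run_round_trip_store() are hypothetical, and a points at storage 
from malloc so that the conversion to double* cannot fall foul of the 
alignment clause in 6.3.2.3(7).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for g(): records the value it was called with. */
static int last_seen;
static void g(int value) { last_seen = value; }

/* int storage obtained from malloc, so it is aligned for any type. */
static int *a;

static void f(double *b)
{
    int *c;

    *a = 1;
    c = (int *)b;    /* convert back to the original pointer type */
    assert(c == a);  /* in this sketch the caller always passes a */
    *c = 2;          /* an int lvalue storing into an int object */
    g(*a);
}

/* Returns the value g() saw, or -1 if the allocation failed. */
int run_round_trip_store(void)
{
    a = malloc(sizeof(double));
    if (a == NULL)
        return -1;
    f((double *)a);
    free(a);
    a = NULL;
    return last_seen;
}
```

Here the store and the later read both use int lvalues, so the call to 
g must see 2 whether or not the optimiser is on.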

Further, if type aliasing as implemented by GCC is part of C99, then 
GCC is broken (in C99 mode) when optimisations are off, because it does 
not enforce the rule.  Take the attached code: it behaves differently 
depending on which of the following command lines is used to build it.

built with:
gcc -std=c99 -pedantic -O -fstrict-aliasing -Wstrict-aliasing=2 test.c

when run gives:
5
5
13
15
5

built with:
gcc -std=c99 -pedantic -fstrict-aliasing -Wstrict-aliasing=2 test.c

when run gives:
5
5
5
5
5

Neither build generates ANY warnings or errors.

The fact that you get different behaviour with optimisations ON and 
OFF, and no warning that there is a problem, to me indicates a broken 
optimiser.  No one is going to convince me that when a program has 
radically different behaviour with the optimiser ON or OFF that the 
optimiser isn't broken (or the unoptimised compiler isn't broken, take 
your pick).  It is also very concerning that whether or not the code is 
inlined changes the behaviour, especially as at higher optimisation 
levels GCC automatically inlines code.  Further, my tests prove to me 
that the presence or absence of the warning means precisely nothing.  
The warning can be present and yet the code behave precisely as 
intended, or the warning can be absent and the code be broken, as in 
this case.

I don't read the C99 specification to give the strict interpretation 
that GCC spins on it, and in any event, I don't see how the C99 
specification makes it permissible for inline code to behave differently 
than non-inline code (with the same operation), or for a compiler to 
have completely different behaviour with optimisation ON or OFF.

Further, given the HUGE amount of code that is non-C99 for lots of 
other reasons (such as the K&R parameter declarations all through the 
network stack), I don't see why there is such a hangup about disabling 
this highly problematic and suspicious optimisation.  Why file a PR on 
this, and ignore the other obvious C99 non-conformances in the same 
immediate vicinity of the file?

Steven J
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: text/x-csrc
Size: 766 bytes
Desc: not available
URL: <http://lists.rtems.org/pipermail/users/attachments/20061212/b4c97dae/attachment-0001.bin>

