reviewing inline assembly memory constraints [asm tutorial]

Till Straumann strauman at slac.stanford.edu
Wed May 14 01:34:14 UTC 2008


I apologize for the long message, but I'm trying to
summarize a few things about inline assembly that
I believe I have learned (or am in the process of
learning) and which are nowhere readily documented AFAIK.


Inline assembly code is used in several places in
the RTEMS codebase.

Sometimes the assembly code manipulates memory
variables that are also used by the surrounding C code, and
precautions must be taken so that the compiler is
notified that the assembly code reads and/or writes
such variables.

Following the advice given in the gcc info page, critical
memory areas have been added as input and/or output
memory operands using the "m" constraint.

E.g., consider

int obscure_p_plus_one(int *p_ptr)
{
    int x = *p_ptr;

    asm volatile("<assembly code incrementing x>":"=m"(x):"r"(&x),"m"(x));

    return x;
}


Without the memory input operand gcc doesn't know that the assembly
code requires *p_ptr to be written to x, and the store operation might
be optimized away. Without the memory output operand gcc doesn't know
that the assembly modified 'x' and that it has to reload the
return value (see appendix for explanation).

The bad news is that I recently discovered that using "=m" / "m"
does *not* necessarily produce the desired results
(see this thread: http://gcc.gnu.org/ml/gcc/2008-03/msg00976.html).

The reason is that "m" operands match autoincrement
addresses and gcc assumes that the asm code carries out
any side-effects associated with an autoincrement operation
IF gcc actually chooses to generate an autoincrement
address for such an operand -- but you really have no
way of knowing.

E.g., consider powerpc:

struct xxx {
   int a, b;
};

struct xxx *p;

If you tell gcc that your asm requires p->b by writing

   asm volatile("..."::"m"(p->b));

then gcc MAY generate the memory access by means of
a  '<offset>(<base_reg>)' addressing mode and it MAY
EXPECT that the asm pre-updates the register as in

   lwzu rx, 4(base_reg)

and it MAY use the register it expects the asm to
modify further down the line!

Hence, the only way of correctly using "m" on PPC is

   asm volatile("lwz%U1%X1 %0, %1":"=r"(result):"m"(p->b));

where the %U1 and %X1 modifiers let gcc turn the instruction into
its update ('u') or indexed ('x') form, depending on how the memory
address of operand 1 is generated. E.g., gcc could end up emitting

  lwzu r0, 4(r3)

and use the updated r3 further down the line. If your
asm code fails to update r3 as gcc expects it to,
wrong code can be the result!

The gcc people recommend using "o" (an offsettable memory operand)
instead of "m"; "o" does not match a PPC pre-update mode (but we
don't really know what it does on other machines with more
complicated addressing modes). Hence on PPC

   p->b = xxx;
    ...
   asm volatile(""::"o"(p->b));

should ensure that gcc writes p->b to memory prior to
the asm code accessing it, and that gcc doesn't make any
assumptions about pre-updating registers.
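
A minimal sketch of the "o" approach follows. The helper and demo
names are made up for illustration, and the empty asm body merely
stands in for real code that would read p->b. It uses the __asm__
spelling so that it also compiles in strict ISO C modes.

```c
#include <assert.h>

struct xxx {
    int a, b;
};

/* Hypothetical helper: store 'value' to p->b, then tell gcc that the
 * asm reads p->b from memory, so the store must precede the asm. */
static inline void store_b_before_asm(struct xxx *p, int value)
{
    p->b = value;
    /* "o" = offsettable memory operand. The asm body is empty here;
     * real code reading p->b would go in its place. */
    __asm__ __volatile__("" : : "o"(p->b));
}

/* Usage: after the call, the new value of b is visible in memory. */
void demo_store_b(void)
{
    struct xxx s = { 1, 2 };

    store_b_before_asm(&s, 7);
    assert(s.a == 1 && s.b == 7);
}
```

Whether "o" avoids side-effect addressing modes on targets other
than PPC is not something this sketch can show.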


Unfortunately, from what we originally wanted (just tell
gcc that the asm accesses a particular memory region) we
deviated into the guts of constraints and cpu-dependent
addressing modes etc.

A safer (but more intrusive) way of synchronizing
memory access with an 'asm' construct is a general 'memory'
clobber

  asm volatile("<do stuff here>":::"memory");

but not even that is without pitfalls! Consider

int y;

int blah()
{
    int x = 0;

    asm volatile("smart asm might modify x and y":::"memory");
    return x+y;
}

In this case, gcc will respect the memory clobber for the global
variable 'y' but it still believes there is no way for the asm
to modify the local/stack variable x and it will emit

    <asm code>
    return y;

unless you pass the address of x to the asm as an input operand -- in
that case, gcc assumes you may modify it:

int blah()
{
    int x = 0;

    asm volatile("smart asm modifies x and y"::"r"(&x):"memory");
    return x+y;
}

produces correct code.
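
For something that can actually be compiled, here is a rough sketch
of the same pattern. The x86 'incl' instruction, the initial value
of y and the demo function are assumptions added for illustration;
on other targets the asm body is left empty and the increment is
done in C instead.

```c
#include <assert.h>

int y = 5;  /* global: covered by the "memory" clobber */

int blah(void)
{
    int x = 0;
#if defined(__x86_64__) || defined(__i386__)
    /* x86 only: increment the int that operand 0 points to.
     * Because &x is an input operand and memory is clobbered, gcc
     * stores x before the asm and reloads it afterwards. */
    __asm__ __volatile__("incl (%0)" : : "r"(&x) : "memory", "cc");
#else
    /* Other targets: empty asm with the same operands; the
     * increment is done in C so that the result is the same. */
    __asm__ __volatile__("" : : "r"(&x) : "memory");
    x++;
#endif
    return x + y;
}

/* Usage: x is 1 after the asm, so the result is 1 + y. */
void demo_blah(void)
{
    assert(blah() == 6);
}
```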

CONCLUSION:
 - As gcc becomes more sophisticated, writing correct inline asm code
   becomes more and more tricky. Unless you know exactly what you are
   doing and are intimately familiar with the CPU family in question
   (and gcc internals) you should consider avoiding inline asm.
 - In particular, inline asm that interacts with memory is very
   tricky. Declaring memory areas volatile and using a 'memory' clobber
   is probably safer than trying to be smart with "m", "o", etc.
 - Review of RTEMS inline assembly (especially the use of "m", "=m") is
   highly recommended.
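
The "volatile plus 'memory' clobber" suggestion could look roughly
like the sketch below. The variable, macro and function names are
hypothetical, and the empty asm is only a compiler-level barrier
standing in for real asm that would touch the shared variable.

```c
#include <assert.h>

/* Hypothetical variable shared between C code and inline asm. */
static volatile int shared_counter;

/* Empty asm with a "memory" clobber: gcc may not cache memory
 * contents in registers across this point. */
#define COMPILER_BARRIER() __asm__ __volatile__("" : : : "memory")

int bump_shared_counter(void)
{
    shared_counter = shared_counter + 1; /* volatile: real load and store */
    COMPILER_BARRIER();                  /* real asm would go here */
    return shared_counter;               /* volatile: read from memory again */
}

/* Usage: two consecutive calls differ by exactly one. */
void demo_bump(void)
{
    int first = bump_shared_counter();

    assert(bump_shared_counter() == first + 1);
}
```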

-- Till

Appendix:
--------
(explanation of first example -- however, this explanation
does not necessarily cover everything correctly. E.g., recent
powerpc-gcc-4.3 does NOT do exactly what is described here; see above)

What is described here is what we would like the "m" and "=m"
constraints to mean and what the gcc info page seems to suggest.

In reality gcc does more complex things...

That said, let's proceed assuming a better world:

Without the "=m" and "m" operands gcc may produce code equivalent to the following:

int obscure_p_plus_one(int *p_ptr)
{
    int x;

    asm volatile("<increment x>"::"r"(&x));

    return *p_ptr;
}

With only the "m" input operand it may produce

int obscure_p_plus_one(int *p_ptr)
{
    int x = *p_ptr;

    asm volatile("<increment x>"::"r"(&x),"m"(x));

    return *p_ptr;
}

Finally, with only the "=m" output operand it may produce

int obscure_p_plus_one(int *p_ptr)
{
    int x;

    asm volatile("<increment x>":"=m"(x):"r"(&x));

    return x;
}



