RFC: SSE and Altivec support

Till Straumann strauman at slac.stanford.edu
Wed Oct 28 15:55:40 UTC 2009


I did some tests on an mpc7457 and a 1GHz Celeron M processor
and found that:

- saving or restoring the volatile vector registers (v0..v19)
  on the mpc7457 takes ~1us even when neither the memory area
  holding the register contents nor the instructions used for
  saving/restoring are present in the cache. With cache hits
  it is faster still, by a factor of 4-5.

- saving or restoring XMM and FPU context (fxsave/fxrstor)
  on the Celeron can be done in ~0.4us.

Based on these encouraging results I thought about adding
XMM / AltiVec support using the following simple strategy:

i386:
  On i386 all FPU and SSE registers are volatile, probably
  with the exception of the control registers (which define rounding,
  exception behavior etc.). I found no SysV ABI supplement that
  mentions SSE; the i386 ABI mentions the FPCR but doesn't specify
  whether it is volatile or not.

  Hence, I think it is enough for the ordinary context-switching
  code to just save/restore the FP control word and the MXCSR.
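
  As a rough illustration of the step above, here is a minimal C
  sketch using GNU inline assembly. The struct and function names
  (fp_ctrl_ctx, fp_ctrl_save, fp_ctrl_restore) are made up for this
  example and are not existing APIs; real context-switch code would
  most likely do this in assembly as part of the switch routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread save slot; the names are illustrative only. */
typedef struct {
    uint16_t fpu_cw;   /* x87 FPU control word (rounding, precision, masks) */
    uint32_t mxcsr;    /* SSE control/status register */
} fp_ctrl_ctx;

#if defined(__i386__) || defined(__x86_64__)
/* Store the current control state into *c (what a switch-out would do). */
static inline void fp_ctrl_save(fp_ctrl_ctx *c)
{
    assert(c != NULL);
    __asm__ volatile ("fnstcw %0"  : "=m" (c->fpu_cw));
    __asm__ volatile ("stmxcsr %0" : "=m" (c->mxcsr));
}

/* Reload the control state from *c (what a switch-in would do). */
static inline void fp_ctrl_restore(const fp_ctrl_ctx *c)
{
    assert(c != NULL);
    __asm__ volatile ("fldcw %0"   : : "m" (c->fpu_cw));
    __asm__ volatile ("ldmxcsr %0" : : "m" (c->mxcsr));
}
#else
/* Non-x86 build: no-op stubs so the sketch still compiles. */
static inline void fp_ctrl_save(fp_ctrl_ctx *c)          { assert(c != NULL); }
static inline void fp_ctrl_restore(const fp_ctrl_ctx *c) { assert(c != NULL); }
#endif
```

  Only six bytes of state per thread are involved here, which is why
  this path should stay cheap compared to a full fxsave/fxrstor.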

  When handling exceptions or interrupts, however, the interrupted
  code may hold live values in any of these registers, so it would
  be necessary to save/restore the full FPU and SSE context around
  the C code:

  handle_interrupt_in_assembly:
        save_GPRs
        ...
        align_stack
        ...
        fxsave
        call_C_code
        fxrstor
        ...
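
  The same idea can be tried out from C. The following is only a
  sketch under stated assumptions: fx_area, fx_save, fx_restore,
  fx_mxcsr, call_with_fx_preserved and demo_handler are hypothetical
  names invented for this example, and a real interrupt prologue would
  do this in assembly on an aligned stack area as outlined above.
  fxsave/fxrstor need a 512-byte, 16-byte-aligned memory area; the
  MXCSR image sits at byte offset 24 of that area.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Save area for fxsave/fxrstor: 512 bytes, 16-byte aligned. */
typedef struct {
    uint8_t bytes[512];
} __attribute__((aligned(16))) fx_area;

#if defined(__i386__) || defined(__x86_64__)
#define FX_HAVE_X86 1
static inline void fx_save(fx_area *a)
{
    assert(((uintptr_t)a & 15u) == 0);   /* misalignment would fault */
    __asm__ volatile ("fxsave %0" : "=m" (*a));
}
static inline void fx_restore(const fx_area *a)
{
    assert(((uintptr_t)a & 15u) == 0);
    __asm__ volatile ("fxrstor %0" : : "m" (*a));
}
#else
#define FX_HAVE_X86 0
/* Non-x86 build: stubs so the sketch still compiles. */
static inline void fx_save(fx_area *a)          { memset(a->bytes, 0, sizeof a->bytes); }
static inline void fx_restore(const fx_area *a) { (void)a; }
#endif

/* Read the MXCSR image (byte offset 24) out of a saved area. */
static inline uint32_t fx_mxcsr(const fx_area *a)
{
    uint32_t m;
    memcpy(&m, a->bytes + 24, sizeof m);
    return m;
}

/* Stand-in for "C code called from the ISR": it changes the SSE
 * rounding mode, i.e. it disturbs state the interrupted code relies on. */
static void demo_handler(void)
{
#if FX_HAVE_X86
    uint32_t m;
    __asm__ volatile ("stmxcsr %0" : "=m" (m));
    m |= 0x6000u;                        /* RC bits: round toward zero */
    __asm__ volatile ("ldmxcsr %0" : : "m" (m));
#endif
}

/* fxsave / call_C_code / fxrstor, as in the outline above.
 * noinline keeps this a real call boundary, so the compiler does not
 * keep its own values in XMM registers across the fxrstor. */
static __attribute__((noinline)) void call_with_fx_preserved(void (*handler)(void))
{
    fx_area area;
    fx_save(&area);
    handler();
    fx_restore(&area);
}
```

  Calling call_with_fx_preserved(demo_handler) should leave the
  caller's MXCSR untouched even though the handler modified it,
  which is the property the interrupt path needs.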

PPC:
  The AltiVec SysV ABI declares v0..v19 as volatile and v20..v31
  and the vscr as non-volatile.

  Hence, it should be enough for ordinary context-switching code
  to just save/restore v20..v31 + vscr, and for the
  exception/interrupt handling code in assembly to save/restore
  the volatile registers before/after calling C code.

  (FP context switching code should be adapted to fit this
  strategy; save/restore non-volatile regs in Context_Switch_fp()
  and save/restore volatile regs around C-code called from
  exception context).

The beauty of this approach lies IMO in its simplicity and its
ability to cope with gcc using the vector extensions wherever
it chooses to (even in ISRs).
The user doesn't have to fiddle with gcc options but can
just build everything with vector support enabled.

RFC
-- Till


