RFC: SSE and Altivec support
strauman at slac.stanford.edu
Wed Oct 28 15:55:40 UTC 2009
I did some tests on a mpc7457 and a 1GHz celeron M processor
and I found that:
- saving or restoring the volatile vector registers (v0..v19) on
the mpc7457 takes ~1us even when neither the memory area holding
the register contents nor the instructions used for
saving/restoring are present in the cache. With cache hits it is
faster still (by a factor of 4-5).
- saving or restoring XMM and FPU context (fxsave/fxrstor)
on the celeron can be done in ~0.4us.
Based on these encouraging results I thought about adding
XMM / AltiVec support using the following simple strategy:
On i386 all FPU and SSE registers are volatile, probably
with the exception of the control registers (which define rounding
and exception behavior etc -- I found no sysv ABI addon mentioning
SSE; the i386 ABI mentions the FPCR but doesn't specify if it
is volatile or not).
Hence, I think it is enough for the ordinary context-switching
code to just save/restore the FP control word and the MXCSR.
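
A minimal sketch of what such a control-register-only save/restore could look like, assuming GCC or Clang inline assembly on an x86 target with SSE. The names fp_ctrl_ctx, fp_ctrl_save, fp_ctrl_restore and fp_ctrl_switch are hypothetical and not existing RTEMS symbols:

```c
#include <stdint.h>

/* Hypothetical per-thread slot for the FP/SSE control state. */
typedef struct {
    uint16_t fpcw;   /* x87 FPU control word */
    uint32_t mxcsr;  /* SSE control and status register */
} fp_ctrl_ctx;

/* Store the current FPU control word and MXCSR into *c. */
static inline void fp_ctrl_save(fp_ctrl_ctx *c)
{
    __asm__ volatile ("fnstcw %0"  : "=m" (c->fpcw));
    __asm__ volatile ("stmxcsr %0" : "=m" (c->mxcsr));
}

/* Load the FPU control word and MXCSR from *c. */
static inline void fp_ctrl_restore(const fp_ctrl_ctx *c)
{
    __asm__ volatile ("fldcw %0"   : : "m" (c->fpcw));
    __asm__ volatile ("ldmxcsr %0" : : "m" (c->mxcsr));
}

/* Hypothetical switch hook: save the outgoing thread's control
 * state, then load the incoming thread's control state. */
static inline void fp_ctrl_switch(fp_ctrl_ctx *from, const fp_ctrl_ctx *to)
{
    fp_ctrl_save(from);
    fp_ctrl_restore(to);
}
```

Only six bytes of state are moved per switch here; the data registers are left alone because the calling convention already treats them as volatile.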
When handling exceptions or interrupts, it would be necessary
to save/restore the full FPU and SSE context (fxsave/fxrstor).
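
As a rough illustration of the exception/interrupt side, the sketch below wraps a C-level handler in fxsave/fxrstor. It assumes GCC or Clang inline assembly on x86. fx_area, fx_save, fx_restore, isr_call_c and demo_handler are hypothetical names, and a real implementation would do this in the assembly entry code rather than in C:

```c
#include <stdint.h>

/* fxsave/fxrstor need a 512-byte, 16-byte-aligned memory area. */
typedef struct {
    uint8_t bytes[512];
} __attribute__((aligned(16))) fx_area;

/* Dump the full x87/MMX/SSE state, including MXCSR, into *a. */
static inline void fx_save(fx_area *a)
{
    __asm__ volatile ("fxsave %0" : "=m" (*a));
}

/* Reload the full x87/MMX/SSE state from *a. */
static inline void fx_restore(const fx_area *a)
{
    __asm__ volatile ("fxrstor %0" : : "m" (*a));
}

/* Hypothetical ISR glue: preserve the interrupted code's FPU/SSE
 * state around a handler that is free to use those registers. */
static void isr_call_c(void (*handler)(void))
{
    fx_area saved;

    fx_save(&saved);
    handler();
    fx_restore(&saved);
}

/* Example handler that clobbers SSE state by flipping both
 * rounding-control bits in MXCSR. */
static void demo_handler(void)
{
    uint32_t csr;

    __asm__ volatile ("stmxcsr %0" : "=m" (csr));
    csr ^= 0x6000u;
    __asm__ volatile ("ldmxcsr %0" : : "m" (csr));
}
```

Calling isr_call_c(demo_handler) leaves the caller's rounding mode untouched even though the handler changes it.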
On PowerPC, the altivec sysv ABI declares v0..v19 as volatile and
v20..v31 and the vscr as non-volatile.
Hence, it should be enough for ordinary context-switching code
to just save/restore v20..v31 + vscr and save/restore
the volatile registers before/after calling C-code from the
exception/interrupt handling code in assembly.
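
The split can be modelled in plain C as below. This is only a model of the bookkeeping, not AltiVec code: the register file is simulated by an array, real code would use stvx/lvx and mfvscr/mtvscr in assembly, and all type and function names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* One 128-bit vector register. */
typedef struct { uint32_t w[4]; } vreg;

/* Simulated register file and status/control register. */
static vreg     VR[32];
static uint32_t VSCR;

enum { N_VOLATILE = 20, N_NONVOLATILE = 12 };

/* Saved by the ordinary context switch: v20..v31 plus the
 * status/control register. */
typedef struct {
    vreg     v[N_NONVOLATILE];
    uint32_t vscr;
} thread_vec_ctx;

/* Saved on exception/interrupt entry before calling C: v0..v19. */
typedef struct {
    vreg v[N_VOLATILE];
} isr_vec_frame;

static void ctx_save_nonvolatile(thread_vec_ctx *c)
{
    memcpy(c->v, &VR[N_VOLATILE], sizeof c->v);
    c->vscr = VSCR;
}

static void ctx_restore_nonvolatile(const thread_vec_ctx *c)
{
    memcpy(&VR[N_VOLATILE], c->v, sizeof c->v);
    VSCR = c->vscr;
}

static void isr_save_volatile(isr_vec_frame *f)
{
    memcpy(f->v, &VR[0], sizeof f->v);
}

static void isr_restore_volatile(const isr_vec_frame *f)
{
    memcpy(&VR[0], f->v, sizeof f->v);
}
```

In this model a thread context carries 12 vector registers and an interrupt frame carries 20, so the two paths together cover all 32 registers without overlap.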
(FP context switching code should be adapted to fit this
strategy: save/restore non-volatile regs in Context_Switch_fp()
and save/restore volatile regs around C-code called from
exception/interrupt handlers.)
The beauty of this approach lies IMO in its simplicity and
ability to deal with gcc using the vector extensions wherever
it chooses to (even in ISRs).
The user doesn't have to fiddle with gcc options but can
just build everything with vector-support enabled.