memcpy performance
Eric Norum
eric at skatter.USask.Ca
Tue Dec 9 18:03:28 UTC 1997
It's even worse than just a byte-by-byte copy!
On the 971024 snapshot (gen68360 BSP) a call to memcpy produces:
1) A call to bcopy
2) The bcopy routine links a stack frame and calls memmove
3) The memmove routine:
a) links a stack frame
b) checks for overlap
c) does a byte-by-byte copy
5 instructions/byte on a CPU32 processor!
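For readers without the snapshot sources at hand, the call chain described above can be sketched roughly as follows. This is an illustration, not the actual library code: the sketch_* names are hypothetical (chosen so nothing clashes with the real C library), and the bodies are only an assumption about what such routines typically look like.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names (sketch_*): an illustration of the layering,
 * not the library's real source. */

/* Step 3: check for overlap, then copy one byte at a time. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d <= (uintptr_t)s || (uintptr_t)d >= (uintptr_t)s + n) {
        /* Destination does not overlap the tail of the source:
         * a forward copy is safe. */
        while (n--)
            *d++ = *s++;
    } else {
        /* Destination starts inside the source: copy backward. */
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }
    return dst;
}

/* Step 2: bcopy only swaps the argument order and forwards the call. */
void sketch_bcopy(const void *src, void *dst, size_t n)
{
    sketch_memmove(dst, src, n);
}

/* Step 1: memcpy forwards to bcopy. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    sketch_bcopy(src, dst, n);
    return dst;
}
```

Every byte copied through sketch_memcpy pays for two forwarding calls and an overlap test that a memcpy caller never needs.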
There's a heck of a lot of unnecessary code here:
Two extra function calls
Two extra stack frames
Extra code to check for overlap
A very inefficient loop
Processor-independent improvements required:
1) There should be an explicit memcpy routine.
2) The library should be compiled with aggressive optimization.
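A minimal sketch of what item 1 might look like, assuming a plain byte loop is acceptable as a starting point. The name direct_memcpy is hypothetical; a real library would simply call it memcpy. There is no forwarding call and no overlap check, since memcpy callers already promise that the regions do not overlap.

```c
#include <stddef.h>

/* direct_memcpy is a hypothetical name used for illustration only. */
void *direct_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* No overlap check: the caller guarantees disjoint regions. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

For item 2, building the library with an optimization level such as gcc's -O2 should also let the compiler drop the frame setup in a leaf routine like this, though the exact result depends on the compiler version and target.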
Processor-dependent improvements that would be nice:
M68k - The loop in memmove should be written so that
processors like the CPU32 can go into loop mode.
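As a rough sketch of the idea: loop mode applies to a one-word instruction followed by a DBcc, so the copy loop wants to end up as a single move.b (a0)+,(a1)+ followed by dbra. The C below is shaped with that in mind. The name loop_copy is hypothetical, and whether a given compiler really emits the two-instruction loop is an assumption that would have to be checked in the generated assembly; hand-written assembly may be the more reliable route.

```c
#include <stddef.h>

/* loop_copy is a hypothetical name used for illustration only. */
void *loop_copy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n != 0) {
        /* DBRA counts in 16 bits, so work in chunks of at most 65536 bytes. */
        size_t chunk = n > 65536u ? 65536u : n;
        unsigned short count = (unsigned short)(chunk - 1);

        n -= chunk;

        /* Intended shape on the m68k:
         *     1: move.b (a0)+,(a1)+
         *        dbra   d0,1b
         */
        do {
            *d++ = *s++;
        } while (count-- != 0);
    }
    return dst;
}
```

The inner loop runs count + 1 times, which matches the way dbra terminates when its counter passes zero.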
Now all we need is a willing volunteer...
---
Eric Norum eric at skatter.usask.ca
Saskatchewan Accelerator Laboratory Phone: (306) 966-6308
University of Saskatchewan FAX: (306) 966-6058
Saskatoon, Canada.