memcpy performance

Tue Dec 9 18:03:28 UTC 1997

It's even worse than just a byte-by-byte copy!

On the 971024 snapshot (gen68360 BSP) a call to memcpy produces:
	1) A call to bcopy
	2) The bcopy routine links a stack frame and calls memmove
	3) The memmove routine:
		a) links a stack frame
		b) checks for overlap
		c) does a byte-by-byte copy
		   5 instructions/byte on a CPU32 processor!

There's a heck of a lot of unnecessary code here:
	Two extra function calls
	Two extra stack frames
	Extra code to check for overlap
	A very inefficient loop

Processor-independent improvements required:
	1) There should be an explicit memcpy routine.
	2) The library should be compiled with aggressive optimization.
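
As a rough sketch of item 1, an explicit routine might look something like
the following. The name sketch_memcpy is hypothetical (chosen so it does not
clash with the library's own memcpy), and this is only an illustration, not
the code in the snapshot. It calls nothing else and does no overlap check,
since memcpy is not required to handle overlapping regions.

```c
#include <stddef.h>

/*
 * Hypothetical stand-alone copy routine (illustration only).
 * No call to bcopy or memmove, and no overlap check: the caller
 * must guarantee that the two regions do not overlap.
 */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Plain forward byte copy; n == 0 copies nothing. */
    while (n != 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

A longword-at-a-time copy for aligned buffers would be the obvious next step,
but even this much removes the two extra calls, the two extra stack frames
and the overlap test.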

Processor-dependent improvements that would be nice:
	M68k - The loop in memmove should be done in such a way that
	processors like the CPU32 can go into loop mode.
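
For what it's worth, loop mode on the CPU32 wants a one-word instruction
followed by a DBcc that branches straight back to it, e.g.
"move.b (a0)+,(a1)+" followed by "dbra d0,<that move>". A count-down loop in
C, as in the sketch below, is the shape a compiler can most easily turn into
that pair. Whether a given compiler and set of flags actually emits dbra here
is an assumption that would have to be checked in the generated assembly. The
function name is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical byte copy written as a count-down loop (illustration only).
 * The intent is that an m68k compiler can map the loop body to a single
 * post-increment move and the loop test to a decrement-and-branch.
 * Note that dbra only counts in 16 bits, so generated code for large
 * counts would still need an outer loop.
 */
void *sketch_copy_countdown(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n != 0) {
        do {
            *d++ = *s++;        /* one byte per iteration */
        } while (--n != 0);     /* count down to zero */
    }
    return dst;
}
```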

Now all we need is a willing volunteer......

---
Eric Norum                                 eric at skatter.usask.ca
Saskatchewan Accelerator Laboratory        Phone: (306) 966-6308
University of Saskatchewan                 FAX:   (306) 966-6058
Saskatoon, Canada.