memcpy performance

Tue Dec 9 18:24:00 UTC 1997

I have forwarded this to the newlib maintainers list for comments.  

I have already been told there is a new and improved set of portable
memory functions in the current newlib source.

And before anyone asks .. no I don't have it either. :)

--joel
Joel Sherrill                    Director of Research & Development
joel at OARcorp.com                 On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
   Support Available             (205) 722-9985

On Tue, 9 Dec 1997, Eric Norum wrote:

> It's even worse than just a byte-by-byte copy!
> 
> On the 971024 snapshot (gen68360 BSP) a call to memcpy produces:
> 	1) A call to bcopy
> 	2) The bcopy routine links a stack frame and calls memmove
> 	3) The memmove routine:
> 		a) links a stack frame
> 		b) checks for overlap
> 		c) does a byte-by-byte copy
> 		   5 instructions/byte on a CPU32 processor!
> 		
> There's a heck of a lot of unnecessary code here:
> 	Two extra function calls
> 	Two extra stack frames
> 	Extra code to check for overlap
> 	A very inefficient loop
> 
> Processor-independent improvements required:
> 	1) There should be an explicit memcpy routine.
> 	2) The library should be compiled with aggressive optimization.
> 	
> Processor-dependent improvements that would be nice:
> 	M68k - The loop in memmove should be written so that processors
> 	like the CPU32 can go into loop mode.
> 
> Now all we need is a willing volunteer......
> 
> ---
> Eric Norum                                 eric at skatter.usask.ca
> Saskatchewan Accelerator Laboratory        Phone: (306) 966-6308
> University of Saskatchewan                 FAX:   (306) 966-6058
> Saskatoon, Canada.
>
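
For reference, point 1 of the quoted message (an explicit memcpy routine)
can be sketched roughly as below. This is only a minimal illustration, not
newlib's code: the name sketch_memcpy is hypothetical, and it assumes the
caller guarantees the regions do not overlap, as standard memcpy does. It
drops the bcopy/memmove call chain and the overlap check, and uses a simple
count-down byte loop. Whether a given m68k compiler turns that loop into
something the CPU32 can run in loop mode depends on the compiler and its
optimization flags.

```c
#include <stddef.h>

/* sketch_memcpy is a hypothetical name used for illustration only;
 * it is not the routine shipped in newlib. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* No overlap check and no calls to other routines: the caller
     * must pass non-overlapping regions, as with standard memcpy. */
    while (n-- != 0) {
        *d++ = *s++;
    }

    /* Return the destination pointer, matching memcpy's convention. */
    return dst;
}
```

A call such as sketch_memcpy(dst, src, len) then costs one function call
and one loop, instead of three calls, two linked stack frames and an
overlap test.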