GNU ld --wrap limitations

Tue Jan 15 06:54:37 UTC 2019

On 14/01/2019 23:33, Chris Johns wrote:
> On 14/1/19 8:22 pm, Sebastian Huber wrote:
>> While testing the event recording with the libbsd, I noticed a GNU ld --wrap
>> limitation:
>>
>> https://www.sourceware.org/ml/binutils/2018-12/msg00210.html
>>
> I have been watching the thread. There is a limit to what binutils or any method
> can do as compiler technology improves.
>
> For example, we currently build RTEMS with a single C file on the command line. I
> wonder what RTEMS's score would look like if all C files were passed to the
> compiler at once so it could optimise across all files as if they were included
> in a single source file. A number of externals we currently have would no longer
> be visible or traceable with this method.

If I use -flto in my simple test case, then the wrapping via LD doesn't 
work at all.

>
>> It turned out that the wrapping doesn't work for references internal to a
>> translation unit.
> The reach of this issue is growing with the push to better optimise the
> generated code. If the compiler can remove an external call or turn it into an
> internal reference, it will.

This is not a compiler optimization issue. Even with -O0, the wrapping doesn't
work for any reference internal to the translation unit. For example:

cat f.c
#include "f.h"

#include <stdio.h>

void h(void)
{
         puts(__PRETTY_FUNCTION__);
}

func f(void)
{
         h();
         puts(__PRETTY_FUNCTION__);
         return g;
}

cat f.s
         .file   "f.c"
         .text
         .globl  h
         .type   h, @function
h:
.LFB0:
         .cfi_startproc
         pushq   %rbp
         .cfi_def_cfa_offset 16
         .cfi_offset 6, -16
         movq    %rsp, %rbp
         .cfi_def_cfa_register 6
         movl    $__PRETTY_FUNCTION__.2272, %edi
         call    puts
         nop
         popq    %rbp
         .cfi_def_cfa 7, 8
         ret
         .cfi_endproc
.LFE0:
         .size   h, .-h
         .globl  f
         .type   f, @function
f:
.LFB1:
         .cfi_startproc
         pushq   %rbp
         .cfi_def_cfa_offset 16
         .cfi_offset 6, -16
         movq    %rsp, %rbp
         .cfi_def_cfa_register 6
         call    h
         movl    $__PRETTY_FUNCTION__.2276, %edi
         call    puts
         movl    $g, %eax
         popq    %rbp
         .cfi_def_cfa 7, 8
         ret
         .cfi_endproc
.LFE1:
         .size   f, .-f
         .section        .rodata
         .type   __PRETTY_FUNCTION__.2272, @object
         .size   __PRETTY_FUNCTION__.2272, 2
__PRETTY_FUNCTION__.2272:
         .string "h"
         .type   __PRETTY_FUNCTION__.2276, @object
         .size   __PRETTY_FUNCTION__.2276, 2
__PRETTY_FUNCTION__.2276:
         .string "f"
         .ident  "GCC: (SUSE Linux) 7.4.0"
         .section        .note.GNU-stack,"",@progbits

You see "call h" and "call puts". The h() function is defined in the same
translation unit, so the call to h is not wrapped.

>
>> I had hoped that the RTEMS Trace Linker would not have this
>> limitation, but the documentation says (user manual):
>>
>> "The trace linker’s major role is to wrap functions in the existing executable
>> with trace code. The directions on how to wrap application functions is
>> provided by the generator configuration. The wrapping function uses a GNU
>> linker option called --wrap=symbol."
>>
> https://devel.rtems.org/wiki/Developer/Tracing/Trace_Linker#Limitation
>
> ... highlights the need for an external reference.

It says

"Functions must have external linkage to allow the linker to wrap the 
symbol."

this is not the same as

"highlights the need for an external reference"

You need an undefined reference to the symbol, since --wrap only redirects
undefined references. References inside a translation unit are apparently not
undefined references.
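To make the rule concrete, here is a toy C model of the behaviour the ld manual describes for --wrap=SYM: an undefined reference to SYM is bound to __wrap_SYM, an undefined reference to __real_SYM is bound to SYM, and a reference to a symbol that the same object file defines is left alone. This is only a sketch under that reading of the documentation, not binutils code; the function wrap_resolve() and its defined_locally flag are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Toy model (hypothetical, not taken from binutils) of how GNU ld's
 * --wrap=SYM treats one symbol reference in an object file.
 *
 *   ref             name of the referenced symbol
 *   defined_locally true if the same object file defines ref, e.g. the
 *                   call to h() from f() in f.c; such a reference is not
 *                   an undefined reference
 *   wrapped         the SYM given to --wrap=SYM
 *   buf, n          storage for a rewritten "__wrap_SYM" name
 *
 * Returns the name the reference ends up bound to.
 */
static const char *wrap_resolve(const char *ref, bool defined_locally,
                                const char *wrapped, char *buf, size_t n)
{
    static const char real_prefix[] = "__real_";
    const size_t real_len = sizeof real_prefix - 1;

    assert(ref != NULL && wrapped != NULL && buf != NULL);

    /* Only undefined references are rewritten. */
    if (defined_locally)
        return ref;

    /* Undefined reference to SYM -> __wrap_SYM. */
    if (strcmp(ref, wrapped) == 0) {
        snprintf(buf, n, "__wrap_%s", wrapped);
        return buf;
    }

    /* Undefined reference to __real_SYM -> SYM. */
    if (strncmp(ref, real_prefix, real_len) == 0 &&
        strcmp(ref + real_len, wrapped) == 0)
        return ref + real_len;

    /* Any other reference is not affected by this --wrap option. */
    return ref;
}
```

In this model, with --wrap=h the call to h() from f() in f.c is defined_locally and stays bound to h, while a call to h() from another translation unit is undefined there and is bound to __wrap_h.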

>
>> In the libbsd a lot of things are done through function pointer assignments, e.g.
>>
>> static struct netisr_handler ip_nh = {
>>      .nh_name = "ip",
>>      .nh_handler = ip_input,
>>      .nh_proto = NETISR_IP,
>> #ifdef    RSS
>>      .nh_m2cpuid = rss_soft_m2cpuid_v4,
>>      .nh_policy = NETISR_POLICY_CPU,
>>      .nh_dispatch = NETISR_DISPATCH_HYBRID,
>> #else
>>      .nh_policy = NETISR_POLICY_FLOW,
>> #endif
>> };
>>
>> or
>>
>> /*
>>   * Perform common duties while attaching to interface list
>>   */
>> void
>> ether_ifattach(struct ifnet *ifp, const u_int8_t *lla)
>> {
>>      int i;
>>      struct ifaddr *ifa;
>>      struct sockaddr_dl *sdl;
>>
>>      ifp->if_addrlen = ETHER_ADDR_LEN;
>>      ifp->if_hdrlen = ETHER_HDR_LEN;
>>      if_attach(ifp);
>>      ifp->if_mtu = ETHERMTU;
>>      ifp->if_output = ether_output;
>>      ifp->if_input = ether_input;
>>
>> This makes the tracing quite ineffective in this area.
>>
> I suspect the compiler is using a local offset to the code in the file. There
> are other cases, for example C++.
>
> I have recently been considering the role libdl can play in hooking trace code
> and the effect of deferring the ability to wrap to the target. I suspect it
> would not resolve the problem you face because there are no reloc records for
> the internal offsets being used, but I have not checked. If the DWARF info holds
> call or block data, hot patching the code might be possible. This would need
> host processing to extract the hot patch data.
>
> I consider the wrap method of tracing a low-cost, portable API tracer that is
> useful for things like malloc/free. It is not like a hardware trace device that
> can see everything, so I consider there to be a cost/functionality curve.

It took me a while to figure out why the wrapping of ether_input() and 
ether_output() didn't work. I tried to improve the LD documentation a 
bit as a result.

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.