Inlined code

Mon Aug 6 06:21:34 UTC 2018

On 06/08/2018 16:12, Christian Mauderer wrote:
> Am 06.08.2018 um 07:31 schrieb Chris Johns:
>> On 06/08/2018 10:51, Chris Johns wrote:
>>> On 05/08/2018 19:39, Christian Mauderer wrote:
>>>> Am 05.08.2018 um 04:00 schrieb Chris Johns:
>>>>> Hi,
>>>>>
>>>>> I have been working on migrating covoar in the rtems-tools repo to DWARF. The
>>>>> goal is to remove objdump parsing and to get accurate details about the functions
>>>>> being covered. This is an unfunded task.
>>>>>
>>>>> The work has resulted in a close examination of inlined code in RTEMS and what I
>>>>> saw alarmed me so I have added a report to the rtems-exeinfo tool in rtems-tools
>>>>> (the change is to be posted for review once I get the coverage tests running).
>>>>>
>>>>> A summary report for hello.exe on RTEMS 5 for SPARC is:
>>>>>
>>>>> inlined funcs   : 1412
>>>>>     total funcs : 1956
>>>>>  % inline funcs : 72%
>>>>>      total size : 174616
>>>>>     inline size : 81668
>>>>>   % inline size : 46%
>>>>>
>>>>> This is a small application so it could be argued that skews the figures. A
>>>>> large C/C++ application built with -O2 running on RTEMS 4.11 ARM reports the
>>>>> inline usage as:
>>>>>
>>>>> inlined funcs   : 10370
>>>>>     total funcs : 17700
>>>>>  % inline funcs : 58%
>>>>>      total size : 3066240
>>>>>     inline size : 1249514
>>>>>   % inline size : 40%
>>>>>
>>>>> This does not seem right to me.
>>>>>
>>>>> The report is new and there could be issues in the DWARF handling that feeds
>>>>> this report; however, I am posting this to start a discussion on the topic of
>>>>> inlining.
>>>>>
>>>>> I attach the report for hello.exe. The `-i` option generates the inline report.
>>>>>
>>>>> The first section is a summary showing the total number of functions in the
>>>>> executable that have machine code and are flagged as inline. The report lists
>>>>> the percentage of functions that are inlined and the percentage of machine code
>>>>> that is inlined. The values seem high to me.
>>>>>
>>>>> The second table lists inline functions that are repeated sorted from the
>>>>> largest footprint to the smallest. The first column is the total size of machine
>>>>> code in the executable and the second column is the number of instances.
>>>>>
>>>>> The third table is the list of inline functions sorted from largest machine code
>>>>> footprint to smallest. The second column holds flags, of which there is one. An `E`
>>>>> indicates the inline function is also external, which means the compiler has
>>>>> created an external reference to this function, i.e. an address-of is being taken.
>>>>> The third column is the address in the executable so you can take a look with
>>>>> objdump at the machine code.
>>>>>
>>>>> We need to ask some important questions in relation to inlining. It is cheap to
>>>>> add and we all feel the code we add needs to be fast and needs to be inlined but
>>>>> does it really need to be inlined?
>>>>>
>>>>> Some pieces of code do need to be inlined and the overhead is just that, an
>>>>> overhead. For example, in the large C/C++ application there is a low-level
>>>>> volatile hardware write routine with close to 300 instances and a code size of
>>>>> 10K. This code needs to be inlined for performance reasons, but should the size
>>>>> on average be 40 bytes? I doubt it.
>>>>>
>>>>> Can we be more judicious with our use of the inline keyword?
>>>>>
>>>>> Is the performance gain really what we expect, or is the actual overhead of a
>>>>> call frame not worth saving?
>>>>>
>>>>> What are the real costs of inlining a piece of code? It adds size to the
>>>>> executable and depending on the code being inlined it complicates coverage
>>>>> analysis by adding extra branch points.
>>>>>
>>>>> The metrics to determine what should be inlined are complicated and I do not
>>>>> think we have a suitable policy in place. I believe it is time we created one.
>>>>>
>>>>> The issue is not limited to our code; gcc, newlib and libstdc++ seem to have
>>>>> some code that should be looked at more closely. For example __udivmoddi4, and
>>>>> __sprint_r.
>>>>>
>>>>> Chris
>>>>>
>>>>>
>>>>
>>>> Hello Chris,
>>>>
>>>> I just took a look at one of the first functions in your list: __sprint_r
>>>>
>>>> https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdio/vfprintf.c;h=c4bf2dbe31da64462ecccec97c8e901e4ffadd44;hb=HEAD#l403
>>>>
>>>> As far as I can see, there is no explicit inline keyword for that
>>>> function. So in that case, the compiler decided that it would be a good
>>>> idea to inline that function.
>>>
>>> Thanks and yes. At this point in time I cannot tell what is happening and I am
>>> not sure the tool is reporting accurate data, I need to investigate.
>>
>> I have updated the tool and report to show which inline functions are:
>>
>>  - inlined by compiler
>>  - declared inline and not inlined
>>  - declared inline and inlined
>>
>> I have also fixed a quick hack I had where the size was the span from the low PC
>> to the high PC; this was wrong. Inlined code can be split and moved when
>> inlining, creating a discontinuous address range. The size in the report is now
>> the number of machine code bytes.
>>
>> The report will show any functions not inlined when asked to be inlined. We do
>> not have any.
>>
>> The 'C' flag in the inlined table shows which functions the compiler has inlined.
>>
>> Chris
>>
> 
> With that list it is now much clearer which functions would be relevant
> for a potential review.

Yes. FYI the C/C++ RTEMS 4.11 app now gives:

inlined funcs   : 10370
    total funcs : 17700
 % inline funcs : 58%
     total size : 2296354
    inline size : 479628
  % inline size : 20%

This level of reduction is more in line with what I expected.
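
To make the size fix concrete, here is a rough sketch of the difference between
measuring an inlined function as a low PC to high PC span and counting the bytes
in each of its address ranges. This is only an illustration and not the
rtems-exeinfo code: the function names, the range tuples and the example
addresses are all hypothetical, and the ranges are assumed not to overlap. The
percentage helper truncates, which is consistent with the figures in the
reports above (for example 10370 of 17700 shows as 58%).

```python
def span_size(ranges):
    """Old quick hack: distance from the lowest low PC to the highest high PC."""
    ranges = list(ranges)
    return max(hi for _, hi in ranges) - min(lo for lo, _ in ranges)


def byte_size(ranges):
    """Sum the machine code bytes in each (low_pc, high_pc) range.

    Assumes high_pc is exclusive and the ranges do not overlap.
    """
    return sum(hi - lo for lo, hi in ranges)


def percent(part, whole):
    """Integer percentage, truncated rather than rounded."""
    return part * 100 // whole


# Hypothetical inlined function the compiler has split into two pieces.
pieces = [(0x1000, 0x1010), (0x1400, 0x1408)]

print(span_size(pieces))          # 1032, counts the gap between the pieces
print(byte_size(pieces))          # 24, only the bytes that belong to the inline
print(percent(479628, 2296354))   # 20, the inline size figure reported above
```

The span overstates the size whenever the pieces are separated by unrelated
code, which is why summing the ranges gives the smaller numbers.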

> 
>>>
>>>> I'm not sure whether I might just not have seen it, but is there a
>>>> possibility to distinguish between functions that have been inlined by
>>>> the compiler and ones that have been inlined due to the "inline" keyword
>>>> without looking at every definition?
>>>
>>> I am not sure. The DWARF data is complex and detailed and I view this initial
>>> step into the area of using DWARF to perform static analysis of RTEMS
>>> executables as green.
>>>
>>> DWARF does provide declaration attributes. I need to review the DWARF data and
>>> standard to determine if we can tell what is declared inline and what has been
>>> inlined. I think it would be good to know.
>>>
>>>> Did you try compiling with size optimization? I would expect that the
>>>> compiler would inline far fewer functions and maybe even ignore some
>>>> "inline" keywords. As far as I know it's more of a hint to the compiler.
>>>
>>> Not yet. A complete tool build with those options is a lot of effort and I am
>>> still not comfortable the report is accurate. I think this is something that
>>> should be done at some point. I think it would create an interesting data point.
>>>
>>>> I would only worry about functions that are still inlined if size
>>>> optimization is selected. 
>>>
>>> I think we need to review the functions we currently have tagged as inline. I
>>> think the only way we can do this is with real data.
>>>
>>>> That's the case when I tell the compiler to
>>>> make the program as small as possible. In all other cases I want some
>>>> well balanced optimum between speed and size. Inlining small functions
>>>> is OK in that case if you ask me.
>>>
>>> How do you define this, ie what is the inline policy we use?
>>> How do you audit this?
> 
> Neither question is simple to answer. It is most likely a case by
> case decision. I think there are roughly two reasons for inlining:
> 
> - The code is short enough that it is smaller if it is inlined compared
> to a function call.
> 
> - There is some performance reason.
> 
> Anything else?

Not for inlining, but rather for not inlining? I had a discussion with Joel last week
about coverage and he said he had previously reviewed the inlines to reduce the
branch counts because it complicates coverage. I would prefer he talks about
this, it is his area.

Chris