Inlined code

Tue Aug 7 15:25:09 UTC 2018

On Tue, Aug 7, 2018 at 12:48 AM, Sebastian Huber <
sebastian.huber at embedded-brains.de> wrote:

> On 06/08/18 21:14, Joel Sherrill wrote:
>
>
>>
>> On Mon, Aug 6, 2018 at 1:21 AM, Chris Johns <chrisj at rtems.org <mailto:
>> chrisj at rtems.org>> wrote:
>>
>>     On 06/08/2018 16:12, Christian Mauderer wrote:
>>     > Am 06.08.2018 um 07:31 schrieb Chris Johns:
>>     >> On 06/08/2018 10:51, Chris Johns wrote:
>>     >>> On 05/08/2018 19:39, Christian Mauderer wrote:
>>     >>>> Am 05.08.2018 um 04:00 schrieb Chris Johns:
>>     >>>>> Hi,
>>     >>>>>
>>     >>>>> I have been working on migrating covoar in the rtems-tools
>>     repo to DWARF. The
>>     >>>>> goal is remove objdump parsing and to get accurate details
>>     about the functions
>>     >>>>> being covered. This is an unfunded task.
>>     >>>>>
>>     >>>>> The work has resulted in a close examination of inlined code
>>     in RTEMS and what I
>>     >>>>> saw alarmed me so I have added a report to the rtems-exeinfo
>>     tool in rtems-tools
>>     >>>>> (the change is to be posted for review once I get the
>>     coverage tests running).
>>     >>>>>
>>     >>>>> A summary report for hello.exe on RTEMS 5 for SPARC is:
>>     >>>>>
>>     >>>>> inlined funcs   : 1412
>>     >>>>>     total funcs : 1956
>>     >>>>>  % inline funcs : 72%
>>     >>>>>      total size : 174616
>>     >>>>>     inline size : 81668
>>     >>>>>   % inline size : 46%
>>     >>>>>
>>     >>>>> This is a small application so it could be argued that skews
>>     the figures. A
>>     >>>>> large C/C++ application built with -O2 running on RTEMS 4.11
>>     ARM reports the
>>     >>>>> inline usage as:
>>     >>>>>
>>     >>>>> inlined funcs   : 10370
>>     >>>>>     total funcs : 17700
>>     >>>>>  % inline funcs : 58%
>>     >>>>>      total size : 3066240
>>     >>>>>     inline size : 1249514
>>     >>>>>   % inline size : 40%
>>     >>>>>
>>     >>>>> This does not seem right to me.
>>     >>>>>
>>     >>>>> The report is new and there could be issues in the DWARF
>>     handling that feeds
>>     >>>>> this report however I am posting this to start a discussion
>>     on the topic of
>>     >>>>> inlining.
>>     >>>>>
>>     >>>>> I attach the report for hello.exe. The `-i` option generates
>>     the inline report.
>>     >>>>>
>>     >>>>> The first section is a summary showing the total number of
>>     functions in the
>>     >>>>> executable that have machine code and are flagged as inline.
>>     The report lists
>>     >>>>> the percentage of functions that are inlined and the
>>     percentage of machine code
>>     >>>>> that is inlined. The values seem high to me.
>>     >>>>>
>>     >>>>> The second table lists inline functions that are repeated
>>     sorted from the
>>     >>>>> largest foot print to the smallest. The first column the
>>     total size of machine
>>     >>>>> code in the executable and the second column the number of
>>     instances.
>>     >>>>>
>>     >>>>> The third table is the list of inline functions sorted from
>>     largest machine code
>>     >>>>> footprint to smallest. The second column are flags of which
>>     there is one. A `E`
>>     >>>>> indicates the inline function is also external which means
>>     the compiler has
>>     >>>>> created an external reference to this function, ie an
>>     address-of is being taken.
>>     >>>>> The third column is the address in the executable so you can
>>     take a look with
>>     >>>>> objdump at the machine code.
>>     >>>>>
>>     >>>>> We need to ask some important question in relation to
>>     inlining. It is cheap to
>>     >>>>> add and we all feel the code we add needs to be fast and
>>     needs to be inlined but
>>     >>>>> does it really need to be inlined?
>>     >>>>>
>>     >>>>> Some pieces of code do need to be inlined and the overhead
>>     is just that an
>>     >>>>> overhead, for example in the large C/C++ application there
>>     is a low level
>>     >>>>> volatile hardware write routing with close to 300 instances
>>     and a code size of
>>     >>>>> 10K. This code needs to be inlined for performance reasons
>>     but should the size
>>     >>>>> on average be 40 bytes, I doubt it.
>>     >>>>>
>>     >>>>> Can we be more judicious with our use of the inline keyword?
>>     >>>>>
>>     >>>>> Is the performance gain we really expect or is the actual
>>     overhead of a call
>>     >>>>> frame not worth saving?
>>     >>>>>
>>     >>>>> What are the real costs of inlining a piece of code? It adds
>>     size to the
>>     >>>>> executable and depending on the code being inlined it
>>     complicates coverage
>>     >>>>> analysis be adding extra branch points.
>>     >>>>>
>>     >>>>> The metrics to determine what should be inlined is
>>     complicated and I do not
>>     >>>>> think we have a suitable policy in place. I believe it is
>>     time we to create one.
>>     >>>>>
>>     >>>>> The issue is not limited to our code, gcc, newlib and
>>     libstdc++ seem to have
>>     >>>>> some code that should be looked at more closely. For example
>>     __udivmoddi4, and
>>     >>>>> __sprint_r.
>>     >>>>>
>>     >>>>> Chris
>>     >>>>>
>>     >>>>>
>>     >>>>
>>     >>>> Hello Chris,
>>     >>>>
>>     >>>> I just took a look at one of the first function in your list:
>>     __sprint_r
>>     >>>>
>>     >>>>
>>     https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=
>> blob;f=newlib/libc/stdio/vfprintf.c;h=c4bf2dbe31da64462
>> ecccec97c8e901e4ffadd44;hb=HEAD#l403
>>     <https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;
>> a=blob;f=newlib/libc/stdio/vfprintf.c;h=c4bf2dbe31da64462
>> ecccec97c8e901e4ffadd44;hb=HEAD#l403>
>>     >>>>
>>     >>>> As far as I can see, there is no explicit inline key word for
>>     that
>>     >>>> function. So in that case, the compiler decided that it would
>>     be a good
>>     >>>> idea to inline that function.
>>     >>>
>>     >>> Thanks and yes. At this point in time I cannot tell what is
>>     happening and I am
>>     >>> not sure the tool is reporting accurate data, I need to
>>     investigate.
>>     >>
>>     >> I have updated the tool and report to show which inline
>>     functions are:
>>     >>
>>     >>  - inlined by compiler
>>     >>  - declared inline and not inlined
>>     >>  - declared inline and inlined
>>     >>
>>     >> I have also fixed a quick hack I had where the size was the
>>     span from the low PC
>>     >> to the high PC, this was wrong. Inlined code can be split and
>>     moved when
>>     >> inlining creating a discontinuous address range. The size in
>>     the report is now
>>     >> the number of machine code bytes.
>>     >>
>>     >> The report will show any functions not inlined when asked to be
>>     inlined. We do
>>     >> not have any.
>>     >>
>>     >> The 'C' flag in the inlined table shows which functions the
>>     compiler has inlined.
>>     >>
>>     >> Chris
>>     >>
>>     >
>>     > With that list it is now much clearer which functions would be
>>     relevant
>>     > for a potential review.
>>
>>     Yes. FYI the C/C++ RTEMS 4.11 app now gives:
>>
>>     inlined funcs   : 10370
>>         total funcs : 17700
>>      % inline funcs : 58%
>>          total size : 2296354
>>         inline size : 479628
>>       % inline size : 20%
>>
>>     This level of reduction is more inline with what I expected.
>>
>>     >
>>     >>>
>>     >>>> I'm not sure whether I might just haven't seen it but is there a
>>     >>>> possibility to distinguish between functions that have been
>>     inlined by
>>     >>>> the compiler and ones that have been inlined due to the
>>     "inline" keyword
>>     >>>> without looking at every definition?
>>     >>>
>>     >>> I am not sure. The DWARF data is complex and detailed and I
>>     view this initial
>>     >>> step into the area of using DWARF to perform static analysis
>>     of RTEMS
>>     >>> executables as green.
>>     >>>
>>     >>> DWARF does provide declaration attributes. I need to review
>>     the DWARF data and
>>     >>> standard to determine if we can tell what is declared inline
>>     and what has been
>>     >>> inlined. I think it would be good to know.
>>     >>>
>>     >>>> Did you try compiling with size optimization? I would expect
>>     that the
>>     >>>> compiler would inline far less functions and maybe even
>>     ignore some
>>     >>>> "inline" keywords. As far as I know it's more of a hint to
>>     the compiler.
>>     >>>
>>     >>> Not yet. A complete tool build with those options is a lot of
>>     effort and I am
>>     >>> still not comfortable the report is accurate. I think this is
>>     something that
>>     >>> should be done at some point. I think it would create an
>>     interesting data point.
>>     >>>
>>     >>>> I would only worry about functions that are still inlined if size
>>     >>>> optimization is selected.
>>     >>>
>>     >>> I think we need to review the functions we currently have
>>     tagged as inline. I
>>     >>> think the only way we can do this is with real data.
>>     >>>
>>     >>>> That's the case when I tell the compiler to
>>     >>>> make the program as small as possible. In all other cases I
>>     want some
>>     >>>> well balanced optimum between speed and size. Inlining small
>>     functions
>>     >>>> is OK in that case if you ask me.
>>     >>>
>>     >>> How do you define this, ie what is the inline policy we use?
>>     >>> How do you audit this?
>>     >
>>     > Both questions are not simple to answer. It is most likely a case by
>>     > case decision. I think there are roughly two reasons for inlining:
>>     >
>>     > - The code is short enough that it is smaller if it is inlined
>>     compared
>>     > to a function call.
>>     >
>>     > - There is some performance reason.
>>     >
>>     > Anything else?
>>
>>     Not for inlining rather for not inlining? I had a discussion with
>>     Joel last week
>>     about coverage and he said he had previously reviewed the inlines
>>     to reduce the
>>     branch counts because it complicates coverage. I would prefer he
>>     talks about
>>     this, it is his area.
>>
>>
>> Thanks for the lead-in Chris.
>>
>> First, there are multiple types of coverage. Statement, Decision coverage,
>> MC/DC, and object coverage. Based on various limiting factors which I
>> am happy to discuss in more detail if someone is interested, we went
>> with object coverage. I will briefly remind everyone of the limitations
>> and guidelines we went into this with:
>>
>> + We didn't want instrumented code. Test as you fly, fly as you test.
>> Plus there would need to be a way to get information generated by
>> the instrumentation off the target.
>>
>> + Without gcc support for identifying sub-expressions in complex
>> logical expressions, there is no way to do DC or MCDC. Further,
>> we would have had to have a tool to interpret the non-existent
>> information and know which entries in the truth tables for every
>> expression had to be exercised for DC and MCDC.
>>
>> + gcov's reports are primarily statement coverage. And you can't get
>> gcov reports without gcov output from runs which we still are working
>> on. Also in terms of DO-178, statement coverage is generally Level C
>> type coverage.
>>
>> We wanted higher level coverage for RTEMS and thus object
>> coverage with emphasis on every generated branch statement being
>> needed and exercisable as taken and not taken. This approach
>> ensured we didn't have dead code.
>>
>> Plus this type of information could easily be obtained from simulator
>> traces from Qemu and tsim. covoar was written to combine and
>> correlate the information back to assembly and source and generate
>> the reports.
>>
>> That's what led to the current coverage approach. Chris has been
>> working to eliminate the use of command line tools (e.g. nm, objdump,
>> etc) in favor of using DWARF directly.
>>
>> That's the background of the approach and how we got reports.
>>
>> Also remember that RTEMS testing is primarily via the public APIs
>> which combined with coverage ensures we actually need all the
>> internal code. If you can't reach it from a public API, it is dead.
>>
>> But the insights into RTEMS coding practices began when we started
>> trying to improve the coverage percentages. Key point:
>>
>> IF CODE IS DUPLICATED AT THE SOURCE OR OBJECT LEVEL, IT HAS TO
>> BE FULLY TESTED AT EACH INSTANCE OR INSTANTIATION.
>>
>> At the source level, sometimes we factored out a common helper method
>> to avoid testing the same logic via multiple APIs. This reduced the number
>> of test cases needed.
>>
>> We looked at a lot of generated assembly. Sometimes we would see
>> large methods being inlined multiple times. This would increase the
>> overall
>> size of an RTEMS application. But size is not the only impact of inlining.
>> If an inlined method has one or more if's in it, then the branch paths
>> it includes are introduced EVERYWHERE it is inlined. When we
>> had _Thread_Dispatch_enable, I recall it was used > 100 times and
>> includes a branch in it. There was a build option to not inline this
>> routine to avoid needing to add over 100 test cases.
>>
>> There is a delicate code size and complexity dance when deciding
>> whether to inline something or not. There isn't an easy or obvous
>> answer.
>>
>> + How large is the method?
>>
>> + How many times is it used?
>>
>> + How many branches/loops does it have (e.g. cyclomatic complexity)?
>>
>> In general, very small methods with no branches can be inlined in less
>> code than the subroutine call takes. So these tend to be a win.
>>
>> Over a certain size, the number of times used multiplied by the inlined
>> size can have a negative impact on code size. We saw this a lot when
>> doing the code coverage improvements.
>>
>> If the inlined method has branches, is it possible or cost effective
>> to write test cases for the branches introduced at EVERY use?
>>
>> I came away believing a few things:
>>
>> + The MISRA rule for using simple, single condition branch logic
>> is really good for code coverage analysis. It simplifies it. Statement
>> coverage and branch coverage become closer to equivalent. The
>> resulting assembly code is also much easier to match to source.
>>
>> + Inlining a method with conditional logic has to be looked at
>> very carefully. You don't want to inline something and find yourself
>> having exploded the required number of test cases. Again, look
>> at the number of times a method is used but be cautious about
>> conditional expressions in inline methods.
>>
>> + Inlining doesn't always improve performance measurably. It
>> can increase code size, code to branch around, etc.
>>
>> For a current example of something that is very likely a problem
>> from a object code test coverage and code size increase view,
>> consider _Thread_Dispatch_enable.  It is inlined 41 times.
>> Chris and I looked at a place where it was inlined. It w
>> <https://maps.google.com/?q=ed+at+a+place+where+it+was+inlined.+It+w&entry=gmail&source=g>as
>> about
>> 80 bytes. 41*80 ==> 3280 bytes. Based on size along, it would
>> save ~3K across the entire RTEMS API object code to not inline it
>> at all.
>>
>> Then I looked at the cyclomatic complexity of _Thread_Dispatch_enable().
>> From Wikipedia:
>>
>> Cyclomatic complexity is a software metric, used to indicate the
>> complexity
>> of a program. It is a quantitative measure of the number of linearly
>> independent
>> paths through a program's source code.
>>
>> I used http://www.lizard.ws/ [1] to analyse this method. If you take the
>> conditional
>> compilation out of the code, the cyclomatic complexity is 4. This means
>> that
>> every place this is inlined, it introduces 3 extra paths to test. 41*3
>> ==> 123.
>>
>> My recommendation would be to evaluate the impact of not inlining this
>> method at all. The negative impact on testing and code size are pretty
>> high.
>>
>> Unfortunately, what I wrote up is just the guidelines. Things change over
>> time. An inlined method that is a nice encapsulation when written could
>> turn out to be too large to be inlined if 10 other use cases get added.
>>
>> I lean to a method can be inlined if it is small and no more than one
>> branch statement in assembly. You can be more lenient if used only
>> once or twice. But lean to NO branches on an inlined method if the
>> number of uses is high.
>>
>> Ideally the inlined code is no larger than the code for the call itself
>> and there are no branches. But going above that is not a reason to
>> not inline.
>>
>> I know this was long but there are a lot of trade-offs. We want RTEMS to
>> be as small as possible and as testable as possible. We don't have an
>> army of people to write test cases. And I don't want to need that.
>>
>> Also I have focused on the things that the RTEMS community has control
>> over. Chris' tool gives us a broader view and we can likely help improve
>> things more broadly.
>>
>> [1] You can cut and paste code into this via the web. So pretty easy to
>> look at a single method.
>>
>
> I think all this information should be moved to a proper RTEMS
> contributors/engineering guide.

+1

https://devel.rtems.org/wiki/TBR/UserManual/RTEMS_Coverage_Analysis probably
needs scrubbing and merged into a Rest document.

It has some good stuff but some of the procedural stuff is out of date.

--joel

>
>
> --
> Sebastian Huber, embedded brains GmbH
>
> Address : Dornierstr. 4, D-82178 Puchheim, Germany
> Phone   : +49 89 189 47 41-16
> Fax     : +49 89 189 47 41-09
> E-Mail  : sebastian.huber at embedded-brains.de
> PGP     : Public key available on request.
>
> Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rtems.org/pipermail/devel/attachments/20180807/876bb21c/attachment-0002.html>