Inlined code
Joel Sherrill
joel at rtems.org
Tue Aug 7 15:25:09 UTC 2018
On Tue, Aug 7, 2018 at 12:48 AM, Sebastian Huber <
sebastian.huber at embedded-brains.de> wrote:
> On 06/08/18 21:14, Joel Sherrill wrote:
>
>
>>
>> On Mon, Aug 6, 2018 at 1:21 AM, Chris Johns <chrisj at rtems.org <mailto:
>> chrisj at rtems.org>> wrote:
>>
>> On 06/08/2018 16:12, Christian Mauderer wrote:
>> > Am 06.08.2018 um 07:31 schrieb Chris Johns:
>> >> On 06/08/2018 10:51, Chris Johns wrote:
>> >>> On 05/08/2018 19:39, Christian Mauderer wrote:
>> >>>> Am 05.08.2018 um 04:00 schrieb Chris Johns:
>> >>>>> Hi,
>> >>>>>
>> >>>>> I have been working on migrating covoar in the rtems-tools
>> repo to DWARF. The
>> >>>>> goal is remove objdump parsing and to get accurate details
>> about the functions
>> >>>>> being covered. This is an unfunded task.
>> >>>>>
>> >>>>> The work has resulted in a close examination of inlined code
>> in RTEMS and what I
>> >>>>> saw alarmed me so I have added a report to the rtems-exeinfo
>> tool in rtems-tools
>> >>>>> (the change is to be posted for review once I get the
>> coverage tests running).
>> >>>>>
>> >>>>> A summary report for hello.exe on RTEMS 5 for SPARC is:
>> >>>>>
>> >>>>> inlined funcs : 1412
>> >>>>> total funcs : 1956
>> >>>>> % inline funcs : 72%
>> >>>>> total size : 174616
>> >>>>> inline size : 81668
>> >>>>> % inline size : 46%
>> >>>>>
>> >>>>> This is a small application so it could be argued that skews
>> the figures. A
>> >>>>> large C/C++ application built with -O2 running on RTEMS 4.11
>> ARM reports the
>> >>>>> inline usage as:
>> >>>>>
>> >>>>> inlined funcs : 10370
>> >>>>> total funcs : 17700
>> >>>>> % inline funcs : 58%
>> >>>>> total size : 3066240
>> >>>>> inline size : 1249514
>> >>>>> % inline size : 40%
>> >>>>>
>> >>>>> This does not seem right to me.
>> >>>>>
>> >>>>> The report is new and there could be issues in the DWARF
>> handling that feeds
>> >>>>> this report however I am posting this to start a discussion
>> on the topic of
>> >>>>> inlining.
>> >>>>>
>> >>>>> I attach the report for hello.exe. The `-i` option generates
>> the inline report.
>> >>>>>
>> >>>>> The first section is a summary showing the total number of
>> functions in the
>> >>>>> executable that have machine code and are flagged as inline.
>> The report lists
>> >>>>> the percentage of functions that are inlined and the
>> percentage of machine code
>> >>>>> that is inlined. The values seem high to me.
>> >>>>>
>> >>>>> The second table lists inline functions that are repeated
>> sorted from the
>> >>>>> largest foot print to the smallest. The first column the
>> total size of machine
>> >>>>> code in the executable and the second column the number of
>> instances.
>> >>>>>
>> >>>>> The third table is the list of inline functions sorted from
>> largest machine code
>> >>>>> footprint to smallest. The second column are flags of which
>> there is one. A `E`
>> >>>>> indicates the inline function is also external which means
>> the compiler has
>> >>>>> created an external reference to this function, ie an
>> address-of is being taken.
>> >>>>> The third column is the address in the executable so you can
>> take a look with
>> >>>>> objdump at the machine code.
>> >>>>>
>> >>>>> We need to ask some important question in relation to
>> inlining. It is cheap to
>> >>>>> add and we all feel the code we add needs to be fast and
>> needs to be inlined but
>> >>>>> does it really need to be inlined?
>> >>>>>
>> >>>>> Some pieces of code do need to be inlined and the overhead
>> is just that an
>> >>>>> overhead, for example in the large C/C++ application there
>> is a low level
>> >>>>> volatile hardware write routing with close to 300 instances
>> and a code size of
>> >>>>> 10K. This code needs to be inlined for performance reasons
>> but should the size
>> >>>>> on average be 40 bytes, I doubt it.
>> >>>>>
>> >>>>> Can we be more judicious with our use of the inline keyword?
>> >>>>>
>> >>>>> Is the performance gain we really expect or is the actual
>> overhead of a call
>> >>>>> frame not worth saving?
>> >>>>>
>> >>>>> What are the real costs of inlining a piece of code? It adds
>> size to the
>> >>>>> executable and depending on the code being inlined it
>> complicates coverage
>> >>>>> analysis be adding extra branch points.
>> >>>>>
>> >>>>> The metrics to determine what should be inlined is
>> complicated and I do not
>> >>>>> think we have a suitable policy in place. I believe it is
>> time we to create one.
>> >>>>>
>> >>>>> The issue is not limited to our code, gcc, newlib and
>> libstdc++ seem to have
>> >>>>> some code that should be looked at more closely. For example
>> __udivmoddi4, and
>> >>>>> __sprint_r.
>> >>>>>
>> >>>>> Chris
>> >>>>>
>> >>>>>
>> >>>>
>> >>>> Hello Chris,
>> >>>>
>> >>>> I just took a look at one of the first function in your list:
>> __sprint_r
>> >>>>
>> >>>>
>> https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=
>> blob;f=newlib/libc/stdio/vfprintf.c;h=c4bf2dbe31da64462
>> ecccec97c8e901e4ffadd44;hb=HEAD#l403
>> <https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;
>> a=blob;f=newlib/libc/stdio/vfprintf.c;h=c4bf2dbe31da64462
>> ecccec97c8e901e4ffadd44;hb=HEAD#l403>
>> >>>>
>> >>>> As far as I can see, there is no explicit inline key word for
>> that
>> >>>> function. So in that case, the compiler decided that it would
>> be a good
>> >>>> idea to inline that function.
>> >>>
>> >>> Thanks and yes. At this point in time I cannot tell what is
>> happening and I am
>> >>> not sure the tool is reporting accurate data, I need to
>> investigate.
>> >>
>> >> I have updated the tool and report to show which inline
>> functions are:
>> >>
>> >> - inlined by compiler
>> >> - declared inline and not inlined
>> >> - declared inline and inlined
>> >>
>> >> I have also fixed a quick hack I had where the size was the
>> span from the low PC
>> >> to the high PC, this was wrong. Inlined code can be split and
>> moved when
>> >> inlining creating a discontinuous address range. The size in
>> the report is now
>> >> the number of machine code bytes.
>> >>
>> >> The report will show any functions not inlined when asked to be
>> inlined. We do
>> >> not have any.
>> >>
>> >> The 'C' flag in the inlined table shows which functions the
>> compiler has inlined.
>> >>
>> >> Chris
>> >>
>> >
>> > With that list it is now much clearer which functions would be
>> relevant
>> > for a potential review.
>>
>> Yes. FYI the C/C++ RTEMS 4.11 app now gives:
>>
>> inlined funcs : 10370
>> total funcs : 17700
>> % inline funcs : 58%
>> total size : 2296354
>> inline size : 479628
>> % inline size : 20%
>>
>> This level of reduction is more inline with what I expected.
>>
>> >
>> >>>
>> >>>> I'm not sure whether I might just haven't seen it but is there a
>> >>>> possibility to distinguish between functions that have been
>> inlined by
>> >>>> the compiler and ones that have been inlined due to the
>> "inline" keyword
>> >>>> without looking at every definition?
>> >>>
>> >>> I am not sure. The DWARF data is complex and detailed and I
>> view this initial
>> >>> step into the area of using DWARF to perform static analysis
>> of RTEMS
>> >>> executables as green.
>> >>>
>> >>> DWARF does provide declaration attributes. I need to review
>> the DWARF data and
>> >>> standard to determine if we can tell what is declared inline
>> and what has been
>> >>> inlined. I think it would be good to know.
>> >>>
>> >>>> Did you try compiling with size optimization? I would expect
>> that the
>> >>>> compiler would inline far less functions and maybe even
>> ignore some
>> >>>> "inline" keywords. As far as I know it's more of a hint to
>> the compiler.
>> >>>
>> >>> Not yet. A complete tool build with those options is a lot of
>> effort and I am
>> >>> still not comfortable the report is accurate. I think this is
>> something that
>> >>> should be done at some point. I think it would create an
>> interesting data point.
>> >>>
>> >>>> I would only worry about functions that are still inlined if size
>> >>>> optimization is selected.
>> >>>
>> >>> I think we need to review the functions we currently have
>> tagged as inline. I
>> >>> think the only way we can do this is with real data.
>> >>>
>> >>>> That's the case when I tell the compiler to
>> >>>> make the program as small as possible. In all other cases I
>> want some
>> >>>> well balanced optimum between speed and size. Inlining small
>> functions
>> >>>> is OK in that case if you ask me.
>> >>>
>> >>> How do you define this, ie what is the inline policy we use?
>> >>> How do you audit this?
>> >
>> > Both questions are not simple to answer. It is most likely a case by
>> > case decision. I think there are roughly two reasons for inlining:
>> >
>> > - The code is short enough that it is smaller if it is inlined
>> compared
>> > to a function call.
>> >
>> > - There is some performance reason.
>> >
>> > Anything else?
>>
>> Not for inlining rather for not inlining? I had a discussion with
>> Joel last week
>> about coverage and he said he had previously reviewed the inlines
>> to reduce the
>> branch counts because it complicates coverage. I would prefer he
>> talks about
>> this, it is his area.
>>
>>
>> Thanks for the lead-in Chris.
>>
>> First, there are multiple types of coverage. Statement, Decision coverage,
>> MC/DC, and object coverage. Based on various limiting factors which I
>> am happy to discuss in more detail if someone is interested, we went
>> with object coverage. I will briefly remind everyone of the limitations
>> and guidelines we went into this with:
>>
>> + We didn't want instrumented code. Test as you fly, fly as you test.
>> Plus there would need to be a way to get information generated by
>> the instrumentation off the target.
>>
>> + Without gcc support for identifying sub-expressions in complex
>> logical expressions, there is no way to do DC or MCDC. Further,
>> we would have had to have a tool to interpret the non-existent
>> information and know which entries in the truth tables for every
>> expression had to be exercised for DC and MCDC.
>>
>> + gcov's reports are primarily statement coverage. And you can't get
>> gcov reports without gcov output from runs which we still are working
>> on. Also in terms of DO-178, statement coverage is generally Level C
>> type coverage.
>>
>> We wanted higher level coverage for RTEMS and thus object
>> coverage with emphasis on every generated branch statement being
>> needed and exercisable as taken and not taken. This approach
>> ensured we didn't have dead code.
>>
>> Plus this type of information could easily be obtained from simulator
>> traces from Qemu and tsim. covoar was written to combine and
>> correlate the information back to assembly and source and generate
>> the reports.
>>
>> That's what led to the current coverage approach. Chris has been
>> working to eliminate the use of command line tools (e.g. nm, objdump,
>> etc) in favor of using DWARF directly.
>>
>> That's the background of the approach and how we got reports.
>>
>> Also remember that RTEMS testing is primarily via the public APIs
>> which combined with coverage ensures we actually need all the
>> internal code. If you can't reach it from a public API, it is dead.
>>
>> But the insights into RTEMS coding practices began when we started
>> trying to improve the coverage percentages. Key point:
>>
>> IF CODE IS DUPLICATED AT THE SOURCE OR OBJECT LEVEL, IT HAS TO
>> BE FULLY TESTED AT EACH INSTANCE OR INSTANTIATION.
>>
>> At the source level, sometimes we factored out a common helper method
>> to avoid testing the same logic via multiple APIs. This reduced the number
>> of test cases needed.
>>
>> We looked at a lot of generated assembly. Sometimes we would see
>> large methods being inlined multiple times. This would increase the
>> overall
>> size of an RTEMS application. But size is not the only impact of inlining.
>> If an inlined method has one or more if's in it, then the branch paths
>> it includes are introduced EVERYWHERE it is inlined. When we
>> had _Thread_Dispatch_enable, I recall it was used > 100 times and
>> includes a branch in it. There was a build option to not inline this
>> routine to avoid needing to add over 100 test cases.
>>
>> There is a delicate code size and complexity dance when deciding
>> whether to inline something or not. There isn't an easy or obvous
>> answer.
>>
>> + How large is the method?
>>
>> + How many times is it used?
>>
>> + How many branches/loops does it have (e.g. cyclomatic complexity)?
>>
>> In general, very small methods with no branches can be inlined in less
>> code than the subroutine call takes. So these tend to be a win.
>>
>> Over a certain size, the number of times used multiplied by the inlined
>> size can have a negative impact on code size. We saw this a lot when
>> doing the code coverage improvements.
>>
>> If the inlined method has branches, is it possible or cost effective
>> to write test cases for the branches introduced at EVERY use?
>>
>> I came away believing a few things:
>>
>> + The MISRA rule for using simple, single condition branch logic
>> is really good for code coverage analysis. It simplifies it. Statement
>> coverage and branch coverage become closer to equivalent. The
>> resulting assembly code is also much easier to match to source.
>>
>> + Inlining a method with conditional logic has to be looked at
>> very carefully. You don't want to inline something and find yourself
>> having exploded the required number of test cases. Again, look
>> at the number of times a method is used but be cautious about
>> conditional expressions in inline methods.
>>
>> + Inlining doesn't always improve performance measurably. It
>> can increase code size, code to branch around, etc.
>>
>> For a current example of something that is very likely a problem
>> from a object code test coverage and code size increase view,
>> consider _Thread_Dispatch_enable. It is inlined 41 times.
>> Chris and I looked at a place where it was inlined. It w
>> <https://maps.google.com/?q=ed+at+a+place+where+it+was+inlined.+It+w&entry=gmail&source=g>as
>> about
>> 80 bytes. 41*80 ==> 3280 bytes. Based on size along, it would
>> save ~3K across the entire RTEMS API object code to not inline it
>> at all.
>>
>> Then I looked at the cyclomatic complexity of _Thread_Dispatch_enable().
>> From Wikipedia:
>>
>> Cyclomatic complexity is a software metric, used to indicate the
>> complexity
>> of a program. It is a quantitative measure of the number of linearly
>> independent
>> paths through a program's source code.
>>
>> I used http://www.lizard.ws/ [1] to analyse this method. If you take the
>> conditional
>> compilation out of the code, the cyclomatic complexity is 4. This means
>> that
>> every place this is inlined, it introduces 3 extra paths to test. 41*3
>> ==> 123.
>>
>> My recommendation would be to evaluate the impact of not inlining this
>> method at all. The negative impact on testing and code size are pretty
>> high.
>>
>> Unfortunately, what I wrote up is just the guidelines. Things change over
>> time. An inlined method that is a nice encapsulation when written could
>> turn out to be too large to be inlined if 10 other use cases get added.
>>
>> I lean to a method can be inlined if it is small and no more than one
>> branch statement in assembly. You can be more lenient if used only
>> once or twice. But lean to NO branches on an inlined method if the
>> number of uses is high.
>>
>> Ideally the inlined code is no larger than the code for the call itself
>> and there are no branches. But going above that is not a reason to
>> not inline.
>>
>> I know this was long but there are a lot of trade-offs. We want RTEMS to
>> be as small as possible and as testable as possible. We don't have an
>> army of people to write test cases. And I don't want to need that.
>>
>> Also I have focused on the things that the RTEMS community has control
>> over. Chris' tool gives us a broader view and we can likely help improve
>> things more broadly.
>>
>> [1] You can cut and paste code into this via the web. So pretty easy to
>> look at a single method.
>>
>
> I think all this information should be moved to a proper RTEMS
> contributors/engineering guide.
+1
https://devel.rtems.org/wiki/TBR/UserManual/RTEMS_Coverage_Analysis probably
needs scrubbing and merged into a Rest document.
It has some good stuff but some of the procedural stuff is out of date.
--joel
>
>
> --
> Sebastian Huber, embedded brains GmbH
>
> Address : Dornierstr. 4, D-82178 Puchheim, Germany
> Phone : +49 89 189 47 41-16
> Fax : +49 89 189 47 41-09
> E-Mail : sebastian.huber at embedded-brains.de
> PGP : Public key available on request.
>
> Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rtems.org/pipermail/devel/attachments/20180807/876bb21c/attachment-0002.html>
More information about the devel
mailing list