Inlined code

Tue Aug 7 05:48:27 UTC 2018

On 06/08/18 21:14, Joel Sherrill wrote:
>
>
> On Mon, Aug 6, 2018 at 1:21 AM, Chris Johns <chrisj at rtems.org 
> <mailto:chrisj at rtems.org>> wrote:
>
>     On 06/08/2018 16:12, Christian Mauderer wrote:
>     > Am 06.08.2018 um 07:31 schrieb Chris Johns:
>     >> On 06/08/2018 10:51, Chris Johns wrote:
>     >>> On 05/08/2018 19:39, Christian Mauderer wrote:
>     >>>> Am 05.08.2018 um 04:00 schrieb Chris Johns:
>     >>>>> Hi,
>     >>>>>
>     >>>>> I have been working on migrating covoar in the rtems-tools
>     repo to DWARF. The
>     >>>>> goal is remove objdump parsing and to get accurate details
>     about the functions
>     >>>>> being covered. This is an unfunded task.
>     >>>>>
>     >>>>> The work has resulted in a close examination of inlined code
>     in RTEMS and what I
>     >>>>> saw alarmed me so I have added a report to the rtems-exeinfo
>     tool in rtems-tools
>     >>>>> (the change is to be posted for review once I get the
>     coverage tests running).
>     >>>>>
>     >>>>> A summary report for hello.exe on RTEMS 5 for SPARC is:
>     >>>>>
>     >>>>> inlined funcs   : 1412
>     >>>>>     total funcs : 1956
>     >>>>>  % inline funcs : 72%
>     >>>>>      total size : 174616
>     >>>>>     inline size : 81668
>     >>>>>   % inline size : 46%
>     >>>>>
>     >>>>> This is a small application so it could be argued that skews
>     the figures. A
>     >>>>> large C/C++ application built with -O2 running on RTEMS 4.11
>     ARM reports the
>     >>>>> inline usage as:
>     >>>>>
>     >>>>> inlined funcs   : 10370
>     >>>>>     total funcs : 17700
>     >>>>>  % inline funcs : 58%
>     >>>>>      total size : 3066240
>     >>>>>     inline size : 1249514
>     >>>>>   % inline size : 40%
>     >>>>>
>     >>>>> This does not seem right to me.
>     >>>>>
>     >>>>> The report is new and there could be issues in the DWARF
>     handling that feeds
>     >>>>> this report however I am posting this to start a discussion
>     on the topic of
>     >>>>> inlining.
>     >>>>>
>     >>>>> I attach the report for hello.exe. The `-i` option generates
>     the inline report.
>     >>>>>
>     >>>>> The first section is a summary showing the total number of
>     functions in the
>     >>>>> executable that have machine code and are flagged as inline.
>     The report lists
>     >>>>> the percentage of functions that are inlined and the
>     percentage of machine code
>     >>>>> that is inlined. The values seem high to me.
>     >>>>>
>     >>>>> The second table lists inline functions that are repeated
>     sorted from the
>     >>>>> largest foot print to the smallest. The first column the
>     total size of machine
>     >>>>> code in the executable and the second column the number of
>     instances.
>     >>>>>
>     >>>>> The third table is the list of inline functions sorted from
>     largest machine code
>     >>>>> footprint to smallest. The second column are flags of which
>     there is one. A `E`
>     >>>>> indicates the inline function is also external which means
>     the compiler has
>     >>>>> created an external reference to this function, ie an
>     address-of is being taken.
>     >>>>> The third column is the address in the executable so you can
>     take a look with
>     >>>>> objdump at the machine code.
>     >>>>>
>     >>>>> We need to ask some important question in relation to
>     inlining. It is cheap to
>     >>>>> add and we all feel the code we add needs to be fast and
>     needs to be inlined but
>     >>>>> does it really need to be inlined?
>     >>>>>
>     >>>>> Some pieces of code do need to be inlined and the overhead
>     is just that an
>     >>>>> overhead, for example in the large C/C++ application there
>     is a low level
>     >>>>> volatile hardware write routing with close to 300 instances
>     and a code size of
>     >>>>> 10K. This code needs to be inlined for performance reasons
>     but should the size
>     >>>>> on average be 40 bytes, I doubt it.
>     >>>>>
>     >>>>> Can we be more judicious with our use of the inline keyword?
>     >>>>>
>     >>>>> Is the performance gain we really expect or is the actual
>     overhead of a call
>     >>>>> frame not worth saving?
>     >>>>>
>     >>>>> What are the real costs of inlining a piece of code? It adds
>     size to the
>     >>>>> executable and depending on the code being inlined it
>     complicates coverage
>     >>>>> analysis be adding extra branch points.
>     >>>>>
>     >>>>> The metrics to determine what should be inlined is
>     complicated and I do not
>     >>>>> think we have a suitable policy in place. I believe it is
>     time we to create one.
>     >>>>>
>     >>>>> The issue is not limited to our code, gcc, newlib and
>     libstdc++ seem to have
>     >>>>> some code that should be looked at more closely. For example
>     __udivmoddi4, and
>     >>>>> __sprint_r.
>     >>>>>
>     >>>>> Chris
>     >>>>>
>     >>>>>
>     >>>>
>     >>>> Hello Chris,
>     >>>>
>     >>>> I just took a look at one of the first function in your list:
>     __sprint_r
>     >>>>
>     >>>>
>     https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdio/vfprintf.c;h=c4bf2dbe31da64462ecccec97c8e901e4ffadd44;hb=HEAD#l403
>     <https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdio/vfprintf.c;h=c4bf2dbe31da64462ecccec97c8e901e4ffadd44;hb=HEAD#l403>
>     >>>>
>     >>>> As far as I can see, there is no explicit inline key word for
>     that
>     >>>> function. So in that case, the compiler decided that it would
>     be a good
>     >>>> idea to inline that function.
>     >>>
>     >>> Thanks and yes. At this point in time I cannot tell what is
>     happening and I am
>     >>> not sure the tool is reporting accurate data, I need to
>     investigate.
>     >>
>     >> I have updated the tool and report to show which inline
>     functions are:
>     >>
>     >>  - inlined by compiler
>     >>  - declared inline and not inlined
>     >>  - declared inline and inlined
>     >>
>     >> I have also fixed a quick hack I had where the size was the
>     span from the low PC
>     >> to the high PC, this was wrong. Inlined code can be split and
>     moved when
>     >> inlining creating a discontinuous address range. The size in
>     the report is now
>     >> the number of machine code bytes.
>     >>
>     >> The report will show any functions not inlined when asked to be
>     inlined. We do
>     >> not have any.
>     >>
>     >> The 'C' flag in the inlined table shows which functions the
>     compiler has inlined.
>     >>
>     >> Chris
>     >>
>     >
>     > With that list it is now much clearer which functions would be
>     relevant
>     > for a potential review.
>
>     Yes. FYI the C/C++ RTEMS 4.11 app now gives:
>
>     inlined funcs   : 10370
>         total funcs : 17700
>      % inline funcs : 58%
>          total size : 2296354
>         inline size : 479628
>       % inline size : 20%
>
>     This level of reduction is more inline with what I expected.
>
>     >
>     >>>
>     >>>> I'm not sure whether I might just haven't seen it but is there a
>     >>>> possibility to distinguish between functions that have been
>     inlined by
>     >>>> the compiler and ones that have been inlined due to the
>     "inline" keyword
>     >>>> without looking at every definition?
>     >>>
>     >>> I am not sure. The DWARF data is complex and detailed and I
>     view this initial
>     >>> step into the area of using DWARF to perform static analysis
>     of RTEMS
>     >>> executables as green.
>     >>>
>     >>> DWARF does provide declaration attributes. I need to review
>     the DWARF data and
>     >>> standard to determine if we can tell what is declared inline
>     and what has been
>     >>> inlined. I think it would be good to know.
>     >>>
>     >>>> Did you try compiling with size optimization? I would expect
>     that the
>     >>>> compiler would inline far less functions and maybe even
>     ignore some
>     >>>> "inline" keywords. As far as I know it's more of a hint to
>     the compiler.
>     >>>
>     >>> Not yet. A complete tool build with those options is a lot of
>     effort and I am
>     >>> still not comfortable the report is accurate. I think this is
>     something that
>     >>> should be done at some point. I think it would create an
>     interesting data point.
>     >>>
>     >>>> I would only worry about functions that are still inlined if size
>     >>>> optimization is selected.
>     >>>
>     >>> I think we need to review the functions we currently have
>     tagged as inline. I
>     >>> think the only way we can do this is with real data.
>     >>>
>     >>>> That's the case when I tell the compiler to
>     >>>> make the program as small as possible. In all other cases I
>     want some
>     >>>> well balanced optimum between speed and size. Inlining small
>     functions
>     >>>> is OK in that case if you ask me.
>     >>>
>     >>> How do you define this, ie what is the inline policy we use?
>     >>> How do you audit this?
>     >
>     > Both questions are not simple to answer. It is most likely a case by
>     > case decision. I think there are roughly two reasons for inlining:
>     >
>     > - The code is short enough that it is smaller if it is inlined
>     compared
>     > to a function call.
>     >
>     > - There is some performance reason.
>     >
>     > Anything else?
>
>     Not for inlining rather for not inlining? I had a discussion with
>     Joel last week
>     about coverage and he said he had previously reviewed the inlines
>     to reduce the
>     branch counts because it complicates coverage. I would prefer he
>     talks about
>     this, it is his area.
>
>
> Thanks for the lead-in Chris.
>
> First, there are multiple types of coverage. Statement, Decision coverage,
> MC/DC, and object coverage. Based on various limiting factors which I
> am happy to discuss in more detail if someone is interested, we went
> with object coverage. I will briefly remind everyone of the limitations
> and guidelines we went into this with:
>
> + We didn't want instrumented code. Test as you fly, fly as you test.
> Plus there would need to be a way to get information generated by
> the instrumentation off the target.
>
> + Without gcc support for identifying sub-expressions in complex
> logical expressions, there is no way to do DC or MCDC. Further,
> we would have had to have a tool to interpret the non-existent
> information and know which entries in the truth tables for every
> expression had to be exercised for DC and MCDC.
>
> + gcov's reports are primarily statement coverage. And you can't get
> gcov reports without gcov output from runs which we still are working
> on. Also in terms of DO-178, statement coverage is generally Level C
> type coverage.
>
> We wanted higher level coverage for RTEMS and thus object
> coverage with emphasis on every generated branch statement being
> needed and exercisable as taken and not taken. This approach
> ensured we didn't have dead code.
>
> Plus this type of information could easily be obtained from simulator
> traces from Qemu and tsim. covoar was written to combine and
> correlate the information back to assembly and source and generate
> the reports.
>
> That's what led to the current coverage approach. Chris has been
> working to eliminate the use of command line tools (e.g. nm, objdump,
> etc) in favor of using DWARF directly.
>
> That's the background of the approach and how we got reports.
>
> Also remember that RTEMS testing is primarily via the public APIs
> which combined with coverage ensures we actually need all the
> internal code. If you can't reach it from a public API, it is dead.
>
> But the insights into RTEMS coding practices began when we started
> trying to improve the coverage percentages. Key point:
>
> IF CODE IS DUPLICATED AT THE SOURCE OR OBJECT LEVEL, IT HAS TO
> BE FULLY TESTED AT EACH INSTANCE OR INSTANTIATION.
>
> At the source level, sometimes we factored out a common helper method
> to avoid testing the same logic via multiple APIs. This reduced the number
> of test cases needed.
>
> We looked at a lot of generated assembly. Sometimes we would see
> large methods being inlined multiple times. This would increase the 
> overall
> size of an RTEMS application. But size is not the only impact of inlining.
> If an inlined method has one or more if's in it, then the branch paths
> it includes are introduced EVERYWHERE it is inlined. When we
> had _Thread_Dispatch_enable, I recall it was used > 100 times and
> includes a branch in it. There was a build option to not inline this
> routine to avoid needing to add over 100 test cases.
>
> There is a delicate code size and complexity dance when deciding
> whether to inline something or not. There isn't an easy or obvous
> answer.
>
> + How large is the method?
>
> + How many times is it used?
>
> + How many branches/loops does it have (e.g. cyclomatic complexity)?
>
> In general, very small methods with no branches can be inlined in less
> code than the subroutine call takes. So these tend to be a win.
>
> Over a certain size, the number of times used multiplied by the inlined
> size can have a negative impact on code size. We saw this a lot when
> doing the code coverage improvements.
>
> If the inlined method has branches, is it possible or cost effective
> to write test cases for the branches introduced at EVERY use?
>
> I came away believing a few things:
>
> + The MISRA rule for using simple, single condition branch logic
> is really good for code coverage analysis. It simplifies it. Statement
> coverage and branch coverage become closer to equivalent. The
> resulting assembly code is also much easier to match to source.
>
> + Inlining a method with conditional logic has to be looked at
> very carefully. You don't want to inline something and find yourself
> having exploded the required number of test cases. Again, look
> at the number of times a method is used but be cautious about
> conditional expressions in inline methods.
>
> + Inlining doesn't always improve performance measurably. It
> can increase code size, code to branch around, etc.
>
> For a current example of something that is very likely a problem
> from a object code test coverage and code size increase view,
> consider _Thread_Dispatch_enable.  It is inlined 41 times.
> Chris and I looked at a place where it was inlined. It was about
> 80 bytes. 41*80 ==> 3280 bytes. Based on size along, it would
> save ~3K across the entire RTEMS API object code to not inline it
> at all.
>
> Then I looked at the cyclomatic complexity of _Thread_Dispatch_enable().
> From Wikipedia:
>
> Cyclomatic complexity is a software metric, used to indicate the 
> complexity
> of a program. It is a quantitative measure of the number of linearly 
> independent
> paths through a program's source code.
>
> I used http://www.lizard.ws/ [1] to analyse this method. If you take 
> the conditional
> compilation out of the code, the cyclomatic complexity is 4. This 
> means that
> every place this is inlined, it introduces 3 extra paths to test. 41*3 
> ==> 123.
>
> My recommendation would be to evaluate the impact of not inlining this
> method at all. The negative impact on testing and code size are pretty 
> high.
>
> Unfortunately, what I wrote up is just the guidelines. Things change over
> time. An inlined method that is a nice encapsulation when written could
> turn out to be too large to be inlined if 10 other use cases get added.
>
> I lean to a method can be inlined if it is small and no more than one
> branch statement in assembly. You can be more lenient if used only
> once or twice. But lean to NO branches on an inlined method if the
> number of uses is high.
>
> Ideally the inlined code is no larger than the code for the call itself
> and there are no branches. But going above that is not a reason to
> not inline.
>
> I know this was long but there are a lot of trade-offs. We want RTEMS to
> be as small as possible and as testable as possible. We don't have an
> army of people to write test cases. And I don't want to need that.
>
> Also I have focused on the things that the RTEMS community has control
> over. Chris' tool gives us a broader view and we can likely help improve
> things more broadly.
>
> [1] You can cut and paste code into this via the web. So pretty easy to
> look at a single method.

I think all this information should be moved to a proper RTEMS 
contributors/engineering guide.

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.