Inlined code

Joel Sherrill joel at rtems.org
Mon Aug 6 19:14:12 UTC 2018


On Mon, Aug 6, 2018 at 1:21 AM, Chris Johns <chrisj at rtems.org> wrote:

> On 06/08/2018 16:12, Christian Mauderer wrote:
> > Am 06.08.2018 um 07:31 schrieb Chris Johns:
> >> On 06/08/2018 10:51, Chris Johns wrote:
> >>> On 05/08/2018 19:39, Christian Mauderer wrote:
> >>>> Am 05.08.2018 um 04:00 schrieb Chris Johns:
> >>>>> Hi,
> >>>>>
> >>>>> I have been working on migrating covoar in the rtems-tools repo to
> DWARF. The
> >>>>> goal is remove objdump parsing and to get accurate details about the
> functions
> >>>>> being covered. This is an unfunded task.
> >>>>>
> >>>>> The work has resulted in a close examination of inlined code in
> RTEMS and what I
> >>>>> saw alarmed me so I have added a report to the rtems-exeinfo tool in
> rtems-tools
> >>>>> (the change is to be posted for review once I get the coverage tests
> running).
> >>>>>
> >>>>> A summary report for hello.exe on RTEMS 5 for SPARC is:
> >>>>>
> >>>>> inlined funcs   : 1412
> >>>>>     total funcs : 1956
> >>>>>  % inline funcs : 72%
> >>>>>      total size : 174616
> >>>>>     inline size : 81668
> >>>>>   % inline size : 46%
> >>>>>
> >>>>> This is a small application so it could be argued that skews the
> figures. A
> >>>>> large C/C++ application built with -O2 running on RTEMS 4.11 ARM
> reports the
> >>>>> inline usage as:
> >>>>>
> >>>>> inlined funcs   : 10370
> >>>>>     total funcs : 17700
> >>>>>  % inline funcs : 58%
> >>>>>      total size : 3066240
> >>>>>     inline size : 1249514
> >>>>>   % inline size : 40%
> >>>>>
> >>>>> This does not seem right to me.
> >>>>>
> >>>>> The report is new and there could be issues in the DWARF handling
> that feeds
> >>>>> this report however I am posting this to start a discussion on the
> topic of
> >>>>> inlining.
> >>>>>
> >>>>> I attach the report for hello.exe. The `-i` option generates the
> inline report.
> >>>>>
> >>>>> The first section is a summary showing the total number of functions
> in the
> >>>>> executable that have machine code and are flagged as inline. The
> report lists
> >>>>> the percentage of functions that are inlined and the percentage of
> machine code
> >>>>> that is inlined. The values seem high to me.
> >>>>>
> >>>>> The second table lists inline functions that are repeated sorted
> from the
> >>>>> largest foot print to the smallest. The first column the total size
> of machine
> >>>>> code in the executable and the second column the number of instances.
> >>>>>
> >>>>> The third table is the list of inline functions sorted from largest
> machine code
> >>>>> footprint to smallest. The second column are flags of which there is
> one. A `E`
> >>>>> indicates the inline function is also external which means the
> compiler has
> >>>>> created an external reference to this function, ie an address-of is
> being taken.
> >>>>> The third column is the address in the executable so you can take a
> look with
> >>>>> objdump at the machine code.
> >>>>>
> >>>>> We need to ask some important question in relation to inlining. It
> is cheap to
> >>>>> add and we all feel the code we add needs to be fast and needs to be
> inlined but
> >>>>> does it really need to be inlined?
> >>>>>
> >>>>> Some pieces of code do need to be inlined and the overhead is just
> that an
> >>>>> overhead, for example in the large C/C++ application there is a low
> level
> >>>>> volatile hardware write routing with close to 300 instances and a
> code size of
> >>>>> 10K. This code needs to be inlined for performance reasons but
> should the size
> >>>>> on average be 40 bytes, I doubt it.
> >>>>>
> >>>>> Can we be more judicious with our use of the inline keyword?
> >>>>>
> >>>>> Is the performance gain we really expect or is the actual overhead
> of a call
> >>>>> frame not worth saving?
> >>>>>
> >>>>> What are the real costs of inlining a piece of code? It adds size to
> the
> >>>>> executable and depending on the code being inlined it complicates
> coverage
> >>>>> analysis be adding extra branch points.
> >>>>>
> >>>>> The metrics to determine what should be inlined is complicated and I
> do not
> >>>>> think we have a suitable policy in place. I believe it is time we to
> create one.
> >>>>>
> >>>>> The issue is not limited to our code, gcc, newlib and libstdc++ seem
> to have
> >>>>> some code that should be looked at more closely. For example
> __udivmoddi4, and
> >>>>> __sprint_r.
> >>>>>
> >>>>> Chris
> >>>>>
> >>>>>
> >>>>
> >>>> Hello Chris,
> >>>>
> >>>> I just took a look at one of the first function in your list:
> __sprint_r
> >>>>
> >>>> https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.
> git;a=blob;f=newlib/libc/stdio/vfprintf.c;h=c4bf2dbe31da64462ecccec97c8e90
> 1e4ffadd44;hb=HEAD#l403
> >>>>
> >>>> As far as I can see, there is no explicit inline key word for that
> >>>> function. So in that case, the compiler decided that it would be a
> good
> >>>> idea to inline that function.
> >>>
> >>> Thanks and yes. At this point in time I cannot tell what is happening
> and I am
> >>> not sure the tool is reporting accurate data, I need to investigate.
> >>
> >> I have updated the tool and report to show which inline functions are:
> >>
> >>  - inlined by compiler
> >>  - declared inline and not inlined
> >>  - declared inline and inlined
> >>
> >> I have also fixed a quick hack I had where the size was the span from
> the low PC
> >> to the high PC, this was wrong. Inlined code can be split and moved when
> >> inlining creating a discontinuous address range. The size in the report
> is now
> >> the number of machine code bytes.
> >>
> >> The report will show any functions not inlined when asked to be
> inlined. We do
> >> not have any.
> >>
> >> The 'C' flag in the inlined table shows which functions the compiler
> has inlined.
> >>
> >> Chris
> >>
> >
> > With that list it is now much clearer which functions would be relevant
> > for a potential review.
>
> Yes. FYI the C/C++ RTEMS 4.11 app now gives:
>
> inlined funcs   : 10370
>     total funcs : 17700
>  % inline funcs : 58%
>      total size : 2296354
>     inline size : 479628
>   % inline size : 20%
>
> This level of reduction is more inline with what I expected.
>
> >
> >>>
> >>>> I'm not sure whether I might just haven't seen it but is there a
> >>>> possibility to distinguish between functions that have been inlined by
> >>>> the compiler and ones that have been inlined due to the "inline"
> keyword
> >>>> without looking at every definition?
> >>>
> >>> I am not sure. The DWARF data is complex and detailed and I view this
> initial
> >>> step into the area of using DWARF to perform static analysis of RTEMS
> >>> executables as green.
> >>>
> >>> DWARF does provide declaration attributes. I need to review the DWARF
> data and
> >>> standard to determine if we can tell what is declared inline and what
> has been
> >>> inlined. I think it would be good to know.
> >>>
> >>>> Did you try compiling with size optimization? I would expect that the
> >>>> compiler would inline far less functions and maybe even ignore some
> >>>> "inline" keywords. As far as I know it's more of a hint to the
> compiler.
> >>>
> >>> Not yet. A complete tool build with those options is a lot of effort
> and I am
> >>> still not comfortable the report is accurate. I think this is
> something that
> >>> should be done at some point. I think it would create an interesting
> data point.
> >>>
> >>>> I would only worry about functions that are still inlined if size
> >>>> optimization is selected.
> >>>
> >>> I think we need to review the functions we currently have tagged as
> inline. I
> >>> think the only way we can do this is with real data.
> >>>
> >>>> That's the case when I tell the compiler to
> >>>> make the program as small as possible. In all other cases I want some
> >>>> well balanced optimum between speed and size. Inlining small functions
> >>>> is OK in that case if you ask me.
> >>>
> >>> How do you define this, ie what is the inline policy we use?
> >>> How do you audit this?
> >
> > Both questions are not simple to answer. It is most likely a case by
> > case decision. I think there are roughly two reasons for inlining:
> >
> > - The code is short enough that it is smaller if it is inlined compared
> > to a function call.
> >
> > - There is some performance reason.
> >
> > Anything else?
>
> Not for inlining rather for not inlining? I had a discussion with Joel
> last week
> about coverage and he said he had previously reviewed the inlines to
> reduce the
> branch counts because it complicates coverage. I would prefer he talks
> about
> this, it is his area.
>

Thanks for the lead-in Chris.

First, there are multiple types of coverage. Statement, Decision coverage,
MC/DC, and object coverage. Based on various limiting factors which I
am happy to discuss in more detail if someone is interested, we went
with object coverage. I will briefly remind everyone of the limitations
and guidelines we went into this with:

+ We didn't want instrumented code. Test as you fly, fly as you test.
Plus there would need to be a way to get information generated by
the instrumentation off the target.

+ Without gcc support for identifying sub-expressions in complex
logical expressions, there is no way to do DC or MCDC. Further,
we would have had to have a tool to interpret the non-existent
information and know which entries in the truth tables for every
expression had to be exercised for DC and MCDC.

+ gcov's reports are primarily statement coverage. And you can't get
gcov reports without gcov output from runs which we still are working
on. Also in terms of DO-178, statement coverage is generally Level C
type coverage.

We wanted higher level coverage for RTEMS and thus object
coverage with emphasis on every generated branch statement being
needed and exercisable as taken and not taken. This approach
ensured we didn't have dead code.

Plus this type of information could easily be obtained from simulator
traces from Qemu and tsim. covoar was written to combine and
correlate the information back to assembly and source and generate
the reports.

That's what led to the current coverage approach. Chris has been
working to eliminate the use of command line tools (e.g. nm, objdump,
etc) in favor of using DWARF directly.

That's the background of the approach and how we got reports.

Also remember that RTEMS testing is primarily via the public APIs
which combined with coverage ensures we actually need all the
internal code. If you can't reach it from a public API, it is dead.

But the insights into RTEMS coding practices began when we started
trying to improve the coverage percentages. Key point:

IF CODE IS DUPLICATED AT THE SOURCE OR OBJECT LEVEL, IT HAS TO
BE FULLY TESTED AT EACH INSTANCE OR INSTANTIATION.

At the source level, sometimes we factored out a common helper method
to avoid testing the same logic via multiple APIs. This reduced the number
of test cases needed.

We looked at a lot of generated assembly. Sometimes we would see
large methods being inlined multiple times. This would increase the overall
size of an RTEMS application. But size is not the only impact of inlining.
If an inlined method has one or more if's in it, then the branch paths
it includes are introduced EVERYWHERE it is inlined. When we
had _Thread_Dispatch_enable, I recall it was used > 100 times and
includes a branch in it. There was a build option to not inline this
routine to avoid needing to add over 100 test cases.

There is a delicate code size and complexity dance when deciding
whether to inline something or not. There isn't an easy or obvous
answer.

+ How large is the method?

+ How many times is it used?

+ How many branches/loops does it have (e.g. cyclomatic complexity)?

In general, very small methods with no branches can be inlined in less
code than the subroutine call takes. So these tend to be a win.

Over a certain size, the number of times used multiplied by the inlined
size can have a negative impact on code size. We saw this a lot when
doing the code coverage improvements.

If the inlined method has branches, is it possible or cost effective
to write test cases for the branches introduced at EVERY use?

I came away believing a few things:

+ The MISRA rule for using simple, single condition branch logic
is really good for code coverage analysis. It simplifies it. Statement
coverage and branch coverage become closer to equivalent. The
resulting assembly code is also much easier to match to source.

+ Inlining a method with conditional logic has to be looked at
very carefully. You don't want to inline something and find yourself
having exploded the required number of test cases. Again, look
at the number of times a method is used but be cautious about
conditional expressions in inline methods.

+ Inlining doesn't always improve performance measurably. It
can increase code size, code to branch around, etc.

For a current example of something that is very likely a problem
from a object code test coverage and code size increase view,
consider _Thread_Dispatch_enable.  It is inlined 41 times.
Chris and I looked at a place where it was inlined. It was about
80 bytes. 41*80 ==> 3280 bytes. Based on size along, it would
save ~3K across the entire RTEMS API object code to not inline it
at all.

Then I looked at the cyclomatic complexity of _Thread_Dispatch_enable().
>From Wikipedia:

Cyclomatic complexity is a software metric, used to indicate the complexity
of a program. It is a quantitative measure of the number of linearly
independent
paths through a program's source code.

I used http://www.lizard.ws/ [1] to analyse this method. If you take the
conditional
compilation out of the code, the cyclomatic complexity is 4. This means that
every place this is inlined, it introduces 3 extra paths to test. 41*3 ==>
123.

My recommendation would be to evaluate the impact of not inlining this
method at all. The negative impact on testing and code size are pretty high.

Unfortunately, what I wrote up is just the guidelines. Things change over
time. An inlined method that is a nice encapsulation when written could
turn out to be too large to be inlined if 10 other use cases get added.

I lean to a method can be inlined if it is small and no more than one
branch statement in assembly. You can be more lenient if used only
once or twice. But lean to NO branches on an inlined method if the
number of uses is high.

Ideally the inlined code is no larger than the code for the call itself
and there are no branches. But going above that is not a reason to
not inline.

I know this was long but there are a lot of trade-offs. We want RTEMS to
be as small as possible and as testable as possible. We don't have an
army of people to write test cases. And I don't want to need that.

Also I have focused on the things that the RTEMS community has control
over. Chris' tool gives us a broader view and we can likely help improve
things more broadly.

[1] You can cut and paste code into this via the web. So pretty easy to
look at a single method.



>
> Chris
> _______________________________________________
> devel mailing list
> devel at rtems.org
> http://lists.rtems.org/mailman/listinfo/devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rtems.org/pipermail/devel/attachments/20180806/9f9af06e/attachment-0002.html>


More information about the devel mailing list