[PATCH v2] covoar: Store address-to-line info outside of DWARF

Joel Sherrill joel at rtems.org
Mon Jun 7 15:44:04 UTC 2021


On Mon, Jun 7, 2021 at 7:00 AM Alex White <alex.white at oarcorp.com> wrote:

>
> On Wed, Jun 2, 2021 at 7:37 PM Chris Johns <chrisj at rtems.org> wrote:
> >
> > Looking good with a single comment below ...
> >
> > On 3/6/21 6:08 am, Alex White wrote:
> > > This adds the AddressToLineMapper class and supporting classes to
> > > assume responsibility of tracking address-to-line information.
> > >
> > > This allows the DWARF library to properly cleanup all of its resources
> > > and leads to significant memory savings.
> > >
> > > Closes #4383
> > > ---
> > >  rtemstoolkit/rld-dwarf.cpp           |   5 +
> > >  rtemstoolkit/rld-dwarf.h             |   5 +
> > >  tester/covoar/AddressToLineMapper.cc | 104 +++++++++++++
> > >  tester/covoar/AddressToLineMapper.h  | 211 +++++++++++++++++++++++++++
> > >  tester/covoar/ExecutableInfo.cc      |  73 +++++----
> > >  tester/covoar/ExecutableInfo.h       |  11 +-
> > >  tester/covoar/wscript                |   1 +
> > >  7 files changed, 368 insertions(+), 42 deletions(-)
> > >  create mode 100644 tester/covoar/AddressToLineMapper.cc
> > >  create mode 100644 tester/covoar/AddressToLineMapper.h
> > >
> > > diff --git a/rtemstoolkit/rld-dwarf.cpp b/rtemstoolkit/rld-dwarf.cpp
> > > index 2fce0e4..1eae50c 100644
> > > --- a/rtemstoolkit/rld-dwarf.cpp
> > > +++ b/rtemstoolkit/rld-dwarf.cpp
> > > @@ -1893,6 +1893,11 @@ namespace rld
> > >        return false;
> > >      }
> > >
> > > +    const addresses& compilation_unit::get_addresses () const
> > > +    {
> > > +      return addr_lines_;
> > > +    }
> > > +
> > >      functions&
> > >      compilation_unit::get_functions ()
> > >      {
> > > diff --git a/rtemstoolkit/rld-dwarf.h b/rtemstoolkit/rld-dwarf.h
> > > index 1210813..eadb50d 100644
> > > --- a/rtemstoolkit/rld-dwarf.h
> > > +++ b/rtemstoolkit/rld-dwarf.h
> > > @@ -707,6 +707,11 @@ namespace rld
> > >         */
> > >        unsigned int pc_high () const;
> > >
> > > +      /**
> > > +       * The addresses associated with this compilation unit.
> > > +       */
> > > +      const addresses& get_addresses () const;
> > > +
> > >        /**
> > >         * Get the source and line for an address. If the address does
> not match
> > >         * false is returned the file is set to 'unknown' and the line
> is set to
> > > diff --git a/tester/covoar/AddressToLineMapper.cc
> b/tester/covoar/AddressToLineMapper.cc
> > > new file mode 100644
> > > index 0000000..c305e3b
> > > --- /dev/null
> > > +++ b/tester/covoar/AddressToLineMapper.cc
> > > @@ -0,0 +1,104 @@
> > > +/*! @file AddressToLineMapper.cc
> > > + *  @brief AddressToLineMapper Implementation
> > > + *
> > > + *  This file contains the implementation of the functionality
> > > + *  of the AddressToLineMapper class.
> > > + */
> > > +
> > > +#include "AddressToLineMapper.h"
> > > +
> > > +namespace Coverage {
> > > +
> > > +  uint64_t SourceLine::location() const
> > > +  {
> > > +    return address;
> > > +  }
> > > +
> > > +  bool SourceLine::is_an_end_sequence() const
> > > +  {
> > > +    return is_end_sequence;
> > > +  }
> > > +
> > > +  const std::string& SourceLine::path() const
> > > +  {
> > > +    return path_;
> > > +  }
> > > +
> > > +  int SourceLine::line() const
> > > +  {
> > > +    return line_num;
> > > +  }
> > > +
> > > +  void AddressLineRange::addSourceLine(const rld::dwarf::address&
> address)
> > > +  {
> > > +    auto insertResult = sourcePaths.insert(address.path());
> > > +
> > > +    sourceLines.emplace_back(
> > > +      SourceLine (
> > > +        address.location(),
> > > +        *insertResult.first,
> > > +        address.line(),
> > > +        address.is_an_end_sequence()
> > > +      )
> > > +    );
> > > +  }
> > > +
> > > +  const SourceLine& AddressLineRange::getSourceLine(uint32_t address)
> const
> > > +  {
> > > +    if (address < lowAddress || address > highAddress) {
> > > +      throw SourceNotFoundError(std::to_string(address));
> > > +    }
> > > +
> > > +    const SourceLine* last_line = nullptr;
> > > +    for (const auto &line : sourceLines) {
> > > +      if (address <= line.location())
> > > +      {
> > > +        if (address == line.location())
> > > +          last_line = &line;
> > > +        break;
> > > +      }
> > > +      last_line = &line;
> > > +    }
> > > +
> > > +    if (last_line == nullptr) {
> > > +      throw SourceNotFoundError(std::to_string(address));
> > > +    }
> > > +
> > > +    return *last_line;
> > > +  }
> >
> > How often is this function being called? If it is a hot spot are there
> other
> > ways to search for the address? I suppose it depends on the number of
> lines in
> > the vector.
>
> According to a quick gprof run, this function is being called a decent
> amount, somewhere around 1.5 million times for a full ARM coverage run.
>
> But it only takes up a very small amount (less than 0.02%) of cumulative
> execution time so I don't believe it is an issue.
>

Glad we have profiling data to guide optimizations. I also suspect that as
the code is c-plus-plus-ified more, some performance improvements will
occur naturally. I know you (Alex) have mentioned some performance
improvements just from this alone. But this discussion reminded me of
an XKCD and remarks by Dijkstra which are always good to remember.

https://m.xkcd.com/1691/

Dijkstra is famously remembered for a number of variants on the premature
optimization theme:

"Programmers waste enormous amounts of time thinking about, or worrying
about, the speed of noncritical parts of their programs, and these attempts
at efficiency actually have a strong negative impact when debugging and
maintenance are considered. We should forget about small efficiencies, say
about 97% of the time: premature optimization is the root of all evil. Yet
we should not pass up our opportunities in that critical 3%."

I'd add that I want RTEMS to stay algorithmically optimized instead of
tricks. "Optimize" for memory usage, readability, testability,
maintainability,
etc. If we had focused on tricks 30+ years ago, all of those tricks would
likely be  inappropriate and burdens now which only served to negatively
impacts.

<old man stops yelling at clouds now>

--joel

>
> Alex
>
> >
> > If the sourcesLines vector was sorted can this loop be replaced with
> some form
> > of std::upper_bound? That is ...
> >
> >  auto last_line =
> >     std::upper_bound(sourceLines.begin(),
> >                      sourceLines.end(),
> >                      address,
> >                      [&address](const SourceLine& sl) {
> >                        return address < sl.location();
> >                      });
> >
> > [ not sure if the code I posted would work and if I got the lamda right
> :) ]
> >
> > Just wondering.
> >
> > Chris
> >
> > > +
> > > +  void AddressToLineMapper::getSource(
> > > +    uint32_t address,
> > > +    std::string& sourceFile,
> > > +    int& sourceLine
> > > +  ) const {
> > > +    const SourceLine default_sourceline = SourceLine();
> > > +    const SourceLine* match = &default_sourceline;
> > > +
> > > +    for (const auto &range : addressLineRanges) {
> > > +      try {
> > > +        const SourceLine& potential_match =
> range.getSourceLine(address);
> > > +
> > > +        if (match->is_an_end_sequence() ||
> !potential_match.is_an_end_sequence()) {
> > > +          match = &potential_match;
> > > +        }
> > > +      } catch (const AddressLineRange::SourceNotFoundError&) {}
> > > +    }
> > > +
> > > +    sourceFile = match->path();
> > > +    sourceLine = match->line();
> > > +  }
> > > +
> > > +  AddressLineRange& AddressToLineMapper::makeRange(
> > > +    uint32_t low,
> > > +    uint32_t high
> > > +  )
> > > +  {
> > > +    addressLineRanges.emplace_back(
> > > +      AddressLineRange(low, high)
> > > +    );
> > > +
> > > +    return addressLineRanges.back();
> > > +  }
> > > +
> > > +}
> > > diff --git a/tester/covoar/AddressToLineMapper.h
> b/tester/covoar/AddressToLineMapper.h
> > > new file mode 100644
> > > index 0000000..e6ab4a8
> > > --- /dev/null
> > > +++ b/tester/covoar/AddressToLineMapper.h
> > > @@ -0,0 +1,211 @@
> > > +/*! @file AddressToLineMapper.h
> > > + *  @brief AddressToLineMapper Specification
> > > + *
> > > + *  This file contains the specification of the AddressToLineMapper
> class.
> > > + */
> > > +
> > > +#ifndef __ADDRESS_TO_LINE_MAPPER_H__
> > > +#define __ADDRESS_TO_LINE_MAPPER_H__
> > > +
> > > +#include <cstdint>
> > > +#include <set>
> > > +#include <string>
> > > +#include <vector>
> > > +
> > > +#include <rld-dwarf.h>
> > > +
> > > +namespace Coverage {
> > > +
> > > +  /*! @class SourceLine
> > > +   *
> > > +   *  This class stores source information for a specific address.
> > > +   */
> > > +  class SourceLine {
> > > +
> > > +  public:
> > > +
> > > +    SourceLine()
> > > +    : address(0),
> > > +      path_("unknown"),
> > > +      line_num(-1),
> > > +      is_end_sequence(true)
> > > +    {
> > > +    }
> > > +
> > > +    SourceLine(
> > > +      uint64_t addr,
> > > +      const std::string& src,
> > > +      int line,
> > > +      bool end_sequence
> > > +    ) : address(addr),
> > > +        path_(src),
> > > +        line_num(line),
> > > +        is_end_sequence(end_sequence)
> > > +    {
> > > +    }
> > > +
> > > +    /*!
> > > +     *  This method gets the address of this source information.
> > > +     *
> > > +     *  @return Returns the address of this source information
> > > +     */
> > > +    uint64_t location() const;
> > > +
> > > +    /*!
> > > +     *  This method gets a value indicating whether this address
> represents an
> > > +     *  end sequence.
> > > +     *
> > > +     *  @return Returns whether this address represents an end
> sequence or not
> > > +     */
> > > +    bool is_an_end_sequence() const;
> > > +
> > > +    /*!
> > > +     *  This method gets the source file path of this address.
> > > +     *
> > > +     *  @return Returns the source file path of this address
> > > +     */
> > > +    const std::string& path() const;
> > > +
> > > +    /*!
> > > +     *  This method gets the source line number of this address.
> > > +     *
> > > +     *  @return Returns the source line number of this address
> > > +     */
> > > +    int line() const;
> > > +
> > > +  private:
> > > +
> > > +    /*!
> > > +     *  The address of this source information.
> > > +     */
> > > +    uint64_t address;
> > > +
> > > +    /*!
> > > +     *  An iterator pointing to the location in the set that contains
> the
> > > +     *  source file path of the address.
> > > +     */
> > > +    const std::string& path_;
> > > +
> > > +    /*!
> > > +     *  The source line number of the address.
> > > +     */
> > > +    int line_num;
> > > +
> > > +    /*!
> > > +     *  Whether the address represents an end sequence or not.
> > > +     */
> > > +    bool is_end_sequence;
> > > +
> > > +  };
> > > +
> > > +  typedef std::vector<SourceLine> SourceLines;
> > > +
> > > +  typedef std::set<std::string> SourcePaths;
> > > +
> > > +  /*! @class AddressLineRange
> > > +   *
> > > +   *  This class stores source information for an address range.
> > > +   */
> > > +  class AddressLineRange {
> > > +
> > > +  public:
> > > +
> > > +    class SourceNotFoundError : public std::runtime_error {
> > > +      /* Use the base class constructors. */
> > > +      using std::runtime_error::runtime_error;
> > > +    };
> > > +
> > > +    AddressLineRange(
> > > +      uint32_t low,
> > > +      uint32_t high
> > > +    ) : lowAddress(low), highAddress(high)
> > > +    {
> > > +    }
> > > +
> > > +    /*!
> > > +     *  This method adds source and line information for a specified
> address.
> > > +     *
> > > +     *  @param[in] address specifies the DWARF address information
> > > +     */
> > > +    void addSourceLine(const rld::dwarf::address& address);
> > > +
> > > +    /*!
> > > +     *  This method gets the source file name and line number for a
> given
> > > +     *  address.
> > > +     *
> > > +     *  @param[in] address specifies the address to look up
> > > +     *
> > > +     *  @return Returns the source information for the specified
> address.
> > > +     */
> > > +    const SourceLine& getSourceLine(uint32_t address) const;
> > > +
> > > +  private:
> > > +
> > > +    /*!
> > > +     *  The low address for this range.
> > > +     */
> > > +    uint32_t lowAddress;
> > > +
> > > +    /*!
> > > +     *  The high address for this range.
> > > +     */
> > > +    uint32_t highAddress;
> > > +
> > > +    /*!
> > > +     *  The source information for addresses in this range.
> > > +     */
> > > +    SourceLines sourceLines;
> > > +
> > > +    /*!
> > > +     *  The set of source file names for this range.
> > > +     */
> > > +    SourcePaths sourcePaths;
> > > +
> > > +  };
> > > +
> > > +  typedef std::vector<AddressLineRange> AddressLineRanges;
> > > +
> > > +  /*! @class AddressToLineMapper
> > > +   *
> > > +   *  This class provides address-to-line capabilities.
> > > +   */
> > > +  class AddressToLineMapper {
> > > +
> > > +  public:
> > > +
> > > +    /*!
> > > +     *  This method gets the source file name and line number for a
> given
> > > +     *  address.
> > > +     *
> > > +     *  @param[in] address specifies the address to look up
> > > +     *  @param[out] sourceFile specifies the name of the source file
> > > +     *  @param[out] sourceLine specifies the line number in the
> source file
> > > +     */
> > > +    void getSource(
> > > +      uint32_t address,
> > > +      std::string& sourceFile,
> > > +      int& sourceLine
> > > +    ) const;
> > > +
> > > +    /*!
> > > +     *  This method creates a new range with the specified addresses.
> > > +     *
> > > +     *  @param[in] low specifies the low address of the range
> > > +     *  @param[in] high specifies the high address of the range
> > > +     *
> > > +     *  @return Returns a reference to the newly created range
> > > +     */
> > > +    AddressLineRange& makeRange(uint32_t low, uint32_t high);
> > > +
> > > +  private:
> > > +
> > > +    /*!
> > > +     *  The address and line information ranges.
> > > +     */
> > > +    AddressLineRanges addressLineRanges;
> > > +
> > > +  };
> > > +
> > > +}
> > > +
> > > +#endif
> > > diff --git a/tester/covoar/ExecutableInfo.cc
> b/tester/covoar/ExecutableInfo.cc
> > > index 861e60d..a6184e7 100644
> > > --- a/tester/covoar/ExecutableInfo.cc
> > > +++ b/tester/covoar/ExecutableInfo.cc
> > > @@ -40,61 +40,60 @@ namespace Coverage {
> > >      executable.begin();
> > >      executable.load_symbols(symbols);
> > >
> > > +    rld::dwarf::file debug;
> > > +
> > >      debug.begin(executable.elf());
> > >      debug.load_debug();
> > >      debug.load_functions();
> > >
> > > -    try {
> > > -      for (auto& cu : debug.get_cus()) {
> > > -        for (auto& func : cu.get_functions()) {
> > > -          if (!func.has_machine_code()) {
> > > -            continue;
> > > -          }
> > > +    for (auto& cu : debug.get_cus()) {
> > > +      AddressLineRange& range = mapper.makeRange(cu.pc_low(),
> cu.pc_high());
> > > +      // Does not filter on desired symbols under the assumption that
> the test
> > > +      // code and any support code is small relative to what is being
> tested.
> > > +      for (const auto &address : cu.get_addresses()) {
> > > +        range.addSourceLine(address);
> > > +      }
> > >
> > > -          if (!SymbolsToAnalyze->isDesired(func.name())) {
> > > -            continue;
> > > +      for (auto& func : cu.get_functions()) {
> > > +        if (!func.has_machine_code()) {
> > > +          continue;
> > > +        }
> > > +
> > > +        if (!SymbolsToAnalyze->isDesired(func.name())) {
> > > +          continue;
> > > +        }
> > > +
> > > +        if (func.is_inlined()) {
> > > +          if (func.is_external()) {
> > > +            // Flag it
> > > +            std::cerr << "Function is both external and inlined: "
> > > +                      << func.name() << std::endl;
> > >            }
> > >
> > > -          if (func.is_inlined()) {
> > > -            if (func.is_external()) {
> > > -              // Flag it
> > > -              std::cerr << "Function is both external and inlined: "
> > > -                        << func.name() << std::endl;
> > > -            }
> > > -
> > > -            if (func.has_entry_pc()) {
> > > -              continue;
> > > -            }
> > > -
> > > -            // If the low PC address is zero, the symbol does not
> appear in
> > > -            // this executable.
> > > -            if (func.pc_low() == 0) {
> > > -              continue;
> > > -            }
> > > +          if (func.has_entry_pc()) {
> > > +            continue;
> > >            }
> > >
> > > -          // We can't process a zero size function.
> > > -          if (func.pc_high() == 0) {
> > > +          // If the low PC address is zero, the symbol does not
> appear in
> > > +          // this executable.
> > > +          if (func.pc_low() == 0) {
> > >              continue;
> > >            }
> > > +        }
> > >
> > > -          createCoverageMap (cu.name(), func.name(),
> > > -                              func.pc_low(), func.pc_high() - 1);
> > > +        // We can't process a zero size function.
> > > +        if (func.pc_high() == 0) {
> > > +          continue;
> > >          }
> > > +
> > > +        createCoverageMap (cu.name(), func.name(),
> > > +                            func.pc_low(), func.pc_high() - 1);
> > >        }
> > > -    } catch (...) {
> > > -      debug.end();
> > > -      throw;
> > >      }
> > > -
> > > -    // Can't cleanup handles until the destructor because the
> information is
> > > -    // referenced elsewhere. NOTE: This could cause problems from too
> many open
> > > -    // file descriptors.
> > >    }
> > >
> > >    ExecutableInfo::~ExecutableInfo()
> > >    {
> > > -    debug.end();
> > >    }
> > >
> > >    void ExecutableInfo::dumpCoverageMaps( void ) {
> > > @@ -197,7 +196,7 @@ namespace Coverage {
> > >    {
> > >      std::string file;
> > >      int         lno;
> > > -    debug.get_source (address, file, lno);
> > > +    mapper.getSource (address, file, lno);
> > >      std::ostringstream ss;
> > >      ss << file << ':' << lno;
> > >      line = ss.str ();
> > > diff --git a/tester/covoar/ExecutableInfo.h
> b/tester/covoar/ExecutableInfo.h
> > > index 5d5a595..dfc71aa 100644
> > > --- a/tester/covoar/ExecutableInfo.h
> > > +++ b/tester/covoar/ExecutableInfo.h
> > > @@ -15,6 +15,7 @@
> > >  #include <rld-files.h>
> > >  #include <rld-symbols.h>
> > >
> > > +#include "AddressToLineMapper.h"
> > >  #include "CoverageMapBase.h"
> > >  #include "SymbolTable.h"
> > >
> > > @@ -157,11 +158,6 @@ namespace Coverage {
> > >        uint32_t           highAddress
> > >      );
> > >
> > > -    /*!
> > > -     *  The DWARF data to the ELF executable.
> > > -     */
> > > -    rld::dwarf::file debug;
> > > -
> > >      /*!
> > >       *  The executable's file name.
> > >       */
> > > @@ -172,6 +168,11 @@ namespace Coverage {
> > >       */
> > >      rld::symbols::table symbols;
> > >
> > > +    /*!
> > > +     *  The address-to-line mapper for this executable.
> > > +     */
> > > +    AddressToLineMapper mapper;
> > > +
> > >      /*!
> > >       *  This map associates a symbol with its coverage map.
> > >       */
> > > diff --git a/tester/covoar/wscript b/tester/covoar/wscript
> > > index 82599b0..a928b1b 100644
> > > --- a/tester/covoar/wscript
> > > +++ b/tester/covoar/wscript
> > > @@ -83,6 +83,7 @@ def build(bld):
> > >
> > >      bld.stlib(target = 'ccovoar',
> > >                source = ['app_common.cc',
> > > +                        'AddressToLineMapper.cc',
> > >                          'CoverageFactory.cc',
> > >                          'CoverageMap.cc',
> > >                          'CoverageMapBase.cc',
> > >
> > _______________________________________________
> > devel mailing list
> > devel at rtems.org
> > http://lists.rtems.org/mailman/listinfo/devel
> _______________________________________________
> devel mailing list
> devel at rtems.org
> http://lists.rtems.org/mailman/listinfo/devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.rtems.org/pipermail/devel/attachments/20210607/15d0f5db/attachment-0001.html>


More information about the devel mailing list