Timeline Tool

Manuel Coutinho manuel.coutinho at edisoft.pt
Tue Oct 23 09:21:11 UTC 2007



> -----Original Message-----
> From: Chris Johns [mailto:chrisj at rtems.org]
> Sent: Saturday, October 20, 2007 1:18 AM
> To: Manuel Coutinho
> Cc: rtems-users at rtems.org
> Subject: Re: Timeline Tool
> 
> Manuel Coutinho wrote:
> >
> > [Manuel] I think the inclusion of the "timeline tool" inside the capture
> > engine is a good idea. But we have some requirements that might require
> some
> > modifications to the current capture engine. I would prefer building a
> > separate tool because we don't want to change the current status of the
> > capture engine, i.e., its current functionality. If you (and the RTEMS
> > community) don't have any issues with us modifying the current capture
> > engine we also don't have any problems.
> >
> 
> The need to keep a single functional part in RTEMS is important.
> 
> It is hard to judge what the issues are without detail about the changes.
> Any
> changes to the capture engine will be handled like any other changes in
> RTEMS.
> We will review the patch, comment, and either accept or reject the change.
> 
> Please note the deeper into RTEMS's score etc you go the harder the code
> is
> looked at. Following the coding standard of that code becomes important.
> 

Perhaps I should have described our tool in more detail (what it traces,
etc.). These are some of its most important requirements:
1. The tool must not modify the application. The application is delivered as
a set of .o files that must not be changed.
2. The tool must provide more information than the capture engine currently
traces:
	a) More task events (task blocking/unblocking, etc.)
	b) Memory usage (libc dynamic memory usage)
	c) RTEMS workspace variables (start address, free memory, size)
	d) RTEMS API call generation (the instant when each call was
performed, its arguments, and its return value)
	e) Interrupt generation
3. The tool must have a low temporal and memory overhead.
4. The tool must be temporally deterministic.

From requirement 1 we conclude that the current initialization of the
capture engine does not meet our needs: the capture engine has to be
initialized by the application.
The capture engine also uses malloc (and realloc), which is not temporally
deterministic.
The capture engine data structures will need to be changed in order to cope
with static memory allocation instead of dynamic allocation, and to provide
a single data structure that handles all the generated events
(task/interrupt/calls/etc.). This data structure is more complex than the
current capture engine data structure.

The capture engine has parts that we can reuse, such as the stack usage
monitoring, the ring buffer that keeps track of the events (capture_record),
and the use of the extension manager to trace some task events. A rough
sketch of the static storage we have in mind is below.
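
To replace the malloc/realloc calls we would reserve the storage at compile
time. A minimal sketch, assuming hypothetical names (the record layout and
buffer size are illustrative only):

#include <stdint.h>

/* Statically allocated ring buffer of event records; no malloc needed,
   so the memory behaviour is temporally deterministic. */
#define TIMELINE_MAX_RECORDS 1024

typedef struct {
  uint8_t  class;      /* event class (task, semaphore, interrupt, ...) */
  uint8_t  event;      /* event identifier within the class             */
  uint32_t timestamp;  /* time of the event in timer ticks              */
  uint32_t data;       /* event-specific payload                        */
} timeline_record;

static timeline_record timeline_buffer[ TIMELINE_MAX_RECORDS ];
static volatile uint32_t timeline_head;  /* next slot to write */
static volatile uint32_t timeline_tail;  /* next slot to read  */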

> >
> > [Manuel] From what I can tell this is a great idea. If I understood you
> > correctly, you are saying to make the "timeline tool" as something like
> an
> > RTEMS Manager that is included (or not) in the application Makefile.
> This
> > way the timeline tool would be always compiled and generate a dummy .rel
> > file and a "complete" .rel file.
> > Perhaps you could just give an initial pointer to get me started on this
> :).
> >
> 
> I do not see the need for a manager. To me a manager is something that is
> based on the RTEMS object and therefore has information tables. I do not
> like
> the dummy .rel way things are done in RTEMS (see PR1253 in bugzilla).
> 

The Manager approach has a significant advantage: the user can
enable/disable the timeline tool in the application Makefile (disabling it
almost completely removes the tool from the memory footprint). The low
memory overhead is quite important for space users, since memory is limited.
With the Manager approach the user doesn't have to build the RTEMS library
twice (once with the tool and once without it). Furthermore, this
integration follows the standard approach of the RTEMS Classic API.
If in the future the RTEMS community decides to implement your approach with
the C++ constructors, the "timeline tool manager" could be updated along
with the other RTEMS managers. Of course, this would need more work :(.


> I see this project as an expansion of the existing design so let me
> explain
> the current design first. There are 3 parts:
> 
>   1. Kernel trace points.
>   2. Filter/Trigger and Capture.
>   3. Transport/User interface.
> 
> The Kernel trace points are provided by the extension manager via the task
> switching API. It is currently a runtime configured means to capture the
> basic
> tasking related events. It suffers from always being enabled therefore
> incurring an overhead all the time even for users who do not use the
> functionality, and will not scale across the kernel for all the trace
> points
> we need to monitor.

The tool will monitor more events, so the existing trace points need to be
extended. Even though the tool is required not to change the application,
for "non-space" users we are planning to create an API that allows the
filter parameters to be reconfigured dynamically, as the capture engine
allows. We are thinking of borrowing some ideas from the capture engine for
this; a sketch of such an API is shown below.
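
A minimal sketch of the runtime filter API we have in mind (all names are
illustrative, not a final design):

#include <stdint.h>
#include <stdbool.h>
#include <rtems.h>

/* Illustrative event class identifiers. */
enum { TIMELINE_CLASS_SEMAPHORE, TIMELINE_CLASS_PARTITION };

/* Enable or disable tracing of a whole event class at runtime. */
rtems_status_code timeline_filter_class( uint32_t class, bool enable );

void example( void )
{
  /* Example: trace semaphore calls but not partition calls. */
  timeline_filter_class( TIMELINE_CLASS_SEMAPHORE, true );
  timeline_filter_class( TIMELINE_CLASS_PARTITION, false );
}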

> 
> The Filter/Trigger and Capture is provided by the capture engine code.
> This
> code interfaces to the Kernel trace points and provides an API to manage
> and
> control the filtering, triggering and capture of data. It is complex
> because
> it is performance sensitive, has large memory demands and needs to handle
> the
> dynamics of system changes. The performance issue centers on the way
> filtering
> and triggering is handled. Filtering and triggering are very important as
> they
> keep the captured data down to a manageable level. This is the key design
> requirement. The filtering will allow the transport of the information a
> user
> needs to monitor to a remote host to occur with the smallest overhead and
> therefore smallest system impact. The triggering makes effective use of
> available capture memory and transport overhead by only starting to
> capture
> data when the required event happens. It is like using a logic analyzer,
> you
> do not want huge amounts of data and you want to start the capture as
> close to
> the event of interest as possible. With a logic analyzer it takes time to
> get
> the correct filtering and trigger to find the problem but it is usually
> faster than looking at everything the system does in the hope of finding
> the problem.
> The case of checking a system for conformance (or what-ever) where
> everything
> needs to be monitored is just a special case of filter nothing and trigger
> on
> anything. The memory and system change issues are related to latency of
> the
> captured data from the system. For example a task is created, starts, runs
> and
> deletes. The capture engine needs to create an area to hold the required
> data
> until the captured data has been retrieved. This requires reference
> counting
> of the data against the captured events.
> 
> Both these parts of the code need to consider stack usage. The trace point
> code and the filtering/triggering code will run on task stacks and this
> code
> has no control over the size of that stack.
> 
> The Transport/User interface interfaces to the Filter/Trigger and Capture
> API.
> Currently there is a Command Line Interface (CLI) and I have started on a
> TCP
> server interface. A web server interface could be written or an
> application
> could contain its own code to perform these functions. This is the place
> where
> we have network interfaces and tasks that need application parameters to
> configure. In the case of the CLI it uses the RTEMS monitor and its task.
> The
> TCP interface has an accept thread and threads for the clients.
> 

In the transport/user interface we have somewhat different objectives. The
information is always sent to a host platform, over a serial cable or
TCP/IP. This is a one-way transmission, to reduce the transmission overhead.
Since we do not provide a user interface on the target, the monitor tool is
not necessary. The protocol (message format) is oriented to the data
structure.
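
To illustrate what "oriented to the data structure" means: the host-bound
message could simply frame the records exactly as they sit in the capture
buffer, so no reformatting work is done on the target. A sketch with
hypothetical names (timeline_record is the event record sketched earlier):

#include <stdint.h>

/* One-way message: a small header followed by 'count' raw event
   records copied straight out of the ring buffer. */
typedef struct {
  uint16_t count;     /* number of timeline_record entries that follow   */
  uint16_t sequence;  /* lets the host detect losses on the one-way link */
} timeline_message_header;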



> Extending the Kernel trace points would use the discussed method of trace
> point calls that are removed by an enable configure option. The current
> extension manager would be extended to allow a call to register a table of
> pointers to trace point handlers. The trace point or capture point code
> could be:
> 
> #if RTEMS_KERNEL_CAPTURE
> /* Record the event only when a handler table is registered for the
>    class and the event's enable bit is set. */
> #define rtems_capture_point_uint32( class, event, data ) \
> if (rtems_capture_trace_points && rtems_capture_trace_points[class]) { \
>    if (rtems_capture_trace_points[class]->enable & (1 << (event))) { \
>      rtems_capture_trace_points[class]->handler_uint32( event, data ); \
>    } \
> }
> #else
> #define rtems_capture_point_uint32( class, event, data )
> #endif
> 
> #if RTEMS_CAPTURE_SEMAPHORE_CLASS
> #define rtems_capture_semaphore( event, id ) \
> rtems_capture_point_uint32( RTEMS_CAPTURE_SEMAPHORE_CLASS, event, id )
> #else
> #define rtems_capture_semaphore( event, id )
> #endif
> 

I see why you prefer this option instead of #ifdefs inside the RTEMS core,
but we prefer the manager solution. Your solution requires two RTEMS
libraries to be built. In the manager solution, the memory footprint
overhead that the timeline tool introduces when it is not selected is
negligible (empty functions; see the sketch below).
The table you used inside the "if" to filter the events is a good idea.
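
To make the "not selected" case concrete, a minimal sketch of the stub the
dummy .rel would provide (the function name is illustrative only):

#include <stdint.h>

/* Stub linked in when the timeline manager is not listed in the
   application Makefile: the kernel hook costs one empty call. */
void _Timeline_Record_event( uint32_t class, uint32_t event, uint32_t data )
{
  (void) class;  /* intentionally empty */
  (void) event;
  (void) data;
}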

> There would be a range of these macros for different data types and
> classes. Also the second pointer test in the first if statement could be
> removed if we know the enable flags are set correctly. The second level
> macros (rtems_capture_semaphore) let a class be enabled or disabled. This
> would allow a smaller table to be built for small footprint users.
> 
> The Filter/Trigger and Capture code needs to be updated to handle the new
> trace points and the different enable options and that is too much detail
> for
> here and now. The Transport and User Interfaces changes just follow on.
> 
> All this does not touch on interrupts. Interrupts are a completely
> separate topic again. This is a complex issue and will require special
> consideration. The code is specific to each processor and the ability to
> have nested interrupts will create interesting design issues. I would like
> this topic to wait until we have these issues resolved.
> 

The tool must be able to trace interrupt generation. A filter will be put in
place to choose which sources to trace. We are thinking of using the
rtems_interrupt_catch routine to save the old ISR handler and replace it
with our own. Our ISR handler saves the interrupt event and then calls the
old handler. We must take special care not to install our handler until the
application has installed its own (this must be done after the drivers are
initialized).
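
A minimal sketch of the wrapping, assuming a hypothetical
timeline_record_interrupt() and a fixed number of vectors:

#include <rtems.h>

extern void timeline_record_interrupt( rtems_vector_number vector );

static rtems_isr_entry timeline_old_isr[ 256 ];  /* saved old handlers */

static rtems_isr timeline_wrapper_isr( rtems_vector_number vector )
{
  timeline_record_interrupt( vector );      /* save the interrupt event */
  if ( timeline_old_isr[ vector ] )
    timeline_old_isr[ vector ]( vector );   /* chain to the old handler */
}

/* Called only after the drivers (and application) installed their ISRs. */
void timeline_hook_vector( rtems_vector_number vector )
{
  rtems_interrupt_catch( timeline_wrapper_isr, vector,
                         &timeline_old_isr[ vector ] );
}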

> >
> > [Manuel] The timeline tool will always impact the performance of the
> > application. The simple act of saving the event that occurred takes time
> and
> > the transmission to the host takes even more time. We plan to make the
> > transmission inside a task that has user configured priority and period
> (a
> > default value is provided).
> >
> 
> Yes I agree there will always be some impact. We need to consider a design
> that works for all users not just some with plenty of CPU performance,
> memory
> and network bandwidth. The important point is the need to filter and
> trigger.
> 
> For example you need to filter the effects of the transmission process
> from the event data being sent in most cases. Not doing this could place
> the system in an unstable state. That is, transmitting the event data
> (through the network stack, drivers, etc.) can generate more data than the
> system has the ability to send.
> 

Yes. If the system is not properly configured, the transmission itself can
generate more events and destabilize the system.
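
For reference, the transmission task we mentioned above (user-configured
priority and period) could look roughly like this; timeline_flush() is a
hypothetical call that drains the capture buffer to the host:

#include <rtems.h>

extern void timeline_flush( void );  /* send captured records to the host */

rtems_task timeline_tx_task( rtems_task_argument period_ticks )
{
  rtems_id period;

  rtems_rate_monotonic_create( rtems_build_name( 'T', 'L', 'T', 'X' ),
                               &period );
  for ( ;; ) {
    /* Block until the user-configured period expires, then drain. */
    rtems_rate_monotonic_period( period, (rtems_interval) period_ticks );
    timeline_flush();
  }
}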

> >
> > [Manuel] We can provide an interface that is device independent (i.e.
> writes
> > to serial, Ethernet, flash disk, etc). If I understood you in your
> > commentary above, the timeline tool is integrated as an RTEMS Manager so
> the
> > user won't have to do --enable-rctt (or whatever) when building RTEMS.
> The
> > user will only have to do something like
> >
> > managers = io semaphore message timeline
> >
> > right?
> 
> No. I see the capture engine like the RTEMS monitor, network stack or
> telnetd.
> The application or BSP makes the calls to set up and initialise.
> 

Why are you saying BSP? This would require changes in all BSPs, no?

> The capture engine sample program shows what I mean:
> 
> http://www.rtems.org/cgi-
> bin/viewcvs.cgi/rtems/testsuites/samples/capture/init.c?rev=1.2
> 
> This code makes the following calls:
> 
>    rtems_monitor_init (0);
>    rtems_capture_cli_init (0);
> 
> The CLI needs the monitor. The capture engine CLI will call the capture
> engine
> and that will hook into the kernel. The only down side with this approach
> is
> some information may be missed but I can live with that.
> 

(besides having to change the application source code)

> Currently I will reject all manager solutions.
> 
> The Init task solution still exists but that would be specific to a users
> application and not RTEMS by default.
> 
> >
> > [Manuel] One of our requirements states that the timeline tool is
> > "independent" from the application, that is, the tool can collect
> > information even if the application does not call any timeline tool
> > function. To do this the tool must be up and running prior to the
> > application is started.
> >
> 
> This all depends on where you draw the application starting line. For me
> it is
> main. In your case I would argue the BSP making the calls is fine. If that
> was
> rejected I would create a C++ static object and make the calls in that.
> Finally you could always set up your own set of Init tasks. What I see as
> important is the simple call to start the capture engine and no extra
> hidden
> magic in the system. We should let the user, who knows their own system
> best,
> make the choice of where to get things started and when. An RTEMS rebuild
> is
> not an option to control this.
> 
> For me I like main as it is standardized and has command line arguments
> and therefore provides simple run time control of things like this.
> 

The application is everything outside RTEMS. Space users (and possibly
others) can have a certified application, which cannot be changed, that they
want to analyze further with the timeline tool.
If we change the confdefs file to create a new starting task the problem
remains, because the application needs to be recompiled.

> >
> > [Manuel] We were planning to create a NEW structure that contains
> pointers
> > to functions that are called at critical instants (like the RTEMS
> Extension
> > Manager). These instants correspond to task insertion in a ready queue,
> task
> > priority changes and task suspending/resuming. An
> rtems_extension_createX
> > function would just set the pointers correctly. We are now thinking of
> > building a timeline tool Manager that incorporates this functionality.
> >
> 
> I agree about the table as shown in the code above.
> 
> I see no need for a manager. I would also have the table passed to the
> extensions manager. This way the kernel does not contain yet another table
> that is not used. The capture engine can malloc memory for the table and
> fill
> it in at runtime. To disable the capturing just pass a 0 as the table
> pointer.
> This makes the API change to the extension manager a single call.
> 
> >
> > [Manuel] We are thinking of creating a file inside the RTEMS source code
> > where the user can modify which calls are by default traced (not logged
> :)).
> > The application can call a timeline tool API to, at runtime,
> enable/disable
> > specific traces (e.g. want to monitor semaphore calls but not region nor
> > partition). This file is a .c (or .h) file and by modifying these
> parameters
> > the user would have to recompile RTEMS (at least this .c file).
> >
> 
> No file please. See the enable flags for the events in the above code. The
> capture engine can access these flags directly (with interrupts masked).
> 
> We are trying to move to binary cpukit rpms and this does not support that
> path. The cpukit rpm will be released items and therefore provide users
> with a
> tested RTEMS. As it stands the enable configure option for this code
> complicates these rpms.
>

The tool will provide a dynamic configuration API so that the application
can change the parameters easily, like the capture engine does. However, we
will provide a .c file to allow configuration of the default parameters.
These defaults can only be changed if the user has access to the RTEMS
source code rather than the rpms.
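
A minimal sketch of such a default-parameter file, with illustrative names
only (the real parameter set is still to be defined):

#include <stdint.h>

/* Compiled-in defaults; the dynamic configuration API can override
   these at runtime without rebuilding RTEMS. */
typedef struct {
  uint32_t class_enable_mask;  /* which event classes are traced     */
  uint32_t max_records;        /* capacity of the static ring buffer */
  uint32_t tx_period_ticks;    /* period of the transmission task    */
} timeline_config;

const timeline_config timeline_default_config = {
  .class_enable_mask = 0xffffffff,  /* trace everything by default */
  .max_records       = 1024,
  .tx_period_ticks   = 100,
};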

 
> >
> > [Manuel] Since (I think) we are moving in the direction of building a
> > "Timeline Tool Manager",
> 
> Please do not do this. The network stack is not an RTEMS manager and it is
> a
> much bigger piece of code than this.

But the network stack is typically an implementation issue, i.e., the
application either needs networking or it doesn't. The timeline tool is
something that all applications can use, and it would be enabled/disabled
like the semaphore manager: e.g., if the application decides it needs a
semaphore to synchronize two tasks, it just adds the semaphore manager to
its Makefile. An RTEMS recompilation is not necessary.

> 
> > I think we don't need to create an extension to the
> > RTEMS Extension Manager: the new functionality is implemented by the
> > timeline tool Manager. This way, if the timeline tool manager is not
> > selected, an empty function is called and no performance is lost (no
> "if" is
> > performed).
> 
> No compile time changes so dead code removal by the compiler will not
> work.
> 
> > I also agree that the body of the "timeline tool" should be placed in a
> > different area than the RTEMS source code (e.g. libmisc/capture) but
> > some functions must be called from inside the remaining RTEMS source
> > code.
> 
> Only the inline calls coded something like above.

I see what you mean. (I was thinking the same thing).

> 
> > We are not thinking of allowing the application to trace specific
> events.
> > The timeline tool main purpose is to analyze what the application is
> doing
> > in terms of schedulability.
> 
> For you this may be the case but I see other uses. For example I may like
> to
> add to my code:
> 
> rtems_capture_point_uint32( RTEMS_CAPTURE_APP_CLASS,
>                              MY_BIG_RED_BUTTON, 0xdeaddead);
> 
> and that becomes my trigger point.
> 
> > The application can define certain parameters so as to not trace every
> > single event. I did not mention it before because it takes a long time
> to
> > explain every parameter.
> 
> I see this as a user interface issue. It also does not match the
> requirement you stated of the application not knowing if the tool is
> enabled or not.

The tool has a default set of filter parameters. These initialize the
dynamic state of the filter, which can be modified at runtime.

> 
> > I think your idea of a table that contains flags with the information
> about
> > whether to save (or not) information that a given event has occurred is
> a
> > good one.
> 
> Great.
> 
> Regards
> Chris



