Timeline Tool

Sat Oct 20 00:17:59 UTC 2007

Manuel Coutinho wrote:
> 
> [Manuel] I think the inclusion of the "timeline tool" inside the capture
> engine is a good idea. But we have some requirements that might require some
> modifications to the current capture engine. I would prefer building a
> separate tool because we don't want to change the current status of the
> capture engine, i.e., its current functionality. If you (and the RTEMS
> community) don't have any issues with us modifying the current capture
> engine we also don't have any problems.
> 

The need to keep a single functional part in RTEMS is important.

It is hard to judge what the issues are without detail about the changes. Any 
changes to the capture engine will be handled like any other changes in RTEMS. 
We will review the patch, comment, and either accept or reject the change.

Please note that the deeper into RTEMS's score etc. you go, the harder the 
code is looked at. Following the coding standard of that code becomes important.

> 
> [Manuel] From what I can tell this is a great idea. If I understood you
> correctly, you are saying to make the "timeline tool" as something like an
> RTEMS Manager that is included (or not) in the application Makefile. This
> way the timeline tool would be always compiled and generate a dummy .rel
> file and a "complete" .rel file.
> Perhaps you could just give an initial pointer to get me started on this :).
> 

I do not see the need for a manager. To me a manager is something that is 
based on the RTEMS object and therefore has information tables. I do not like 
the dummy .rel way things are done in RTEMS (see PR1253 in bugzilla).

I see this project as an expansion of the existing design so let me explain 
the current design first. There are 3 parts:

  1. Kernel trace points.
  2. Filter/Trigger and Capture.
  3. Transport/User interface.

The Kernel trace points are provided by the extension manager via the task 
switching API. It is currently a runtime configured means to capture the basic 
tasking related events. It suffers from always being enabled therefore 
incurring an overhead all the time even for users who do not use the 
functionality, and will not scale across the kernel for all the trace points 
we need to monitor.

The Filter/Trigger and Capture is provided by the capture engine code. This 
code interfaces to the Kernel trace points and provides an API to manage and 
control the filtering, triggering and capture of data. It is complex because 
it is performance sensitive, has large memory demands and needs to handle the 
dynamics of system changes. The performance issue centers on the way filtering 
and triggering is handled. Filtering and triggering are very important as they 
keep the captured data down to a manageable level. This is the key design 
requirement. The filtering will allow the transport of the information a user 
needs to monitor to a remote host to occur with the smallest overhead and 
therefore smallest system impact. The triggering makes effective use of 
available capture memory and transport overhead by only starting to capture 
data when the required event happens. It is like using a logic analyzer: you 
do not want huge amounts of data and you want to start the capture as close to 
the event of interest as possible. With a logic analyzer it takes time to get 
the correct filtering and trigger to find the problem but it is usually faster 
than looking at everything the system does in the hope of finding the problem. 
The case of checking a system for conformance (or whatever) where everything 
needs to be monitored is just a special case of filter nothing and trigger on 
anything. The memory and system change issues are related to latency of the 
captured data from the system. For example a task is created, starts, runs and 
deletes. The capture engine needs to create an area to hold the required data 
until the captured data has been retrieved. This requires reference counting 
of the data against the captured events.
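
As a rough illustration of the reference counting described above, here is a 
minimal sketch. All the demo_ names and the record layout are hypothetical 
and are not the capture engine's actual data structures. The task record 
stays alive until the task has been deleted and every captured event that 
points at it has been read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task record; the real capture engine's layout differs. */
typedef struct {
  uint32_t task_id;
  uint32_t refs;      /* one for the live task plus one per unread event */
  int      released;  /* demo only: set when the record could be freed */
} demo_task_record;

/* A captured event holds a reference to the task it describes. */
typedef struct {
  demo_task_record *task;
  uint32_t          event;
} demo_event;

static void demo_task_unref(demo_task_record *t)
{
  assert(t->refs > 0);
  if (--t->refs == 0)
    t->released = 1;  /* last user gone: the record can be recycled */
}

/* Task create: the live task itself owns the first reference. */
static void demo_task_created(demo_task_record *t, uint32_t id)
{
  t->task_id = id;
  t->refs = 1;
  t->released = 0;
}

/* Capturing an event pins the task record until the event is read. */
static void demo_event_capture(demo_event *e, demo_task_record *t,
                               uint32_t event)
{
  e->task = t;
  e->event = event;
  t->refs++;
}

/* Task delete drops the task's own reference; events may still hold theirs. */
static void demo_task_deleted(demo_task_record *t)
{
  demo_task_unref(t);
}

/* Reading an event out of the buffer releases its reference. */
static void demo_event_read(demo_event *e)
{
  demo_task_unref(e->task);
  e->task = 0;
}
```

With two events captured, deleting the task leaves the record in place, and 
it is only marked as released once the second event has been read.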

Both these parts of the code need to consider stack usage. The trace point 
code and the filtering/triggering code will run on task stacks and this code 
has no control over the size of that stack.

The Transport/User interface interfaces to the Filter/Trigger and Capture API. 
Currently there is a Command Line Interface (CLI) and I have started on a TCP 
server interface. A web server interface could be written or an application 
could contain its own code to perform these functions. This is the place where 
we have network interfaces and tasks that need application parameters to 
configure. In the case of the CLI it uses the RTEMS monitor and its task. The 
TCP interface has an accept thread and threads for the clients.

Extending the Kernel trace points would use the discussed method of a trace 
point call that is removed by an enable configure option. The current 
extension manager would be extended to allow a call to register a table of 
pointers for trace point handlers. The trace point or capture point code 
could be:

#if RTEMS_KERNEL_CAPTURE
/* Call the class handler only if a table is registered, the class has an
 * entry and the enable bit for the event is set. */
#define rtems_capture_point_uint32( class, event, data ) \
  do { \
    if (rtems_capture_trace_points && rtems_capture_trace_points[class]) { \
      if (rtems_capture_trace_points[class]->enable & (1 << (event))) { \
        rtems_capture_trace_points[class]->handler_uint32( event, data ); \
      } \
    } \
  } while (0)
#else
#define rtems_capture_point_uint32( class, event, data )
#endif

#if RTEMS_CAPTURE_SEMAPHORE_CLASS
#define rtems_capture_semaphore( event, id ) \
rtems_capture_point_uint32( RTEMS_CAPTURE_SEMAPHORE_CLASS, event, id )
#else
#define rtems_capture_semaphore( event, id )
#endif

There would be a range of these macros for different data types and classes. 
Also the second pointer test in the first if statement could be removed if we 
know the enable flags are set correctly. The second level macros 
(rtems_capture_semaphore) let a class be enabled or disabled. This would 
allow a smaller table to be built for small footprint users.

The Filter/Trigger and Capture code needs to be updated to handle the new 
trace points and the different enable options and that is too much detail for 
here and now. The Transport and User Interface changes just follow on.

All this does not touch on interrupts. Interrupts are a completely separate 
topic again. This is a complex issue and will require special consideration. 
The code is specific to each processor and the ability to have nested 
interrupts will create interesting design issues. I would like this topic to 
wait until we have these issues resolved.

> 
> [Manuel] The timeline tool will always impact the performance of the
> application. The simple act of saving the event that occurred takes time and
> the transmission to the host takes even more time. We plan to make the
> transmission inside a task that has user configured priority and period (a
> default value is provided).
> 

Yes I agree there will always be some impact. We need to consider a design 
that works for all users not just some with plenty of CPU performance, memory 
and network bandwidth. The important point is the need to filter and trigger.

For example you need to filter the effects of the transmission process from 
the event data being sent in most cases. Not doing this could place the system 
in an unstable state. That is, the events generated by the transmission itself 
(network stack, drivers, etc.) produce more data than the system is able to send.
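
A minimal sketch of that kind of filter, assuming events are tagged with the 
id of the task that produced them. The demo_ names and the fixed size ignore 
list are made up for illustration and are not the capture engine's filter 
API. The transmit task and any network daemons would be put on the ignore 
list so that sending captured data does not itself generate more captured data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_IGNORED 4

/* Hypothetical filter state: ids of tasks whose events are never captured. */
static uint32_t demo_ignored[DEMO_MAX_IGNORED];
static size_t   demo_ignored_count;

/* Add a task (e.g. the transmit task or a network daemon) to the ignore
 * list. Returns 0 on success, -1 if the list is full. */
static int demo_filter_ignore(uint32_t task_id)
{
  if (demo_ignored_count >= DEMO_MAX_IGNORED)
    return -1;
  demo_ignored[demo_ignored_count++] = task_id;
  return 0;
}

/* Return 1 if an event from this task should be captured, else 0. */
static int demo_filter_accept(uint32_t task_id)
{
  size_t i;

  assert(demo_ignored_count <= DEMO_MAX_IGNORED);
  for (i = 0; i < demo_ignored_count; i++)
    if (demo_ignored[i] == task_id)
      return 0;
  return 1;
}
```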

> 
> [Manuel] We can provide an interface that is device independent (i.e. writes
> to serial, Ethernet, flash disk, etc). If I understood you in your
> commentary above, the timeline tool is integrated as an RTEMS Manager so the
> user won't have to do --enable-rctt (or whatever) when building RTEMS. The
> user will only have to do something like
> 
> managers = io semaphore message timeline
> 
> right?

No. I see the capture engine like the RTEMS monitor, network stack or telnetd. 
The application or BSP makes the calls to set up and initialise.

The capture engine sample program shows what I mean:

http://www.rtems.org/cgi-bin/viewcvs.cgi/rtems/testsuites/samples/capture/init.c?rev=1.2

This code makes the following calls:

   rtems_monitor_init (0);
   rtems_capture_cli_init (0);

The CLI needs the monitor. The capture engine CLI will call the capture engine 
and that will hook into the kernel. The only down side with this approach is 
some information may be missed but I can live with that.

Currently I will reject all manager solutions.

The Init task solution still exists but that would be specific to a user's 
application and not RTEMS by default.

> 
> [Manuel] One of our requirements states that the timeline tool is
> "independent" from the application, that is, the tool can collect
> information even if the application does not call any timeline tool
> function. To do this the tool must be up and running prior to the
> application is started.
> 

This all depends on where you draw the application starting line. For me it is 
main. In your case I would argue the BSP making the calls is fine. If that was 
rejected I would create a C++ static object and make the calls in that. 
Finally you could always set up your own set of Init tasks. What I see as 
important is the simple call to start the capture engine and no extra hidden 
magic in the system. We should let the user, who knows their own system best, 
make the choice of where to get things started and when. An RTEMS rebuild is 
not an option to control this.

For me I like main as it is standardized and has command line arguments and 
therefore gives simple run time control of things like this.
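
For illustration only, run time control from main could be as small as the 
helper below. The "--capture" option name and the demo_capture_requested 
function are assumptions made for this sketch. The two calls in the comment 
are the ones the capture sample program makes.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the hypothetical "--capture" option is on the command line. */
static int demo_capture_requested(int argc, char **argv)
{
  int i;

  assert(argc >= 0);
  for (i = 1; i < argc; i++)
    if (strcmp(argv[i], "--capture") == 0)
      return 1;
  return 0;
}

/*
 * Intended use in main():
 *
 *   if (demo_capture_requested(argc, argv)) {
 *     rtems_monitor_init (0);
 *     rtems_capture_cli_init (0);
 *   }
 */
```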

> 
> [Manuel] We were planning to create a NEW structure that contains pointers
> to functions that are called at critical instants (like the RTEMS Extension
> Manager). These instants correspond to task insertion in a ready queue, task
> priority changes and task suspending/resuming. An rtems_extension_createX
> function would just set the pointers correctly. We are now thinking of
> building a timeline tool Manager that incorporates this functionality.
> 

I agree about the table as shown in the code above.

I see no need for a manager. I would also have the table passed to the 
extensions manager. This way the kernel does not contain yet another table 
that is not used. The capture engine can malloc memory for the table and fill 
it in at runtime. To disable the capturing just pass a 0 as the table pointer. 
This makes the API change to the extension manager a single call.
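
A rough sketch of how that single call could look, written as a stand alone 
demo. Every demo_ name here is hypothetical, including 
demo_extension_set_trace_points, which stands in for the proposed extension 
manager call and does not exist in RTEMS. The kernel side holds one pointer, 
the capture engine allocates and fills the table at run time, and passing 0 
turns every trace point into a cheap test that fails.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_CLASS_SEMAPHORE 0
#define DEMO_CLASS_COUNT     2

/* Hypothetical per-class entry: one enable bit per event plus a handler. */
typedef struct {
  uint32_t enable;
  void   (*handler_uint32)(uint32_t event, uint32_t data);
} demo_trace_class;

/* The one pointer the kernel side would hold; 0 means capture is off. */
static demo_trace_class **demo_trace_points;

/* Stand-in for the single new extension manager call. */
static void demo_extension_set_trace_points(demo_trace_class **table)
{
  demo_trace_points = table;
}

/* Trace point: does nothing unless a table, a class entry and the bit exist. */
static void demo_capture_point_uint32(unsigned cls, uint32_t event,
                                      uint32_t data)
{
  assert(cls < DEMO_CLASS_COUNT && event < 32);
  if (demo_trace_points && demo_trace_points[cls] &&
      (demo_trace_points[cls]->enable & (UINT32_C(1) << event)))
    demo_trace_points[cls]->handler_uint32(event, data);
}

static unsigned demo_hits;  /* demo only: counts handler invocations */

static void demo_handler(uint32_t event, uint32_t data)
{
  (void) event;
  (void) data;
  demo_hits++;
}

/* Capture engine side: allocate and fill the table at run time. */
static demo_trace_class **demo_capture_open(void)
{
  demo_trace_class **table = calloc(DEMO_CLASS_COUNT, sizeof(*table));
  demo_trace_class  *sem = calloc(1, sizeof(*sem));

  if (!table || !sem) {
    free(table);
    free(sem);
    return 0;
  }
  sem->enable = UINT32_C(1) << 3;   /* only event 3 is enabled */
  sem->handler_uint32 = demo_handler;
  table[DEMO_CLASS_SEMAPHORE] = sem;
  demo_extension_set_trace_points(table);
  return table;
}
```

Calling demo_extension_set_trace_points(0) afterwards disables capturing 
again without touching the trace point code.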

> 
> [Manuel] We are thinking of creation a file inside the RTEMS source code
> where the user can modify which calls are by default traced (not logged :)).
> The application can call a timeline tool API to, at runtime, enable/disable
> specific traces (e.g. want to monitor semaphore calls but not region nor
> partition). This file is a .c (or .h) file and by modifying these parameters
> the user would have to recompile RTEMS (at least this .c file).
> 

No file please. See the enable flags for the events in the above code. The 
capture engine can access these flags directly (with interrupts masked).
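
A minimal sketch of changing an enable flag at run time, assuming one 32 bit 
enable word per class as in the macro above. The demo_irq_disable and 
demo_irq_enable functions are stand-ins that only track nesting so the 
example runs anywhere. On RTEMS the masking would normally be done with 
rtems_interrupt_disable and rtems_interrupt_enable.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the target's interrupt masking; they only track nesting. */
static int demo_irq_depth;

static int demo_irq_disable(void)
{
  return demo_irq_depth++;
}

static void demo_irq_enable(int level)
{
  demo_irq_depth = level;
}

/* Hypothetical enable word for one trace class, read by the trace points. */
static volatile uint32_t demo_semaphore_enable;

/* Turn one event on or off without rebuilding anything. */
static void demo_capture_event_enable(volatile uint32_t *flags,
                                      unsigned event, int on)
{
  int level;

  assert(event < 32);
  level = demo_irq_disable();      /* keep the read-modify-write atomic */
  if (on)
    *flags |= UINT32_C(1) << event;
  else
    *flags &= ~(UINT32_C(1) << event);
  demo_irq_enable(level);
}
```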

We are trying to move to binary cpukit rpms and this does not support that 
path. The cpukit rpms will be released items and therefore provide users with a 
tested RTEMS. As it stands the enable configure option for this code 
complicates these rpms.

> 
> [Manuel] Since (I think) we are moving in the direction of building a
> "Timeline Tool Manager",

Please do not do this. The network stack is not an RTEMS manager and it is a 
much bigger piece of code than this.

> I think we don't need to create an extension to the
> RTEMS Extension Manager: the new functionality is implemented by the
> timeline tool Manager. This way, if the timeline tool manager is not
> selected, an empty function is called and no performance is lost (no "if" is
> performed). 

No compile time changes so dead code removal by the compiler will not work.

> I also agree that the body of the "timeline tool" should be placed in a
> different areas other that the RTEMS source code (e.g. libmisc/capture) but
> some functions must be called from inside the remaining RTEMS source code. 

Only the inline calls coded something like above.

> We are not thinking of allowing the application to trace specific events.
> The timeline tool main purpose is to analyze what the application is doing
> in terms of schedulability.

For you this may be the case but I see other uses. For example I may like to 
add to my code:

rtems_capture_point_uint32( RTEMS_CAPTURE_APP_CLASS,
                             MY_BIG_RED_BUTTON, 0xdeaddead);

and that becomes my trigger point.
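
To show what a trigger on such an application event could mean on the 
capture side, here is a small sketch. The class and event numbers, the 
buffer size and all demo_ names are assumptions for illustration and are not 
the capture engine's trigger API. Events are dropped until the trigger event 
arrives, and from then on they are stored until the buffer is full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_APP_CLASS      7u   /* hypothetical class number */
#define DEMO_BIG_RED_BUTTON 0u   /* hypothetical event number */
#define DEMO_BUFFER_SIZE    8u

typedef struct {
  uint32_t cls;
  uint32_t event;
  uint32_t data;
} demo_record;

static demo_record demo_buffer[DEMO_BUFFER_SIZE];
static size_t      demo_count;
static int         demo_triggered;

/* Called for every trace point that passes the enable flags. */
static void demo_capture(uint32_t cls, uint32_t event, uint32_t data)
{
  assert(demo_count <= DEMO_BUFFER_SIZE);
  if (!demo_triggered) {
    if (cls != DEMO_APP_CLASS || event != DEMO_BIG_RED_BUTTON)
      return;                 /* before the trigger: drop the event */
    demo_triggered = 1;       /* the trigger event itself is recorded */
  }
  if (demo_count < DEMO_BUFFER_SIZE) {
    demo_buffer[demo_count].cls = cls;
    demo_buffer[demo_count].event = event;
    demo_buffer[demo_count].data = data;
    demo_count++;
  }
}
```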

> The application can define certain parameters so as to not trace every
> single event. I did not mention it before because it takes a long time to
> explain every parameter.

I see this as a user interface issue. It also does not match the requirement 
you stated of the application not knowing if the tool is enabled or not.

> I think your idea of a table that contains flags with the information about
> whether to save (or not) information that a given event has occurred is a
> good one.

Great.

Regards
Chris