Timeline Tool
Chris Johns
chrisj at rtems.org
Sat Oct 20 00:17:59 UTC 2007
Manuel Coutinho wrote:
>
> [Manuel] I think the inclusion of the "timeline tool" inside the capture
> engine is a good idea. But we have some requirements that might require some
> modifications to the current capture engine. I would prefer building a
> separate tool because we don't want to change the current status of the
> capture engine, i.e., its current functionality. If you (and the RTEMS
> community) don't have any issues with us modifying the current capture
> engine we also don't have any problems.
>
The need to keep a single functional implementation in RTEMS is important.
It is hard to judge what the issues are without detail about the changes. Any
changes to the capture engine will be handled like any other changes in RTEMS:
we will review the patch, comment, and either accept or reject the change.
Please note that the deeper into RTEMS's score etc. you go, the more closely
the code is reviewed, and following the coding standard of that code becomes
important.
>
> [Manuel] From what I can tell this is a great idea. If I understood you
> correctly, you are saying to make the "timeline tool" as something like an
> RTEMS Manager that is included (or not) in the application Makefile. This
> way the timeline tool would be always compiled and generate a dummy .rel
> file and a "complete" .rel file.
> Perhaps you could just give an initial pointer to get me started on this :).
>
I do not see the need for a manager. To me a manager is something that is
based on the RTEMS object and therefore has information tables. I do not like
the dummy .rel way things are done in RTEMS (see PR1253 in bugzilla).
I see this project as an expansion of the existing design so let me explain
the current design first. There are 3 parts:
1. Kernel trace points.
2. Filter/Trigger and Capture.
3. Transport/User interface.
The Kernel trace points are provided by the extension manager via the task
switching API. This is currently a runtime configured means to capture the
basic tasking related events. It suffers from always being enabled, incurring
an overhead all the time even for users who do not use the functionality, and
it will not scale across the kernel to all the trace points we need to
monitor.
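As a minimal sketch of the kind of hook involved, using the existing classic
API (the handler and function names here are mine for illustration):

  #include <rtems.h>

  /* Illustrative handler run on every context switch. */
  static void capture_task_switch (rtems_tcb* running, rtems_tcb* heir)
  {
    /* record the tasking event here */
  }

  static rtems_extensions_table capture_extensions = {
    .thread_switch = capture_task_switch
  };

  void capture_hook_kernel (void)
  {
    rtems_id id;
    rtems_extension_create (rtems_build_name ('C', 'A', 'P', 'T'),
                            &capture_extensions, &id);
  }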
The Filter/Trigger and Capture is provided by the capture engine code. This
code interfaces to the Kernel trace points and provides an API to manage and
control the filtering, triggering and capture of data. It is complex because
it is performance sensitive, has large memory demands and needs to handle the
dynamics of system changes.

The performance issue centers on the way filtering and triggering are handled.
Filtering and triggering are very important as they keep the captured data
down to a manageable level; this is the key design requirement. The filtering
allows the information a user needs to monitor to be transported to a remote
host with the smallest overhead and therefore the smallest system impact. The
triggering makes effective use of the available capture memory and transport
bandwidth by only starting to capture data when the required event happens. It
is like using a logic analyzer: you do not want huge amounts of data and you
want to start the capture as close to the event of interest as possible. With
a logic analyzer it takes time to get the correct filtering and trigger to
find the problem, but it is usually faster than looking at everything the
system does in the hope of finding the problem. The case of checking a system
for conformance (or whatever), where everything needs to be monitored, is just
the special case of filtering nothing and triggering on anything.

The memory and system change issues relate to the latency of getting the
captured data out of the system. For example, a task is created, starts, runs
and is deleted. The capture engine needs to create an area to hold the
required data until the captured data has been retrieved. This requires
reference counting of the data against the captured events.
Both these parts of the code need to consider stack usage. The trace point
code and the filtering/triggering code will run on task stacks and this code
has no control over the size of that stack.
The Transport/User interface sits on top of the Filter/Trigger and Capture
API. Currently there is a Command Line Interface (CLI) and I have started on a
TCP server interface. A web server interface could be written, or an
application could contain its own code to perform these functions. This is the
place where we have network interfaces and tasks that need application
parameters to be configured. In the case of the CLI it uses the RTEMS monitor
and its task. The TCP interface has an accept thread and threads for the
clients.
Extending the Kernel trace points would use the discussed method of trace
point calls that are removed by an enable configure option. The current
extension manager would be extended to allow a call to register a table of
pointers to trace point handlers. The trace point or capture point code
could be:
#if RTEMS_KERNEL_CAPTURE
#define rtems_capture_point_uint32( class, event, data ) \
  do { \
    if (rtems_capture_trace_points && rtems_capture_trace_points[class]) { \
      if (rtems_capture_trace_points[class]->enable & (1 << (event))) { \
        rtems_capture_trace_points[class]->handler_uint32( event, data ); \
      } \
    } \
  } while (0)
#else
#define rtems_capture_point_uint32( class, event, data )
#endif

#if RTEMS_CAPTURE_SEMAPHORE_CLASS
#define rtems_capture_semaphore( event, id ) \
  rtems_capture_point_uint32( RTEMS_CAPTURE_SEMAPHORE_CLASS, event, id )
#else
#define rtems_capture_semaphore( event, id )
#endif
There would be a range of these macros for different data types and classes.
Also, the second pointer test in the first if statement could be removed if we
know the enable flags are set correctly. The second level macros
(rtems_capture_semaphore) let a class be enabled or disabled. This would
allow a smaller table to be built for small footprint users.
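For illustration only, here is a sketch of the table those macros assume; the
type and field names are assumptions, not an existing RTEMS API:

  #include <stdint.h>

  /* One entry per trace point class; names are illustrative. */
  typedef struct rtems_capture_class {
    uint32_t enable;                     /* an enable bit per event */
    void (*handler_uint32) (int event, uint32_t data);
  } rtems_capture_class;

  /* Per-class pointers registered at runtime. A 0 entry disables the
     whole class and is the second pointer test in the macro above. */
  extern rtems_capture_class** rtems_capture_trace_points;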
The Filter/Trigger and Capture code needs to be updated to handle the new
trace points and the different enable options, but that is too much detail for
here and now. The Transport and User Interface changes just follow on.
All this does not touch on interrupts. Interrupts are a completely separate
topic. This is a complex issue and will require special consideration. The
code is specific to each processor, and the ability to have nested interrupts
will create interesting design issues. I would like this topic to wait until
we have these issues resolved.
>
> [Manuel] The timeline tool will always impact the performance of the
> application. The simple act of saving the event that occurred takes time and
> the transmission to the host takes even more time. We plan to make the
> transmission inside a task that has user configured priority and period (a
> default value is provided).
>
Yes, I agree there will always be some impact. We need to consider a design
that works for all users, not just those with plenty of CPU performance,
memory and network bandwidth. The important point is the need to filter and
trigger. For example, in most cases you need to filter out the effects of the
transmission process from the event data being sent. Not doing this could
place the system in an unstable state, that is, the transmission of event
data, including the network stack, drivers etc, generating more data than the
system has the ability to send.
>
> [Manuel] We can provide an interface that is device independent (i.e. writes
> to serial, Ethernet, flash disk, etc). If I understood you in your
> commentary above, the timeline tool is integrated as an RTEMS Manager so the
> user won't have to do --enable-rctt (or whatever) when building RTEMS. The
> user will only have to do something like
>
> managers = io semaphore message timeline
>
> right?
No. I see the capture engine like the RTEMS monitor, network stack or telnetd.
The application or BSP makes the calls to set up and initialise.
The capture engine sample program shows what I mean:
http://www.rtems.org/cgi-bin/viewcvs.cgi/rtems/testsuites/samples/capture/init.c?rev=1.2
This code makes the following calls:
  rtems_monitor_init (0);
  rtems_capture_cli_init (0);
The CLI needs the monitor. The capture engine CLI will call the capture engine
and that will hook into the kernel. The only downside with this approach is
that some information may be missed, but I can live with that.
Currently I will reject all manager solutions.
The Init task solution still exists, but that would be specific to a user's
application and not RTEMS by default.
>
> [Manuel] One of our requirements states that the timeline tool is
> "independent" from the application, that is, the tool can collect
> information even if the application does not call any timeline tool
> function. To do this the tool must be up and running before the
> application is started.
>
This all depends on where you draw the application starting line. For me it is
main. In your case I would argue the BSP making the calls is fine. If that was
rejected I would create a C++ static object and make the calls in that.
Finally, you could always set up your own set of Init tasks. What I see as
important is the simple call to start the capture engine and no extra hidden
magic in the system. We should let the user, who knows their own system best,
make the choice of where and when to get things started. An RTEMS rebuild is
not an option to control this.
I like main as it is standardized and has command line arguments, and
therefore gives a simple runtime way to control things like this.
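For example, a minimal user owned Init task, making the same calls as the
sample above, could be:

  #include <rtems.h>
  #include <rtems/monitor.h>
  #include <rtems/capture-cli.h>

  rtems_task Init (rtems_task_argument argument)
  {
    /* Start the monitor and the capture engine CLI before any
       application tasks are created. */
    rtems_monitor_init (0);
    rtems_capture_cli_init (0);

    /* create and start the application tasks here ... */

    rtems_task_delete (RTEMS_SELF);
  }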
>
> [Manuel] We were planning to create a NEW structure that contains pointers
> to functions that are called at critical instants (like the RTEMS Extension
> Manager). These instants correspond to task insertion in a ready queue, task
> priority changes and task suspending/resuming. An rtems_extension_createX
> function would just set the pointers correctly. We are now thinking of
> building a timeline tool Manager that incorporates this functionality.
>
I agree about the table as shown in the code above.
I see no need for a manager. I would also have the table passed to the
extension manager. This way the kernel does not contain yet another table
that is not used. The capture engine can malloc memory for the table and fill
it in at runtime. To disable the capturing just pass a 0 as the table pointer.
This makes the API change to the extension manager a single call.
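As a sketch, and assuming the illustrative table type above, the single call
could look like this; the function name and class count are made up for the
example:

  #include <stdlib.h>
  #include <rtems.h>

  #define CAPTURE_CLASSES 16 /* illustrative number of classes */

  /* Hypothetical single addition to the extension manager API. */
  rtems_status_code rtems_extension_set_trace_points (rtems_capture_class** table);

  void capture_attach (void)
  {
    rtems_capture_class** table = calloc (CAPTURE_CLASSES, sizeof (*table));
    /* fill in the classes to be captured ... */
    rtems_extension_set_trace_points (table);
  }

  void capture_detach (void)
  {
    rtems_extension_set_trace_points (0); /* 0 disables capturing */
  }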
>
> [Manuel] We are thinking of creating a file inside the RTEMS source code
> where the user can modify which calls are by default traced (not logged :)).
> The application can call a timeline tool API to, at runtime, enable/disable
> specific traces (e.g. want to monitor semaphore calls but not region nor
> partition). This file is a .c (or .h) file and by modifying these parameters
> the user would have to recompile RTEMS (at least this .c file).
>
No file, please. See the enable flags for the events in the code above. The
capture engine can access these flags directly (with interrupts masked).
We are trying to move to binary cpukit rpms and an edited source file does not
support that path. The cpukit rpms will be released items and therefore
provide users with a tested RTEMS. As it stands, the enable configure option
for this code complicates these rpms.
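As a sketch, and again assuming the illustrative table above, runtime control
then needs no RTEMS rebuild at all:

  #include <rtems.h>

  /* Enable one event in one class with interrupts masked. */
  void capture_enable_event (int class, int event)
  {
    rtems_interrupt_level level;

    rtems_interrupt_disable (level);
    if (rtems_capture_trace_points && rtems_capture_trace_points[class])
      rtems_capture_trace_points[class]->enable |= (1 << event);
    rtems_interrupt_enable (level);
  }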
>
> [Manuel] Since (I think) we are moving in the direction of building a
> "Timeline Tool Manager",
Please do not do this. The network stack is not an RTEMS manager and it is a
much bigger piece of code than this.
> I think we don't need to create an extension to the
> RTEMS Extension Manager: the new functionality is implemented by the
> timeline tool Manager. This way, if the timeline tool manager is not
> selected, an empty function is called and no performance is lost (no "if" is
> performed).
Without compile time changes, dead code removal by the compiler will not work,
so the call to the empty function still has a cost.
> I also agree that the body of the "timeline tool" should be placed in a
> different area other than the RTEMS source code (e.g. libmisc/capture) but
> some functions must be called from inside the remaining RTEMS source code.
Only the inline calls, coded something like the example above.
> We are not thinking of allowing the application to trace specific events.
> The timeline tool main purpose is to analyze what the application is doing
> in terms of schedulability.
For you this may be the case but I see other uses. For example I may like to
add to my code:
  rtems_capture_point_uint32( RTEMS_CAPTURE_APP_CLASS,
                              MY_BIG_RED_BUTTON, 0xdeaddead );
and that becomes my trigger point.
> The application can define certain parameters so as to not trace every
> single event. I did not mention it before because it takes a long time to
> explain every parameter.
I see this as a user interface issue. It also does not match the requirement
you stated of the application not knowing if the tool is enabled or not.
> I think your idea of a table that contains flags with the information about
> whether to save (or not) information that a given event has occurred is a
> good one.
Great.
Regards
Chris