Self-contained one purpose objects
Sebastian Huber
sebastian.huber at embedded-brains.de
Thu Jul 23 09:16:03 UTC 2015
The Classic RTEMS and POSIX APIs have at least three weaknesses.
* Dynamic memory (the workspace) is used to allocate object pools. This
requires a complex configuration with heavy use of the C pre-processor.
* Objects are created via function calls which return an object identifier.
The object operations use this identifier and internally map it to an
internal object representation.
* The object operations use a rich set of options and attributes. Each time
these parameters must be evaluated and validated to figure out what
to do.
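The second weakness can be illustrated with a small sketch. This is not the RTEMS implementation; the pool, the identifier encoding, and all example_* names are hypothetical, and no real locking is performed. It only contrasts an operation that must validate and map an identifier on every call with one that receives the object directly.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_MUTEXES 4u

/* Hypothetical internal representation; not an RTEMS structure. */
struct example_mutex {
  int in_use;
  unsigned int nest_level;
};

/* Object pool; an identifier is the pool index plus one (assumption). */
static struct example_mutex example_pool[EXAMPLE_MAX_MUTEXES];

/* Create: allocate a pool slot and hand back an identifier. */
int example_mutex_create(unsigned int *id)
{
  for (unsigned int i = 0; i < EXAMPLE_MAX_MUTEXES; ++i) {
    if (!example_pool[i].in_use) {
      example_pool[i].in_use = 1;
      example_pool[i].nest_level = 0;
      *id = i + 1;
      return 0;
    }
  }
  return -1; /* pool exhausted */
}

/* Identifier based: every call validates and maps the identifier first. */
int example_mutex_obtain_by_id(unsigned int id)
{
  if (id == 0 || id > EXAMPLE_MAX_MUTEXES || !example_pool[id - 1].in_use) {
    return -1; /* invalid identifier */
  }
  ++example_pool[id - 1].nest_level;
  return 0;
}

/* Self-contained: the caller owns the storage and passes it directly. */
void example_mutex_obtain_direct(struct example_mutex *mutex)
{
  assert(mutex != NULL);
  ++mutex->nest_level;
}
```

The identifier variant pays for the range check and table lookup on each operation, while the direct variant has nothing to map.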
For applications that use fine-grained locking, the cost of mapping the
identifier to the object representation and of evaluating the parameters is
a significant overhead that may degrade performance dramatically. An example
is the FreeBSD network stack, which uses hundreds of locks in a basic setup.
Here the performance can easily be measured in terms of throughput and
processor utilization. The port of the FreeBSD network stack now uses its
own priority inheritance mutex implementation, which is not based on the
Classic RTEMS objects. The blocking part, however, uses the standard thread
queues. The overall implementation is quite simple.
Another example which benefits from self-contained objects is OpenMP. The
performance of the POSIX configuration of libgomp differs significantly
from that of an optimized implementation using self-contained objects
available via Newlib <sys/lock.h>, see https://devel.rtems.org/ticket/2274.
Some test cases are more than a hundred times slower in the POSIX
configuration of libgomp.
Since Newlib should use locks to protect some global data structures
(https://devel.rtems.org/ticket/1247) and GCC uses locks for the C++ and
OpenMP support, the application must take this into account in its
configuration. It is difficult to figure out how many and which objects will
be used by Newlib and GCC for a particular application. It would be much
easier with self-contained objects, where the object user has the
responsibility to provide the storage space. This could be a statically
initialized global object or an object embedded in a structure.
A list of requirements for self-contained lock objects follows.
* The initial value of the lock object structure components shall be zero.
This makes it possible to use memset(lock, 0, sizeof(*lock)) for
initialization. Statically initialized lock objects can reside in the
.bss section.
* The lock object structure definition shall be independent of RTEMS header
files and the RTEMS configuration. So only standard types and pointers to
types with a forward declaration can be used. With the recent change of the
thread queue implementation, this requirement can be fulfilled.
* The lock shall avoid priority inversion problems.
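A minimal sketch of the first two requirements, assuming a placeholder lock type: struct example_lock, struct example_waiters, and the example_* functions are hypothetical and are not the Newlib definitions. It shows a lock that needs only a forward-declared pointer type, a global lock that can live in .bss, and a lock embedded in a user structure initialized with memset().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Forward declaration only; no other header is needed for the layout. */
struct example_waiters;

/* Hypothetical stand-in for a self-contained lock. */
struct example_lock {
  struct example_waiters *waiters;
  unsigned int state;
};

/* No initializer: the object is zero-initialized and can reside in .bss. */
struct example_lock example_global_lock;

/* A lock embedded in a user-defined structure. */
struct example_device {
  int unit;
  struct example_lock lock;
};

/*
 * Run-time initialization with all-zero bytes. This assumes that a null
 * pointer is represented by all-zero bits, which holds on common targets.
 */
void example_lock_init(struct example_lock *lock)
{
  assert(lock != NULL);
  memset(lock, 0, sizeof(*lock));
}

/* Returns 1 if the lock is in its initial state, otherwise 0. */
int example_lock_is_initial(const struct example_lock *lock)
{
  return lock->waiters == NULL && lock->state == 0;
}
```

In both cases the storage is provided by the user of the lock, so no object pool has to be sized in the configuration.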
Self-contained objects exist as a prototype implementation and show
excellent results in terms of performance. The data structures defined in
Newlib must be independent of RTEMS build configurations, like SMP
enabled/disabled, profiling enabled/disabled, debug enabled/disabled, etc.
The basic structure is like this:

struct _Thread_Control;

struct _Thread_queue_Heads;

struct _Ticket_lock_Control {
  unsigned int _next_ticket;
  unsigned int _now_serving;
};

struct _Thread_queue_Queue {
  struct _Thread_queue_Heads *_heads;
  struct _Ticket_lock_Control _Lock;
};

struct _Mutex_Control {
  struct _Thread_queue_Queue _Queue;
  struct _Thread_Control *_owner;
};

So, a mutex object consists of only 16 bytes on a 32-bit architecture. It
supports uni-processor and SMP configurations (the SMP support needs 8 bytes
for the ticket lock). Two implementation details are exposed to Newlib.
1. The SMP lock data structure. One possible alternative to a ticket lock
is an MCS lock, which uses only one pointer instead of two integers. This
could be addressed with a union in case we really use MCS locks in the
future.
2. The thread queue structure. This is only a pointer to the thread queue
heads and the lock. This should be acceptable since this structure is
already highly optimized and is unlikely to change soon.
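The union mentioned in item 1 could look roughly like the following sketch. It is not part of the prototype: struct example_mcs_node, struct example_mcs_lock, union example_smp_lock, and the example_* mutex types are hypothetical names. The size function spells out the arithmetic behind the 16 bytes (4 + 8 + 4 with 32-bit pointers and 32-bit unsigned int), assuming the compiler inserts no padding.

```c
#include <assert.h>
#include <stddef.h>

struct _Thread_Control;
struct _Thread_queue_Heads;
struct example_mcs_node; /* hypothetical per-waiter MCS queue node */

struct _Ticket_lock_Control {
  unsigned int _next_ticket;
  unsigned int _now_serving;
};

/* Hypothetical MCS lock: a single pointer to the tail of the waiter queue. */
struct example_mcs_lock {
  struct example_mcs_node *tail;
};

/* Either lock variant fits into the same storage. */
union example_smp_lock {
  struct _Ticket_lock_Control ticket;
  struct example_mcs_lock mcs;
};

struct example_queue {
  struct _Thread_queue_Heads *_heads;
  union example_smp_lock _Lock;
};

struct example_mutex {
  struct example_queue _Queue;
  struct _Thread_Control *_owner;
};

/* Heads pointer + SMP lock + owner pointer, assuming no padding. */
size_t example_expected_mutex_size(void)
{
  return sizeof(struct _Thread_queue_Heads *)
    + sizeof(union example_smp_lock)
    + sizeof(struct _Thread_Control *);
}
```

With such a union the structure size visible to Newlib would stay the same on typical targets whichever lock variant is selected, since the ticket lock is the larger or equally sized member.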
I see four use cases for self-contained objects.
1. The new network stack (it already uses its own implementation of
self-contained objects).
2. The OpenMP support of GCC (libgomp). Here it is a must-have for
performance reasons.
3. The Newlib internal locks. The advantages compared to using Classic API
objects are better performance, no configuration issues, and a smaller
memory footprint.
4. The GCC thread model. Same advantages as before. In addition, this would
enable an easy-to-use and efficient C11 and C++11 thread support.
In the long run this could lead to a very small footprint system without
dependencies on dynamic memory and with a purely static initialization.
What are your opinions?
--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail : sebastian.huber at embedded-brains.de
PGP : Public key available on request.
This message is not a business communication within the meaning of the EHUG.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lock.h
Type: text/x-chdr
Size: 5164 bytes
Desc: not available
URL: <http://lists.rtems.org/pipermail/devel/attachments/20150723/a8f57184/attachment.bin>