Self-contained one purpose objects

Thu Jul 23 09:16:03 UTC 2015

The Classic RTEMS and POSIX APIs have at least three weaknesses.

* Dynamic memory (the workspace) is used to allocate object pools. This
   requires a complex configuration with heavy use of the C pre-processor.

* Objects are created via function calls which return an object identifier.
   The object operations use this identifier and internally map it to an
   internal object representation.

* The object operations use a rich set of options and attributes. Each time
   an operation runs, these parameters must be evaluated and validated to
   figure out what to do.

For applications that use fine-grained locking, mapping the identifier to the
object representation and evaluating the parameters adds a significant
overhead that may degrade the performance dramatically.  An example is the
FreeBSD network stack, which uses hundreds of locks in a basic setup.  Here
the performance can be easily measured in terms of throughput and processor
utilization.  The port of the FreeBSD network stack now uses its own priority
inheritance mutex implementation which is not based on the classic RTEMS
objects.  The blocking part however uses the standard thread queues.  The
overall implementation is quite simple.

Another example which benefits from self-contained objects is OpenMP.  For
OpenMP, the performance of the POSIX configuration of libgomp differs
significantly from that of an optimized implementation using self-contained
objects available via Newlib <sys/lock.h>, see
https://devel.rtems.org/ticket/2274.  Some test cases are more than a hundred
times slower in the POSIX configuration of libgomp.

Since Newlib should use locks to protect some global data structures
(https://devel.rtems.org/ticket/1247) and GCC uses locks for the C++ and
OpenMP support, the application must take these objects into account in its
configuration.  It is difficult to figure out how many and which objects will
be used by Newlib and GCC for a particular application.  It would be much
easier with self-contained objects where the object user has the
responsibility to provide the storage space.  This could be a statically
initialized global object or an embedded object in a structure.

A list of requirements for self-contained lock objects follows.

* The initial value of the lock object structure components shall be zero.
   This makes it possible to use memset(lock, 0, sizeof(*lock)) for
   initialization.  Statically initialized lock objects can reside in the
   .bss section.

* The lock object structure definition shall be independent of RTEMS header
   files and the RTEMS configuration.  So only standard types and pointers to
   types with a forward declaration can be used.  With the recent change of
   the thread queue implementation, this requirement can be fulfilled.

* The lock shall avoid priority inversion problems.
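
A minimal sketch of what the first two requirements allow in practice.  The
names struct demo_lock, struct demo_device, demo_lock_init() and
demo_lock_is_initial() are hypothetical and used only for illustration; they
are not the Newlib or RTEMS definitions.  The sketch also assumes that an
all-bits-zero pointer is a null pointer, which holds on the usual targets.

```c
#include <assert.h>
#include <string.h>

/* Forward declarations only; no RTEMS header is needed (assumed names). */
struct demo_thread;
struct demo_heads;

/* Hypothetical self-contained lock built from standard types and pointers. */
struct demo_lock {
    struct demo_heads *heads;   /* NULL: no thread is waiting */
    unsigned int next_ticket;
    unsigned int now_serving;
    struct demo_thread *owner;  /* NULL: the lock is free */
};

/* Statically allocated and zero-initialized, so it can live in .bss. */
static struct demo_lock global_lock;

/* The lock user provides the storage, here embedded in another structure. */
struct demo_device {
    int unit;
    struct demo_lock lock;
};

/* Run-time initialization is nothing more than clearing the storage. */
static void demo_lock_init(struct demo_lock *lock)
{
    assert(lock != NULL);
    memset(lock, 0, sizeof(*lock));
}

/* The lock is in its initial state when every component is zero. */
static int demo_lock_is_initial(const struct demo_lock *lock)
{
    return lock->heads == NULL && lock->next_ticket == 0 &&
           lock->now_serving == 0 && lock->owner == NULL;
}
```

With such a layout, neither global_lock nor a lock embedded in struct
demo_device needs an entry in an object pool configuration.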

Self-contained objects exist as a prototype implementation and show excellent
results in terms of performance.  The data structures defined in Newlib must
be independent of RTEMS build configurations, like SMP enabled/disabled,
profiling enabled/disabled, debug enabled/disabled, etc.  The basic structure
is like this:

struct _Thread_Control;

struct _Thread_queue_Heads;

struct _Ticket_lock_Control {
     unsigned int _next_ticket;
     unsigned int _now_serving;
};

struct _Thread_queue_Queue {
     struct _Thread_queue_Heads *_heads;
     struct _Ticket_lock_Control _Lock;
};

struct _Mutex_Control {
     struct _Thread_queue_Queue _Queue;
     struct _Thread_Control *_owner;
};

So, a mutex object occupies only 16 bytes on a 32-bit architecture: 4 bytes
for the thread queue heads pointer, 8 bytes for the ticket lock, and 4 bytes
for the owner pointer.  It supports uni-processor and SMP configurations (the
SMP support needs the 8 bytes for the ticket lock).  Two implementation
details are exposed to Newlib.

1. The SMP lock data structure.  One possible alternative to a ticket lock
    is an MCS lock.  It uses only one pointer instead of two integers.  So
    this could be addressed with a union in case we really use MCS locks in
    the future.

2. The thread queue structure.  This is only a pointer to the thread queue
    heads and the lock.  This should be acceptable since this structure is
    already highly optimized and therefore unlikely to change soon.
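
The union idea from the first point could look roughly like the sketch
below.  All demo_* names are hypothetical, and the MCS lock is assumed to
consist of a single tail pointer; nothing here is taken from an actual
Newlib or RTEMS header.  The compile-time check assumes the usual targets on
which an unsigned int is not larger than a pointer and no padding is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declarations only, as in the structures above (assumed names). */
struct demo_thread;
struct demo_heads;
struct demo_mcs_node;

struct demo_ticket_lock {
    unsigned int next_ticket;
    unsigned int now_serving;
};

/* Assumption: an MCS lock needs just a pointer to the tail waiter node. */
struct demo_mcs_lock {
    struct demo_mcs_node *tail;
};

/* Either lock variant fits into the same storage. */
union demo_smp_lock {
    struct demo_ticket_lock ticket;
    struct demo_mcs_lock mcs;
};

struct demo_thread_queue {
    struct demo_heads *heads;
    union demo_smp_lock lock;
};

struct demo_mutex {
    struct demo_thread_queue queue;
    struct demo_thread *owner;
};

/* Two pointers plus two integers: the footprint stays the same. */
_Static_assert(sizeof(struct demo_mutex) ==
               2 * sizeof(void *) + 2 * sizeof(unsigned int),
               "unexpected padding in struct demo_mutex");

static size_t demo_mutex_size(void)
{
    return sizeof(struct demo_mutex);
}

/* A zero-initialized mutex has no owner, that is, it is unlocked. */
static int demo_mutex_is_unlocked(const struct demo_mutex *mutex)
{
    assert(mutex != NULL);
    return mutex->owner == NULL;
}
```

Since the structure size does not depend on which union member is used, a
later switch of the lock type would not change the layout seen by Newlib
users on such targets.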

I see four use cases for self-contained objects.

1. The new network stack (it already uses its own implementation of
    self-contained objects).

2. The OpenMP support of GCC (libgomp).  Here it is a must-have due to
    performance reasons.

3. The Newlib internal locks.  The advantages compared to using Classic API
    objects are better performance, no configuration issues, and a smaller
    memory footprint.

4. The GCC thread model.  Same advantages as before.  In addition this would
    enable easy-to-use and efficient C11 and C++11 thread support.

In the long run this could lead to a very small footprint system without
dependencies on dynamic memory and a purely static initialization.

What are your opinions?

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: lock.h
Type: text/x-chdr
Size: 5164 bytes
Desc: not available
URL: <http://lists.rtems.org/pipermail/devel/attachments/20150723/a8f57184/attachment.bin>