Self-contained one purpose objects
Sebastian Huber
sebastian.huber at embedded-brains.de
Thu Jul 23 11:31:23 UTC 2015
Hello Pavel,
thanks for your comments.
On 23/07/15 12:40, Pavel Pisa wrote:
> Hello Sebastian,
>
> first of all, a big thanks for the RTEMS architectural updates.
>
> On Thursday 23 of July 2015 11:16:03 Sebastian Huber wrote:
>> The Classic RTEMS and POSIX APIs have at least three weaknesses.
>>
>> * Dynamic memory (the workspace) is used to allocate object pools. This
>> requires a complex configuration with heavy use of the C pre-processor.
>>
>> * Objects are created via function calls which return an object identifier.
>> The object operations use this identifier and must map it internally to
>> the object representation.
>>
>> * The object operations use a rich set of options and attributes. On each
>> call these parameters must be evaluated and validated to figure out what
>> to do.
> ...
>> In the long run this could lead to a very small footprint system without
>> dependencies on dynamic memory and a purely static initialization.
>>
>> What are your opinions?
> I fully understand your motivation, and for small footprint systems the
> direct use of pointers is the most efficient option.
> But in the area of smallest footprint systems there are many
> alternatives to RTEMS - MBED, NuttX etc.
My goal is to get it smaller compared to what we have now; I am not aiming
for the smallest system on the market. That would only be a side-effect. The
main purpose of the self-contained objects is performance and an easier
configuration. I think the conditional compilation in <rtems/confdefs.h>
has reached a problematic complexity.
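Just to make this concrete, a typical Classic API configuration looks
roughly like the sketch below (the drivers and object counts are made up
for the example):

/* Typical Classic API configuration (values invented for this example).
 * Every object pool must be sized up front; <rtems/confdefs.h> turns these
 * defines into workspace size estimates via a lot of pre-processor logic. */
#define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER
#define CONFIGURE_APPLICATION_NEEDS_CONSOLE_DRIVER

#define CONFIGURE_MAXIMUM_TASKS          8
#define CONFIGURE_MAXIMUM_SEMAPHORES     32
#define CONFIGURE_MAXIMUM_MESSAGE_QUEUES 4

#define CONFIGURE_RTEMS_INIT_TASKS_TABLE

#define CONFIGURE_INIT
#include <rtems/confdefs.h>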
>
> RTEMS layering (Score, APIs, object identifiers etc.) is quite
> complex and has considerable overhead. On the other hand
> these layers add value: they give RTEMS the option to be
> used in more complex scenarios. Some facts to consider:
>
> * If all locking constructs use identifiers then they
> are well traceable. There is some problem with pthreads
> here in that pthread_create() does not have a parameter
> for a thread identifier/purpose.
>
> * The use of identifiers and calling system operations with these
> identifiers allows the application API to be kept even for the case
> where the operating system and the applications run in separate
> domains/CPU privilege levels/rings. Passing pointers is really problematic
> in such use cases. RTEMS does not use memory space separation,
> and it is questionable whether the MMU context switch overhead is
> appropriate for some system uses. But on the other hand there can be
> interesting uses where RTEMS is ported to a microkernel and multiple RTEMS
> instances run in address space separated domains, even with
> strict temporal separation (POK, PikeOS). These options
> have not been used much in RTEMS yet. But there is one specific
> and unique area for RTEMS and that is support for asymmetric,
> heterogeneous multiprocessing. I am not sure how much this
> feature is in actual use today. But RTEMS is unique in this respect,
> and the use of task-optimised CPU cores with different architectures
> will play an important role in future computing - see all of today's
> GPU, APU and FPGA projects.
>
> So my suggestion is to take all these use cases into consideration.
> They should not be taken as hard requirements if that really means
> unacceptable overhead for the common use cases, but they should be
> considered.
I don't want to change the existing APIs. The object identifier
infrastructure is fine, but it was designed for a specific purpose, e.g.
to enable a platform that supports asymmetric multiprocessing (the RTEMS
MPCI support). With SMP we now see its limitations. A complex SMP
application like the FreeBSD network stack uses hundreds of locks, and the
areas protected by these locks are quite small. The lock/unlock sequence
of an uncontended mutex is absolutely performance critical.
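To make the difference concrete, compare the two styles in the sketch
below; the self-contained type and operation names are invented for
illustration, they are not an existing API:

#include <rtems.h>

void classic_api_example(void)
{
  /* Today (Classic API): the mutex lives in a pool allocated from the
   * workspace and is addressed by an identifier, which every obtain and
   * release must map back to the internal object representation. */
  rtems_id id;

  rtems_semaphore_create(
    rtems_build_name('M', 'T', 'X', ' '),
    1,
    RTEMS_BINARY_SEMAPHORE | RTEMS_PRIORITY | RTEMS_INHERIT_PRIORITY,
    0,
    &id
  );
  rtems_semaphore_obtain(id, RTEMS_WAIT, RTEMS_NO_TIMEOUT);
  rtems_semaphore_release(id);
}

/* Hypothetical self-contained mutex (invented for illustration): the object
 * is a plain structure owned by the application, can be initialized
 * statically, and the operations work directly on the pointer.  No pool,
 * no identifier lookup, no attribute evaluation on every call. */
struct self_contained_mutex { unsigned int lock_word; };
#define SELF_CONTAINED_MUTEX_INITIALIZER { 0 }
void self_contained_mutex_lock(struct self_contained_mutex *mtx);
void self_contained_mutex_unlock(struct self_contained_mutex *mtx);

void self_contained_example(void)
{
  static struct self_contained_mutex mtx = SELF_CONTAINED_MUTEX_INITIALIZER;

  self_contained_mutex_lock(&mtx);
  self_contained_mutex_unlock(&mtx);
}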
[...]
>
> As for the implementation, I expect that maximal optimisation is required
> for the lock path without contention. This path should be optimised to
> be an inline or simple function call inside the application context.
> It is a question whether mutexes which can be used in heterogeneous
> setups should be considered as well. If such a type is found then there
> would be a need to call a "system" or other more complex function.
> But I think that these uses can be left to the classic semaphores.
> Classic mutexes are usually considered a mechanism used inside a
> threaded application/subsystem and not something that spreads beyond a
> single address space.
Yes.
>
> My feeling is that the locking case with contention/wait should be
> implemented in a way that allows future privilege separation
> of the scheduler/system core from the applications, as well as memory
> context separation or the use of hypervisor calls for the wait.
>
> So I suggest to consider an architecture similar to the Linux futex
>
> http://www.akkadia.org/drepper/futex.pdf
>
> and to use this for the mutex implementation. Maybe even add a field
> for an identifier/RTEMS object ID to the mutex structure.
> At least for debug builds it would be great if there were, in the TCB
> (or, in the case of kernel/user separation, in a well known TLS
> variable), a pointer to the chain of mutexes taken by the specific
> thread.
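Just for reference, the mutex design from the paper boils down to roughly
the following sketch; futex_wait()/futex_wake() are placeholders for the
wait/wake primitive (similar in spirit to the libgomp helpers), not an
existing RTEMS API:

#include <stdatomic.h>

struct futex_mutex {
  /* 0 = unlocked, 1 = locked, 2 = locked with (possible) waiters */
  atomic_int val;
};

/* Placeholders (invented names): block if *addr still equals expected,
 * and wake up to count waiters blocked on addr. */
void futex_wait(atomic_int *addr, int expected);
void futex_wake(atomic_int *addr, int count);

static void futex_mutex_lock(struct futex_mutex *m)
{
  int c = 0;

  /* Uncontended fast path: one atomic compare-and-swap, no system call. */
  if (atomic_compare_exchange_strong(&m->val, &c, 1))
    return;

  /* Contended slow path: mark the mutex as "locked with waiters" and block
   * until the exchange observes the unlocked state. */
  if (c != 2)
    c = atomic_exchange(&m->val, 2);

  while (c != 0) {
    futex_wait(&m->val, 2);
    c = atomic_exchange(&m->val, 2);
  }
}

static void futex_mutex_unlock(struct futex_mutex *m)
{
  /* If the old value was 1 there were no waiters; otherwise wake one. */
  if (atomic_exchange(&m->val, 0) != 1)
    futex_wake(&m->val, 1);
}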
For the optimized OpenMP support I use the Linux futex barrier
implementation of libgomp and added two futex calls for RTEMS (see the
attached file of the first e-mail). The performance is really good. For the
mutex and semaphore objects, however, I don't use the futex approach of
libgomp. Futexes have excellent properties for average-case systems, e.g.
they provide random fairness. RTEMS is supposed to be a real-time operating
system, so here random fairness is not enough; instead we need FIFO
fairness.
> But managing this is not so easy
> if mutexes can be released in a different order than they were locked.
We should not allow this. Such lock order reversals are bad.
> So it is not a simple singly linked list. But a mapping of which
> mutexes are held by a given thread is required even for priority
> inheritance. This structure has to be kept in user-manipulated
> data only if we do not want the overhead of a syscall in the future.
> But all that is manageable and has been solved for futex based
> OS APIs.
Yes, the current priority inheritance implementation needs to be
improved to better support resource nesting. One problem we recently had in
applications occurred in the file system. A file system
instance like JFFS2 uses a mutex to protect the instance. With this lock
held it uses malloc() and performs device operations which may also
acquire a mutex. In case a high priority task uses malloc(), this
could raise the priority of a task accessing the JFFS2 instance for a very
long time, since after the malloc() the priority is not immediately
restored (the resource count is not zero).
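In code the problematic nesting looks roughly like this (all names are
invented for illustration):

#include <rtems.h>
#include <stdlib.h>

extern rtems_id jffs2_instance_lock; /* invented name for the sketch */

void jffs2_operation_sketch(size_t size)
{
  rtems_semaphore_obtain(jffs2_instance_lock, RTEMS_WAIT, RTEMS_NO_TIMEOUT);
  /* The resource count of this task is now 1. */

  void *buf = malloc(size);
  /* malloc() obtains and releases the allocator mutex (count 2 -> 1).  If a
   * high priority task blocked on the allocator mutex in the meantime, our
   * priority was raised by priority inheritance, and it is not restored when
   * malloc() returns because the resource count is still non-zero. */

  /* ... device operations, which may acquire and release further mutexes ... */

  free(buf);
  rtems_semaphore_release(jffs2_instance_lock);
  /* Only here, with the resource count back at zero, is the inherited
   * priority dropped. */
}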
>
> So generally, I would be very happy if RTEMS were faster,
> but I hope that a solution can be found that is viable in the
> long term and supports a reasonably broad set of use case
> scenarios. I do not have it all in my head and I think
> this calls for more iterations of the discussion.
For the network stack, OpenMP and SMP in general it's not a question of
faster. It's a question of far too slow versus good enough. We should
decide if we want to use self-contained objects for the Newlib internal
locks and the C11/C++11 thread support in GCC.
--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail : sebastian.huber at embedded-brains.de
PGP : Public key available on request.
This message is not a business communication within the meaning of the EHUG.