Self-contained one purpose objects

Thu Jul 23 10:40:37 UTC 2015

Hello Sebastian,

first of all, many thanks for the RTEMS architectural updates.

On Thursday 23 of July 2015 11:16:03 Sebastian Huber wrote:
> The Classic RTEMS and POSIX APIs have at least three weaknesses.
>
> * Dynamic memory (the workspace) is used to allocate object pools. This
>    requires a complex configuration with heavy use of the C pre-processor.
>
> * Objects are created via function calls which return an object identifier.
>    The object operations use this identifier and internally map it to an
>    internal object representation.
>
> * The object operations use a rich set of options and attributes. Each time
>    these parameters must be evaluated and validated to figure out what
> to do.
...
> In the long run this could lead to a very small footprint system without
> dependencies on dynamic memory and a purely static initialization.
>
> What are your opinions?

I fully understand your motivation, and for small-footprint systems
direct use of pointers is the most efficient option.
But in the area of the smallest-footprint systems there are many
alternatives to RTEMS: MBED, Nuttx etc.

The RTEMS layering (Score, APIs, object identifiers etc.) is quite
complex and has considerable overhead. On the other hand,
these layers add value because they give RTEMS the option to be
used in more complex scenarios. These are the facts to consider:

 * if all locking constructs use identifiers then they
   are easily traceable. There is a problem with pthreads
   here, in that pthread_create has no parameter
   for a thread identifier/purpose.

 * the use of identifiers, and calling system operations with these
   identifiers, allows the application API to be kept even in the case
   where the operating system and the applications run in separate
   domains/CPU privilege levels/rings. Passing pointers is really
   problematic in such use cases. RTEMS does not use memory space
   separation, and it is questionable whether the overhead of MMU
   context switches is appropriate for some system uses. But on the
   other hand there can be interesting uses where RTEMS is ported to
   a microkernel and multiple RTEMS instances run in address-space
   separated domains, even with strict temporal separation
   (POK, SpikeOS). These options have not been used much in RTEMS yet.
   But there is one specific and unique area for RTEMS, and that is
   support for asymmetric, heterogeneous multiprocessing. I am not
   sure how much this feature is actually used today. But RTEMS is
   unique in this respect, and the use of task-optimised CPU cores
   with different architectures could play an important role in
   future computing - see all of today's GPUs, APUs and FPGA projects.
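
To illustrate the trade-off in the second point, here is a minimal
sketch in C. All names (demo_id, demo_sem, demo_create, ...) are
hypothetical and are not RTEMS API; the fixed-size table merely stands
in for an object pool. The identifier variant has to validate and map
the ID on every call, which is the overhead under discussion, but the
caller never hands an address to the system side. The pointer variant
is as cheap as it gets, but it trusts the address it is given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only, not the RTEMS object API. */
typedef uint32_t demo_id;              /* 0 means "invalid" */
typedef struct { unsigned count; } demo_sem;

#define DEMO_MAX_SEMS 4u

static demo_sem demo_pool[DEMO_MAX_SEMS];
static bool     demo_used[DEMO_MAX_SEMS];

/* Allocate a slot and return its identifier, or 0 if the pool is full. */
static demo_id demo_create(void)
{
  for (uint32_t i = 0; i < DEMO_MAX_SEMS; ++i) {
    if (!demo_used[i]) {
      demo_used[i] = true;
      demo_pool[i].count = 0;
      return i + 1u;
    }
  }
  return 0;
}

/* Map an identifier to its object; needed on every ID-based call. */
static demo_sem *demo_lookup(demo_id id)
{
  if (id == 0 || id > DEMO_MAX_SEMS || !demo_used[id - 1u]) {
    return NULL;
  }
  return &demo_pool[id - 1u];
}

/* ID-based operation: safe to expose across a privilege boundary,
 * because only a validated table index is ever dereferenced. */
static int demo_release_by_id(demo_id id)
{
  demo_sem *sem = demo_lookup(id);
  if (sem == NULL) {
    return -1;
  }
  ++sem->count;
  return 0;
}

/* Pointer-based operation: no lookup, but the address is trusted. */
static inline void demo_release_by_ptr(demo_sem *sem)
{
  assert(sem != NULL); /* debug-build check only */
  ++sem->count;
}
```

A bogus identifier is rejected with an error code, while a bogus
pointer in the second variant cannot be detected in general.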

So my suggestion is to take all these use cases into consideration.
This should not be taken as a hard requirement if it really means
unacceptable overhead for common use cases, but it should be considered.

There is probably no problem in saying that there is another
synchronisation primitive (MUTEX) in addition to the standard RTEMS
semaphores (even the binary ones with inheritance, which are used as
mutexes now). I even suggested defining a MUTEX name as an alias for
these, because for code correctness checking it is much easier to
check a separate call for correct use (checking that each lock is
followed by an unlock on the same thread path) than to limit this
check to semaphores which have been initialised with specific
parameters. So the introduction of a mutex primitive helps.

As for the implementation, I expect that maximal optimisation is
required for the lock path without contention. This path should be
optimised to be inline or a simple function call inside the
application context. It is a question whether mutexes which can be
used in heterogeneous setups should be considered as well. If such
a type is wanted then there would be a need to call a "system" or
other more complex function. But I think that these uses can be left
to classic semaphores. Classic mutexes are usually considered
a mechanism used inside a threaded application/subsystem and not
one that spreads beyond a single address space.

My feeling is that the locking case with contention/wait should be
implemented in a way that allows future privilege separation
of the scheduler/system core from applications, as well as memory
context separation or the use of hypervisor calls for waiting.
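
The split described above, an inline uncontended path plus a separate
entry point for waiting, could look roughly like the following C11
sketch. It is only an illustration under stated assumptions: the names
demo_mutex, demo_system_wait and demo_system_wake are hypothetical, not
RTEMS API, and the two "system" functions are stubs that only count
calls. With a real blocking wait behind them they would be the single
place where a syscall or hypervisor call happens; with the stubs the
contended path degenerates to spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch, not an RTEMS API.
 * state: 0 = free, 1 = locked, 2 = locked and possibly contended. */
typedef struct { atomic_uint state; } demo_mutex;

#define DEMO_MUTEX_INITIALIZER { 0 }  /* purely static initialization */

static unsigned demo_wait_calls;
static unsigned demo_wake_calls;

/* Stub for the "system" side: a real version would block the caller
 * while m->state still equals expected. Here it only counts calls. */
static void demo_system_wait(demo_mutex *m, unsigned expected)
{
  (void) m;
  (void) expected;
  ++demo_wait_calls;
}

/* Stub for the "system" side: a real version would wake one waiter. */
static void demo_system_wake(demo_mutex *m)
{
  (void) m;
  ++demo_wake_calls;
}

/* Uncontended path: one compare-and-swap, no system call. */
static inline bool demo_mutex_try_lock(demo_mutex *m)
{
  unsigned expected = 0;
  return atomic_compare_exchange_strong_explicit(
    &m->state, &expected, 1u,
    memory_order_acquire, memory_order_relaxed);
}

static inline void demo_mutex_lock(demo_mutex *m)
{
  if (demo_mutex_try_lock(m)) {
    return; /* fast path */
  }
  /* Slow path: mark the mutex as contended and wait until it was free. */
  while (atomic_exchange_explicit(&m->state, 2u,
                                  memory_order_acquire) != 0u) {
    demo_system_wait(m, 2u);
  }
}

static inline void demo_mutex_unlock(demo_mutex *m)
{
  unsigned prev =
    atomic_exchange_explicit(&m->state, 0u, memory_order_release);
  assert(prev != 0u); /* unlocking a free mutex is a usage error */
  if (prev == 2u) {
    demo_system_wake(m); /* only the contended case leaves the fast path */
  }
}
```

In the uncontended case neither stub is called, so lock and unlock stay
entirely inside the application context.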

So I suggest considering an architecture similar to the Linux FUTEX

http://www.akkadia.org/drepper/futex.pdf

and using it for the mutex implementation. Maybe even add a field
for an identifier/RTEMS object ID to the mutex structure.
At least for debug builds, it would be great if there were
in the TCB (or, in the case of kernel/user separation, in a well-known
TLS variable) a pointer to the chain of mutexes taken by the specific
thread. But managing this is not so easy if mutexes can be released
in a different order than they were locked, so it is not a simple
singly linked list. But a mapping of which mutexes are held by
a given thread is required for priority inheritance anyway.
This structure has to be kept in user-manipulated data only
if we do not want the overhead of a syscall in the future.
But all that is manageable and has been solved for FUTEX-based
OS APIs.
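
One possible shape for such a per-thread chain of held mutexes is an
intrusive doubly linked list, which allows removal in any order in
constant time. The sketch below is an assumption for illustration, not
an existing RTEMS or FUTEX structure; demo_thread stands in for the TCB
or TLS variable, and the id field is the optional identifier mentioned
above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: the names and layout are assumptions. */
typedef struct demo_mutex {
  unsigned           id;        /* optional identifier for tracing */
  struct demo_mutex *held_next; /* next mutex held by the same thread */
  struct demo_mutex *held_prev; /* previous one, NULL if list head */
} demo_mutex;

typedef struct {
  demo_mutex *held_head;        /* most recently taken mutex */
} demo_thread;

/* Record that thread t now holds m (called after a successful lock). */
static void demo_held_add(demo_thread *t, demo_mutex *m)
{
  m->held_prev = NULL;
  m->held_next = t->held_head;
  if (t->held_head != NULL) {
    t->held_head->held_prev = m;
  }
  t->held_head = m;
}

/* Forget m (called before unlock); works for any release order. */
static void demo_held_remove(demo_thread *t, demo_mutex *m)
{
  if (m->held_prev != NULL) {
    m->held_prev->held_next = m->held_next;
  } else {
    assert(t->held_head == m); /* m must be held by this thread */
    t->held_head = m->held_next;
  }
  if (m->held_next != NULL) {
    m->held_next->held_prev = m->held_prev;
  }
  m->held_next = NULL;
  m->held_prev = NULL;
}

/* Walk the chain, e.g. for a debug dump or a priority recomputation. */
static size_t demo_held_count(const demo_thread *t)
{
  size_t n = 0;
  for (const demo_mutex *m = t->held_head; m != NULL; m = m->held_next) {
    ++n;
  }
  return n;
}
```

Because the links live inside the mutex itself, the list needs no
allocation and can stay in user-manipulated data.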

So generally, I would be very happy if RTEMS became faster,
but I hope that a solution can be found which is viable
in the long term and supports a reasonably broad range of
use cases. I do not have it all in my head, and I think
that this will take more iterations of the discussion.

Best wishes,

               Pavel