Self-contained one purpose objects
Sebastian Huber
sebastian.huber at embedded-brains.de
Thu Jul 23 11:31:23 UTC 2015
Hello Pavel,
thanks for your comments.
On 23/07/15 12:40, Pavel Pisa wrote:
> Hello Sebastian,
>
> first of all, a big thanks for the RTEMS architectural updates.
>
> On Thursday 23 of July 2015 11:16:03 Sebastian Huber wrote:
>> The Classic RTEMS and POSIX APIs have at least three weaknesses.
>>
>> * Dynamic memory (the workspace) is used to allocate object pools. This
>> requires a complex configuration with heavy use of the C pre-processor.
>>
>> * Objects are created via function calls which return an object identifier.
>> The object operations use this identifier and must map it internally to
>> the object representation.
>>
>> * The object operations use a rich set of options and attributes. On each
>> call these parameters must be evaluated and validated to figure out what
>> to do.
> ...
>> In the long run this could lead to a very small footprint system without
>> dependencies on dynamic memory and a purely static initialization.
>>
>> What are your opinions?
> I fully understand your motivation, and for small footprint systems the
> direct use of pointers is the most efficient option.
> But in the area of smallest footprint systems there are many
> alternatives to RTEMS - MBED, NuttX etc.
My goal is to get it smaller compared to what we have now; I am not aiming
for the smallest system on the market. That would only be a side-effect. The
main purpose of the self-contained objects is performance and an easier
configuration. I think the conditional compilation in <rtems/confdefs.h>
has reached a problematic complexity.
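Just to make this concrete, a typical Classic API configuration looks
roughly like the sketch below (the drivers and object counts are made up
for the example):

/* Typical Classic API configuration (values invented for this example).
 * Every object pool must be sized up front; <rtems/confdefs.h> turns these
 * defines into workspace size estimates via a lot of pre-processor logic. */
#define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER
#define CONFIGURE_APPLICATION_NEEDS_CONSOLE_DRIVER

#define CONFIGURE_MAXIMUM_TASKS          8
#define CONFIGURE_MAXIMUM_SEMAPHORES     32
#define CONFIGURE_MAXIMUM_MESSAGE_QUEUES 4

#define CONFIGURE_RTEMS_INIT_TASKS_TABLE

#define CONFIGURE_INIT
#include <rtems/confdefs.h>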
>
> RTEMS layering (Score, APIs, object identifiers etc.) is quite
> complex and has considerable overhead. On the other hand
> these layers add value: they give RTEMS the option to be
> used in more complex scenarios. Some facts to consider:
>
> * If all locking constructs use identifiers then they
> are well traceable. There is some problem with pthreads
> here in that pthread_create() does not have a parameter
> for a thread identifier/purpose.
>
> * The use of identifiers and calling system operations with these
> identifiers allows the application API to be kept even for the case
> where the operating system and the applications run in separate
> domains/CPU privilege levels/rings. Passing pointers is really problematic
> in such use cases. RTEMS does not use memory space separation,
> and it is questionable whether the MMU context switch overhead is
> appropriate for some system uses. But on the other hand there can be
> interesting uses where RTEMS is ported to a microkernel and multiple RTEMS
> instances run in address space separated domains, even with
> strict temporal separation (POK, PikeOS). These options
> have not been used much in RTEMS yet. But there is one specific
> and unique area for RTEMS and that is support for asymmetric,
> heterogeneous multiprocessing. I am not sure how much this
> feature is in actual use today. But RTEMS is unique in this respect,
> and the use of task-optimised CPU cores with different architectures
> will play an important role in future computing - see all of today's
> GPU, APU and FPGA projects.
>
> So my suggestion is to take all these use cases into consideration.
> They should not be taken as hard requirements if that really means
> unacceptable overhead for the common use cases, but they should be
> considered.
I don't want to change the existing APIs. The object identifier
infrastructure is fine, but it was designed for a specific purpose, e.g.
to enable a platform that supports asymmetric multiprocessing (the RTEMS
MPCI support). With SMP we now see its limitations. A complex SMP
application like the FreeBSD network stack uses hundreds of locks, and the
areas protected by these locks are quite small. The lock/unlock sequence
of an uncontended mutex is absolutely performance critical.
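To make the difference concrete, compare the two styles in the sketch
below; the self-contained type and operation names are invented for
illustration, they are not an existing API:

#include <rtems.h>

void classic_api_example(void)
{
  /* Today (Classic API): the mutex lives in a pool allocated from the
   * workspace and is addressed by an identifier, which every obtain and
   * release must map back to the internal object representation. */
  rtems_id id;

  rtems_semaphore_create(
    rtems_build_name('M', 'T', 'X', ' '),
    1,
    RTEMS_BINARY_SEMAPHORE | RTEMS_PRIORITY | RTEMS_INHERIT_PRIORITY,
    0,
    &id
  );
  rtems_semaphore_obtain(id, RTEMS_WAIT, RTEMS_NO_TIMEOUT);
  rtems_semaphore_release(id);
}

/* Hypothetical self-contained mutex (invented for illustration): the object
 * is a plain structure owned by the application, can be initialized
 * statically, and the operations work directly on the pointer.  No pool,
 * no identifier lookup, no attribute evaluation on every call. */
struct self_contained_mutex { unsigned int lock_word; };
#define SELF_CONTAINED_MUTEX_INITIALIZER { 0 }
void self_contained_mutex_lock(struct self_contained_mutex *mtx);
void self_contained_mutex_unlock(struct self_contained_mutex *mtx);

void self_contained_example(void)
{
  static struct self_contained_mutex mtx = SELF_CONTAINED_MUTEX_INITIALIZER;

  self_contained_mutex_lock(&mtx);
  self_contained_mutex_unlock(&mtx);
}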
[...]
>
> As for the implementation, I expect that maximal optimisation is required
> for the lock path without contention. This path should be optimised to
> be an inline or simple function call inside the application context.
> It is a question whether mutexes which can be used in heterogeneous
> setups should be considered as well. If such a type is found then there
> would be a need to call a "system" or other more complex function.
> But I think that these uses can be left to the classic semaphores.
> Classic mutexes are usually considered a mechanism used inside a
> threaded application/subsystem and not something that spreads beyond a
> single address space.
Yes.
>
> My feeling is that the locking case with contention/wait should be
> implemented in a way that allows future privilege separation
> of the scheduler/system core from the applications, as well as memory
> context separation or the use of hypervisor calls for the wait.
>
> So I suggest to consider an architecture similar to the Linux futex
>
> http://www.akkadia.org/drepper/futex.pdf
>
> and to use this for the mutex implementation. Maybe even add a field
> for an identifier/RTEMS object ID to the mutex structure.
> At least for debug builds it would be great if there were, in the TCB
> (or, in the case of kernel/user separation, in a well known TLS
> variable), a pointer to the chain of mutexes taken by the specific
> thread.
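Just for reference, the mutex design from the paper boils down to roughly
the following sketch; futex_wait()/futex_wake() are placeholders for the
wait/wake primitive (similar in spirit to the libgomp helpers), not an
existing RTEMS API:

#include <stdatomic.h>

struct futex_mutex {
  /* 0 = unlocked, 1 = locked, 2 = locked with (possible) waiters */
  atomic_int val;
};

/* Placeholders (invented names): block if *addr still equals expected,
 * and wake up to count waiters blocked on addr. */
void futex_wait(atomic_int *addr, int expected);
void futex_wake(atomic_int *addr, int count);

static void futex_mutex_lock(struct futex_mutex *m)
{
  int c = 0;

  /* Uncontended fast path: one atomic compare-and-swap, no system call. */
  if (atomic_compare_exchange_strong(&m->val, &c, 1))
    return;

  /* Contended slow path: mark the mutex as "locked with waiters" and block
   * until the exchange observes the unlocked state. */
  if (c != 2)
    c = atomic_exchange(&m->val, 2);

  while (c != 0) {
    futex_wait(&m->val, 2);
    c = atomic_exchange(&m->val, 2);
  }
}

static void futex_mutex_unlock(struct futex_mutex *m)
{
  /* If the old value was 1 there were no waiters; otherwise wake one. */
  if (atomic_exchange(&m->val, 0) != 1)
    futex_wake(&m->val, 1);
}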
For the optimized OpenMP support I use the Linux futex barrier
implementation of libgomp and added two futex calls for RTEMS (see the
attached file of the first e-mail). The performance is really good. For the
mutex and semaphore objects, however, I don't use the futex approach of
libgomp. Futexes have excellent properties for average-case systems, e.g.
they provide random fairness. RTEMS is supposed to be a real-time operating
system, so here random fairness is not enough; instead we need FIFO
fairness.
> But managing this is not so easy
> if mutexes can be released in a different order than they were locked.
We should not allow this. Such lock order reversals are bad.
> So it is not a simple singly linked list. But a mapping of which
> mutexes are held by a given thread is required even for priority
> inheritance. This structure has to be kept in user-manipulated
> data only if we do not want the overhead of a syscall in the future.
> But all that is manageable and has been solved for futex based
> OS APIs.
Yes, the current priority inheritance implementation needs to be
improved to better support resource nesting. One problem we recently had in
applications occurred in the file system. A file system
instance like JFFS2 uses a mutex to protect the instance. With this lock
held it uses malloc() and performs device operations which may also
acquire a mutex. In case a high priority task uses malloc(), this
could raise the priority of a task accessing the JFFS2 instance for a very
long time, since after the malloc() the priority is not immediately
restored (the resource count is not zero).
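In code the problematic nesting looks roughly like this (all names are
invented for illustration):

#include <rtems.h>
#include <stdlib.h>

extern rtems_id jffs2_instance_lock; /* invented name for the sketch */

void jffs2_operation_sketch(size_t size)
{
  rtems_semaphore_obtain(jffs2_instance_lock, RTEMS_WAIT, RTEMS_NO_TIMEOUT);
  /* The resource count of this task is now 1. */

  void *buf = malloc(size);
  /* malloc() obtains and releases the allocator mutex (count 2 -> 1).  If a
   * high priority task blocked on the allocator mutex in the meantime, our
   * priority was raised by priority inheritance, and it is not restored when
   * malloc() returns because the resource count is still non-zero. */

  /* ... device operations, which may acquire and release further mutexes ... */

  free(buf);
  rtems_semaphore_release(jffs2_instance_lock);
  /* Only here, with the resource count back at zero, is the inherited
   * priority dropped. */
}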
>
> So generally, I would be very happy if RTEMS were faster,
> but I hope that a solution can be found that is viable in the
> long term and supports a reasonably broad set of use case
> scenarios. I do not have it all in my head and I think
> this calls for more iterations of the discussion.
For the network stack, OpenMP and SMP in general it's not a question of
faster. It's a question of far too slow versus good enough. We should
decide if we want to use self-contained objects for the Newlib internal
locks and the C11/C++11 thread support in GCC.
--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail : sebastian.huber at embedded-brains.de
PGP : Public key available on request.
This message is not a business communication within the meaning of the EHUG.