Re: [PATCH 3/6] termios: Use C11 mutex for input/output
Sebastian Huber
sebastian.huber at embedded-brains.de
Thu Dec 15 07:02:07 UTC 2016
On 14/12/16 22:15, Chris Johns wrote:
> On 15/12/2016 00:39, Sebastian Huber wrote:
>> Use C11 mutexes instead of Classic semaphores as a performance
>> optimization and to simplify the application configuration.
>
> The use of C11 mutexes has not been agreed to and we need to discuss
> this in more detail before we allow use within RTEMS. I would like to
> see positive agreement from all core maintainers before this and
> similar patches can be merged.
A patch is a good thing to start such a discussion.
>
> RTEMS has required the use of the Classic API because:
>
> 1. Available on all architectures, BSPs and tool sets.
> 2. Always present in a build.
> 3. Was considered faster than POSIX.
Point 3 is not the case. From an API point of view the POSIX operations
could be faster than the Classic API since the parameter evaluation is
simpler.
>
> The Classic API provides a base level of required functionality
> because it is always available in supported tool sets and leads to the
> smallest footprint because we do not need to link in more than one API.
Compared to self-contained objects (like the C11 mutexes for example)
the overhead of the Classic objects is huge in terms of run-time, memory
footprint, code size (object administration) and complexity (object
administration, use of a heap, unlimited objects, configuration).
>
> I understand things change and move on so it is great to see this
> change being proposed and our existing base line being challenged.
>
> I see from your performance figures C11 mutexes are better and the
> resources are allocated as needed and used which is a better model
> than the Classic API's configuration table. This is nice.
>
> Do all architectures and BSPs have working C11 support?
Yes, all architectures and BSPs support the C11 <threads.h> mutexes,
condition variables, thread-specific storage (mapped to POSIX keys), and
once support (mapped to POSIX once) in all configurations. The C11
threads are mapped to POSIX threads (for simplicity, not a hard
requirement).
>
> Are there tests in the RTEMS testsuite for C11 threading services?
https://git.rtems.org/rtems/tree/testsuites/sptests/spstdthreads01/init.c
>
> What target resources are used to support this API, i.e. code and RAM
> usage?
On a 32-bit target:
(gdb) p sizeof(Semaphore_Control)
$1 = 72
(gdb) p sizeof(mtx_t)
$2 = 20
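
To put the two sizes in perspective, here is a minimal arithmetic sketch.
It only compares the control blocks reported by gdb above; it assumes
nothing about additional per-object overhead, and the function name
ram_saved() is made up for this example.

```c
#include <stddef.h>

/* Control block sizes reported by gdb above for a 32-bit target. */
#define CLASSIC_SEMAPHORE_SIZE 72u
#define C11_MUTEX_SIZE         20u

/* Bytes of RAM saved when `count` locks are C11 mutexes instead of
 * Classic semaphores (control blocks only). */
size_t ram_saved(size_t count)
{
  return count * (CLASSIC_SEMAPHORE_SIZE - C11_MUTEX_SIZE);
}
```

For example, ram_saved(32) yields 1664 bytes, i.e. 52 bytes per lock.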
With Thumb-2 instruction set:
size ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-mutex.o \
  ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-condition.o \
  ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-*.o
   text    data     bss     dec     hex filename
    704       0       0     704     2c0 ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-mutex.o
    536       0       0     536     218 ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-condition.o
      4       0       0       4       4 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-call_once.o
    100       0       0     100      64 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-cnd.o
    104       0       0     104      68 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-mtx.o
    156       0       0     156      9c ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-thrd.o
     40       0       0      40      28 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-tss.o
size ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-sem*
   text    data     bss     dec     hex filename
    496       0       0     496     1f0 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semcreate.o
    152       0       0     152      98 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semdelete.o
     68       0       0      68      44 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semflush.o
     28       0       0      28      1c ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semident.o
     48       0       0      48      30 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-sem.o
    428       0       0     428     1ac ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semobtain.o
    464       0       0     464     1d0 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semrelease.o
    312       0       0     312     138 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semsetpriority.o
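
Summing the text columns of the two size listings gives a quick overall
figure. The sketch below just does that arithmetic; the array and
function names are invented for illustration. Note that the two totals
cover different things: the first includes condition variables, threads,
thread-specific storage and once support, while the second covers only
the Classic semaphore object files matched by the second size command.

```c
#include <stddef.h>

/* Text sizes in bytes, copied from the two `size` listings above. */
static const unsigned c11_text[] = { 704, 536, 4, 100, 104, 156, 40 };
static const unsigned classic_sem_text[] = {
  496, 152, 68, 28, 48, 428, 464, 312
};

/* Adds up `n` entries of a size table. */
static unsigned sum_text(const unsigned *sizes, size_t n)
{
  unsigned total = 0;

  for (size_t i = 0; i < n; ++i) {
    total += sizes[i];
  }

  return total;
}

/* Mutex, condition variable and libstdthreads object files together. */
unsigned c11_text_total(void)
{
  return sum_text(c11_text, sizeof(c11_text) / sizeof(c11_text[0]));
}

/* All librtems_a-sem* object files together. */
unsigned classic_sem_text_total(void)
{
  return sum_text(
    classic_sem_text,
    sizeof(classic_sem_text) / sizeof(classic_sem_text[0])
  );
}
```

This gives 1644 bytes for the C11 related object files and 1996 bytes
for the Classic semaphore object files.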
The libscore_a-mutex.o contains more than one function. For example we
have (Cortex-M7 target):
7000c5f0 <_Mutex_recursive_Acquire>:
7000c5f0: 2380      movs r3, #128 ; 0x80
7000c5f2: f3ef 8111 mrs r1, BASEPRI
7000c5f6: f383 8812 msr BASEPRI_MAX, r3
7000c5fa: 4a12      ldr r2, [pc, #72] ; (7000c644 <_Mutex_recursive_Acquire+0x54>)
7000c5fc: 68c3      ldr r3, [r0, #12]
7000c5fe: 6912      ldr r2, [r2, #16]
7000c600: b91b      cbnz r3, 7000c60a <_Mutex_recursive_Acquire+0x1a>
7000c602: 60c2      str r2, [r0, #12]
7000c604: f381 8811 msr BASEPRI, r1
7000c608: 4770      bx lr
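Only the above 10 instructions need to be executed in case the mutex is
available. Below is the part that is executed in case the mutex already
has an owner: either the executing thread owns it (the nest level is
incremented at 7000c638) or the thread needs to block.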
7000c60a: 4293      cmp r3, r2
7000c60c: d014      beq.n 7000c638 <_Mutex_recursive_Acquire+0x48>
7000c60e: 3008      adds r0, #8
7000c610: b5f0      push {r4, r5, r6, r7, lr}
7000c612: b08d      sub sp, #52 ; 0x34
7000c614: 2700      movs r7, #0
7000c616: f04f 7600 mov.w r6, #33554432 ; 0x2000000
7000c61a: 4d0b      ldr r5, [pc, #44] ; (7000c648 <_Mutex_recursive_Acquire+0x58>)
7000c61c: ab0c      add r3, sp, #48 ; 0x30
7000c61e: 4c0b      ldr r4, [pc, #44] ; (7000c64c <_Mutex_recursive_Acquire+0x5c>)
7000c620: f88d 700c strb.w r7, [sp, #12]
7000c624: f843 1d30 str.w r1, [r3, #-48]!
7000c628: 4909      ldr r1, [pc, #36] ; (7000c650 <_Mutex_recursive_Acquire+0x60>)
7000c62a: 9601      str r6, [sp, #4]
7000c62c: 9502      str r5, [sp, #8]
7000c62e: 940a      str r4, [sp, #40] ; 0x28
7000c630: f7fd fb8e bl 70009d50 <_Thread_queue_Enqueue>
7000c634: b00d      add sp, #52 ; 0x34
7000c636: bdf0      pop {r4, r5, r6, r7, pc}
7000c638: 6903      ldr r3, [r0, #16]
7000c63a: 3301      adds r3, #1
7000c63c: 6103      str r3, [r0, #16]
7000c63e: f381 8811 msr BASEPRI, r1
7000c642: 4770      bx lr
7000c644: 70016980  .word 0x70016980
7000c648: 70009d3d  .word 0x70009d3d
7000c64c: 70009d49  .word 0x70009d49
7000c650: 70013c24  .word 0x70013c24
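
For readers who prefer C to disassembly, the control flow above can be
modelled roughly as follows. This is a simplified sketch and not the
RTEMS source: all demo_* names are hypothetical, the interrupt
disable/restore functions are empty stand-ins for the BASEPRI handling,
and the blocking case only returns a status instead of calling
_Thread_queue_Enqueue().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real thread and mutex structures. */
typedef struct { int id; } demo_thread;

typedef struct {
  demo_thread *owner;      /* NULL while the mutex is available */
  unsigned     nest_level; /* extra acquires by the current owner */
} demo_recursive_mutex;

typedef enum { DEMO_ACQUIRED, DEMO_NESTED, DEMO_WOULD_BLOCK } demo_result;

/* Stand-ins for the BASEPRI save/raise and restore in the listing. */
static unsigned demo_irq_disable(void) { return 0; }
static void demo_irq_restore(unsigned level) { (void) level; }

demo_result demo_recursive_acquire(
  demo_recursive_mutex *mutex,
  demo_thread          *executing
)
{
  unsigned    level;
  demo_result result;

  assert(mutex != NULL && executing != NULL);
  level = demo_irq_disable();

  if (mutex->owner == NULL) {
    /* Fast path: the mutex is available, so just take ownership. */
    mutex->owner = executing;
    result = DEMO_ACQUIRED;
  } else if (mutex->owner == executing) {
    /* Recursive acquire by the owner: only bump the nest level. */
    ++mutex->nest_level;
    result = DEMO_NESTED;
  } else {
    /* Owned by another thread: the real code enqueues the executing
     * thread on the thread queue and blocks; this model only reports it. */
    result = DEMO_WOULD_BLOCK;
  }

  demo_irq_restore(level);

  return result;
}
```

A first call with an unowned mutex returns DEMO_ACQUIRED, a second call
by the same thread returns DEMO_NESTED, and a call by a different thread
returns DEMO_WOULD_BLOCK.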
>
> Would the "tiny" footprint be smaller if all internal services
> including compiler thread support are made C11? Could this actually be
> done? Parts of POSIX have been creeping in over time so the position is
> a little confused at the moment. I am not sure about a bits-and-pieces
> approach, maybe a full switch should be made.
Yes, the footprint would be smaller. If we provide self-contained
threads, then the footprint would be much smaller, e.g. no object
administration, no heap.
>
> Does C11 work on LLVM (I hear support is close)?
>
> Where is the C11 API implemented? Is the threading code outside the
> RTEMS source tree and what effect does that have on those looking to
> certify RTEMS?
The C11 support is not a compiler issue. The <threads.h> is a part of
the C standard library and for RTEMS this header file is provided by
Newlib:
https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/include/threads.h;h=9fb08b03d1eb20024c0d680a7924336ec7ea57bb;hb=HEAD
This header file is compatible with C89 (with the next Newlib release,
currently C99 due to use of inline in <sys/lock.h>). I imported several
parts of the FreeBSD <sys/cdefs.h> for this purpose.
The C11 <threads.h> provided functions are implemented in RTEMS:
https://git.rtems.org/rtems/tree/cpukit/libstdthreads
>
> Does a change like this require a coding standard update?
Currently
https://devel.rtems.org/wiki/Developer/Coding/Conventions
gives no advice to use specific API X or Y.
--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail : sebastian.huber at embedded-brains.de
PGP : Public key available on request.
This message is not a business communication within the meaning of the EHUG.