C11 Re: [PATCH 3/6] termios: Use C11 mutex for input/output

Thu Dec 15 07:02:07 UTC 2016

On 14/12/16 22:15, Chris Johns wrote:
> On 15/12/2016 00:39, Sebastian Huber wrote:
>> Use C11 mutexes instead of Classic semaphores as a performance
>> optimization and to simplify the application configuration.
>
> The use of C11 mutexes has not been agreed to and we need to discuss 
> this in more detail before we allow use within RTEMS. I would like to 
> see positive agreement from all core maintainers before this and 
> similar patches can be merged.

A patch is a good way to start such a discussion.

>
> RTEMS has required the use of the Classic API because:
>
>  1. Available on all architectures, BSPs and tool sets.
>  2. Always present in a build.
>  3. Was considered faster than POSIX.

Point 3 is not the case. From an API point of view the POSIX operations
could be faster than the Classic API since the parameter evaluation is
simpler.

>
> The Classic API provides a base level of required functionality 
> because it is always available in supported tool sets and leads to the 
> smallest footprint because we do not need to link in more than one API.

Compared to self-contained objects (like the C11 mutexes for example) 
the overhead of the Classic objects is huge in terms of run-time, memory 
footprint, code size (object administration) and complexity (object 
administration, use of a heap, unlimited objects, configuration).

>
> I understand things change and move on so it is great to see this 
> change being proposed and our existing base line being challenged.
>
> I see from your performance figures C11 mutexes are better and the 
> resources are allocated as needed and used which is a better model 
> than the Classic API's configuration table. This is nice.
>
> Do all architectures and BSPs have working C11 support?

Yes, all architectures and BSPs support the C11 <threads.h> mutexes,
condition variables, thread-specific storage (mapped to POSIX keys), and
once support (mapped to POSIX once) in all configurations. The C11
threads are mapped to POSIX threads (for simplicity, not a hard
requirement).
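
As an illustration of the <threads.h> services listed above, here is a
minimal sketch in standard C11. The names shared_counter, worker and
run_workers are hypothetical and are not taken from the patch or from
RTEMS. The mutex is embedded in the user's own structure, so no object
has to be created or looked up by an identifier.

```c
#include <assert.h>
#include <stddef.h>
#include <threads.h>

enum { WORKER_COUNT = 4, INCREMENTS_PER_WORKER = 1000 };

/* Hypothetical example type: the mutex storage is part of the object. */
typedef struct {
  mtx_t mutex;
  unsigned long value;
} shared_counter;

static shared_counter counter;

/* Thread body: increments the counter under the mutex. */
static int worker(void *arg)
{
  shared_counter *c = arg;

  assert(c != NULL);

  for (int i = 0; i < INCREMENTS_PER_WORKER; ++i) {
    mtx_lock(&c->mutex);
    ++c->value;
    mtx_unlock(&c->mutex);
  }

  return 0;
}

/* Starts the workers, waits for them, and returns the final count. */
unsigned long run_workers(void)
{
  thrd_t threads[WORKER_COUNT];
  int started = 0;
  unsigned long result;

  if (mtx_init(&counter.mutex, mtx_plain) != thrd_success) {
    return 0;
  }

  counter.value = 0;

  while (started < WORKER_COUNT) {
    if (thrd_create(&threads[started], worker, &counter) != thrd_success) {
      break;
    }
    ++started;
  }

  /* Join only the threads that were actually created. */
  for (int i = 0; i < started; ++i) {
    thrd_join(threads[i], NULL);
  }

  result = counter.value;
  mtx_destroy(&counter.mutex);
  return result;
}
```

If all four threads can be created, run_workers() returns
WORKER_COUNT * INCREMENTS_PER_WORKER. Since the C11 threads are mapped
to POSIX threads, an RTEMS application using thrd_create() presumably
still has to provide for POSIX threads in its configuration; the mutex
itself needs no configuration entry.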

>
> Are there tests in the RTEMS testsuite for C11 threading services?

https://git.rtems.org/rtems/tree/testsuites/sptests/spstdthreads01/init.c

>
> What target resources are used to support this API, i.e. code and RAM 
> usage?

On a 32-bit target:

(gdb) p sizeof(Semaphore_Control)
$1 = 72
(gdb) p sizeof(mtx_t)
$2 = 20

With Thumb-2 instruction set:

size ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-mutex.o \
  ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-condition.o \
  ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-*.o
    text    data     bss     dec     hex filename
     704       0       0     704     2c0 ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-mutex.o
     536       0       0     536     218 ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-condition.o
       4       0       0       4       4 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-call_once.o
     100       0       0     100      64 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-cnd.o
     104       0       0     104      68 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-mtx.o
     156       0       0     156      9c ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-thrd.o
      40       0       0      40      28 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-tss.o

size ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-sem*
    text    data     bss     dec     hex filename
     496       0       0     496     1f0 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semcreate.o
     152       0       0     152      98 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semdelete.o
      68       0       0      68      44 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semflush.o
      28       0       0      28      1c ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semident.o
      48       0       0      48      30 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-sem.o
     428       0       0     428     1ac ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semobtain.o
     464       0       0     464     1d0 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semrelease.o
     312       0       0     312     138 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semsetpriority.o
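
To make the two listings easier to compare, the text sizes can simply
be added up. The sketch below does only that arithmetic with the
numbers from the size output above; the function and array names are
made up for this example. It is not a complete footprint comparison:
both sides call into further score code (for example the thread queue)
that is not part of either listing, and the first listing also covers
condition variables, threads, thread-specific storage and once support.

```c
#include <assert.h>
#include <stddef.h>

/* Text sizes in bytes, copied from the first size listing. */
static const unsigned self_contained_text[] = {
  704, 536, 4, 100, 104, 156, 40
};

/* Text sizes in bytes, copied from the librtems_a-sem* listing. */
static const unsigned classic_sem_text[] = {
  496, 152, 68, 28, 48, 428, 464, 312
};

/* Adds up count entries of sizes. */
static unsigned sum_text(const unsigned *sizes, size_t count)
{
  unsigned total = 0;

  assert(sizes != NULL);

  for (size_t i = 0; i < count; ++i) {
    total += sizes[i];
  }

  return total;
}

/* Sum of the score mutex/condition and libstdthreads text sizes. */
unsigned self_contained_total(void)
{
  return sum_text(
    self_contained_text,
    sizeof(self_contained_text) / sizeof(self_contained_text[0])
  );
}

/* Sum of the Classic semaphore text sizes. */
unsigned classic_sem_total(void)
{
  return sum_text(
    classic_sem_text,
    sizeof(classic_sem_text) / sizeof(classic_sem_text[0])
  );
}
```

With these inputs the sums are 1644 bytes for the first listing and
1996 bytes for the Classic semaphore objects.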

The libscore_a-mutex.o contains more than one function. For example we 
have (Cortex-M7 target):

7000c5f0 <_Mutex_recursive_Acquire>:
7000c5f0:       2380            movs    r3, #128        ; 0x80
7000c5f2:       f3ef 8111       mrs     r1, BASEPRI
7000c5f6:       f383 8812       msr     BASEPRI_MAX, r3
7000c5fa:       4a12            ldr     r2, [pc, #72]   ; (7000c644 <_Mutex_recursive_Acquire+0x54>)
7000c5fc:       68c3            ldr     r3, [r0, #12]
7000c5fe:       6912            ldr     r2, [r2, #16]
7000c600:       b91b            cbnz    r3, 7000c60a <_Mutex_recursive_Acquire+0x1a>
7000c602:       60c2            str     r2, [r0, #12]
7000c604:       f381 8811       msr     BASEPRI, r1
7000c608:       4770            bx      lr

Only the 10 instructions above need to be executed if the mutex is
available. Below is the part that is executed if the mutex already has
an owner: either the calling thread is the owner and the nest level is
incremented, or the thread needs to block.

7000c60a:       4293            cmp     r3, r2
7000c60c:       d014            beq.n   7000c638 <_Mutex_recursive_Acquire+0x48>
7000c60e:       3008            adds    r0, #8
7000c610:       b5f0            push    {r4, r5, r6, r7, lr}
7000c612:       b08d            sub     sp, #52 ; 0x34
7000c614:       2700            movs    r7, #0
7000c616:       f04f 7600       mov.w   r6, #33554432   ; 0x2000000
7000c61a:       4d0b            ldr     r5, [pc, #44]   ; (7000c648 <_Mutex_recursive_Acquire+0x58>)
7000c61c:       ab0c            add     r3, sp, #48     ; 0x30
7000c61e:       4c0b            ldr     r4, [pc, #44]   ; (7000c64c <_Mutex_recursive_Acquire+0x5c>)
7000c620:       f88d 700c       strb.w  r7, [sp, #12]
7000c624:       f843 1d30       str.w   r1, [r3, #-48]!
7000c628:       4909            ldr     r1, [pc, #36]   ; (7000c650 <_Mutex_recursive_Acquire+0x60>)
7000c62a:       9601            str     r6, [sp, #4]
7000c62c:       9502            str     r5, [sp, #8]
7000c62e:       940a            str     r4, [sp, #40]   ; 0x28
7000c630:       f7fd fb8e       bl      70009d50 <_Thread_queue_Enqueue>
7000c634:       b00d            add     sp, #52 ; 0x34
7000c636:       bdf0            pop     {r4, r5, r6, r7, pc}
7000c638:       6903            ldr     r3, [r0, #16]
7000c63a:       3301            adds    r3, #1
7000c63c:       6103            str     r3, [r0, #16]
7000c63e:       f381 8811       msr     BASEPRI, r1
7000c642:       4770            bx      lr
7000c644:       70016980        .word   0x70016980
7000c648:       70009d3d        .word   0x70009d3d
7000c64c:       70009d49        .word   0x70009d49
7000c650:       70013c24        .word   0x70013c24
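
The control flow of this disassembly can be sketched in portable C as
follows. This is a simplified model for illustration only and not the
RTEMS source: the toy_* names are hypothetical, the interrupt disable
and enable around the critical section (the BASEPRI accesses above) is
left out, and instead of enqueueing the thread on the thread queue the
function just reports that the caller would have to block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a thread control block. */
typedef struct {
  int id;
} toy_thread;

/* Hypothetical self-contained recursive mutex: owner plus nest level. */
typedef struct {
  toy_thread *owner;
  unsigned nest_level;
} toy_recursive_mutex;

typedef enum {
  TOY_ACQUIRED,   /* mutex was free, caller is now the owner */
  TOY_NESTED,     /* caller already owned it, nest level incremented */
  TOY_MUST_BLOCK  /* owned by another thread, caller would block */
} toy_result;

toy_result toy_recursive_acquire(
  toy_recursive_mutex *mutex,
  toy_thread *executing
)
{
  assert(mutex != NULL);
  assert(executing != NULL);

  /* The real code runs this section with interrupts disabled. */

  if (mutex->owner == NULL) {
    /* Fast path: corresponds to the first 10 instructions. */
    mutex->owner = executing;
    return TOY_ACQUIRED;
  }

  if (mutex->owner == executing) {
    /* Recursive acquire by the owner. */
    ++mutex->nest_level;
    return TOY_NESTED;
  }

  /* The real code calls _Thread_queue_Enqueue() at this point. */
  return TOY_MUST_BLOCK;
}
```

In this model a first acquire by a thread yields TOY_ACQUIRED, a second
acquire by the same thread yields TOY_NESTED, and an acquire by a
different thread yields TOY_MUST_BLOCK.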

>
> Would the "tiny" footprint be smaller if all internal services 
> including compiler thread support are made C11? Could this actually be 
> done? Parts of POSIX have been creeping in over time so the position is 
> a little confused at the moment. I am not sure about a bits-and-pieces 
> approach; maybe a full switch should be made.

Yes, the footprint would be smaller. If we provide self-contained 
threads, then the footprint would be much smaller, e.g. no object 
administration, no heap.

>
> Does C11 work on LLVM (I hear support is close)?
>
> Where is the C11 API implemented? Is the threading code outside the 
> RTEMS source tree and what effect does that have on those looking to 
> certify RTEMS?

The C11 support is not a compiler issue. The <threads.h> is a part of 
the C standard library and for RTEMS this header file is provided by 
Newlib:

https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/include/threads.h;h=9fb08b03d1eb20024c0d680a7924336ec7ea57bb;hb=HEAD 

This header file is compatible with C89 (with the next Newlib release;
currently it requires C99 due to the use of inline in <sys/lock.h>). I
imported several parts of the FreeBSD <sys/cdefs.h> for this purpose.

The C11 <threads.h> provided functions are implemented in RTEMS:

https://git.rtems.org/rtems/tree/cpukit/libstdthreads

>
> Does a change like this require a coding standard update?

Currently

https://devel.rtems.org/wiki/Developer/Coding/Conventions

gives no advice on whether to use a specific API X or Y.

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.