C11 Re: [PATCH 3/6] termios: Use C11 mutex for input/output

Chris Johns chrisj at rtems.org
Thu Dec 15 22:34:14 UTC 2016


On 15/12/2016 18:02, Sebastian Huber wrote:
> On 14/12/16 22:15, Chris Johns wrote:
>> On 15/12/2016 00:39, Sebastian Huber wrote:
>>> Use C11 mutexes instead of Classic semaphores as a performance
>>> optimization and to simplify the application configuration.
>>
>> The use of C11 mutexes has not been agreed to, and we need to discuss
>> this in more detail before we allow their use within RTEMS. I would like
>> to see positive agreement from all core maintainers before this and
>> similar patches can be merged.
>
> A patch is a good thing to start such a discussion.
>

Great.

>>
>> RTEMS has required the use of the Classic API because:
>>
>>  1. Available on all architectures, BSPs and tool sets.
>>  2. Always present in a build.
>>  3. Was considered faster than POSIX.
>
> 3. is not the case. From an API point of view the POSIX operations could
> be faster than the Classic API since the parameter evaluation is simpler.
>

Yes, things have moved on, and crusty old developers like me have a
soft spot for the Classic API. I suspect these days it is a little bit
of a distorted view. :)

>>
>> The Classic API provides a base level of required functionality
>> because it is always available in supported tool sets and leads to the
>> smallest footprint because we do not need to link in more than one API.
>
> Compared to self-contained objects (like the C11 mutexes for example)
> the overhead of the Classic objects is huge in terms of run-time, memory
> footprint, code size (object administration) and complexity (object
> administration, use of a heap, unlimited objects, configuration).

I agree. The self-contained approach is very attractive and a really big feature.
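
For anyone less familiar with <threads.h>, here is a minimal sketch of what
"self-contained" means in practice, assuming only a hosted C11 library with
<threads.h>. The mutexes are embedded in the structure they protect and are
initialised in place, so nothing has to be sized in a configuration table
(compare CONFIGURE_MAXIMUM_SEMAPHORES for Classic semaphores) and no object
ID has to be looked up on each call. struct tty_sketch and its functions are
hypothetical; this is not the termios code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <threads.h>

/* Hypothetical, heavily simplified stand-in for a termios device.
 * The mtx_t objects live inside the structure itself: no object table,
 * no ID lookup, no heap allocation and no configuration entry. */
struct tty_sketch {
  mtx_t  input_mutex;   /* would serialise readers (unused in this sketch) */
  mtx_t  output_mutex;  /* serialises writers */
  char   out[64];
  size_t out_len;
};

/* Returns 0 on success, -1 if a mutex could not be initialised. */
int tty_sketch_init(struct tty_sketch *tty)
{
  memset(tty, 0, sizeof(*tty));
  if (mtx_init(&tty->input_mutex, mtx_plain) != thrd_success)
    return -1;
  if (mtx_init(&tty->output_mutex, mtx_plain) != thrd_success) {
    mtx_destroy(&tty->input_mutex);
    return -1;
  }
  return 0;
}

/* Appends to the output buffer under the output mutex.
 * Returns the number of bytes actually copied (truncates when full). */
size_t tty_sketch_write(struct tty_sketch *tty, const char *buf, size_t len)
{
  size_t room;

  mtx_lock(&tty->output_mutex);
  room = sizeof(tty->out) - tty->out_len;
  if (len > room)
    len = room;
  memcpy(tty->out + tty->out_len, buf, len);
  tty->out_len += len;
  mtx_unlock(&tty->output_mutex);
  return len;
}

void tty_sketch_destroy(struct tty_sketch *tty)
{
  mtx_destroy(&tty->output_mutex);
  mtx_destroy(&tty->input_mutex);
}
```

Initialise once with tty_sketch_init(), then call tty_sketch_write() from any
thread; concurrent writers are serialised by output_mutex. On RTEMS each mtx_t
is 20 bytes per the figures below; on a host library the size will differ.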

>
>>
>> I understand things change and move on so it is great to see this
>> change being proposed and our existing base line being challenged.
>>
>> I see from your performance figures that C11 mutexes are better, and the
>> resources are allocated when needed and used, which is a better model
>> than the Classic API's configuration table. This is nice.
>>
>> Do all architectures and BSPs have working C11 support?
>
> Yes, all architectures and BSPs support the C11 <threads.h> mutexes,
> condition variables, thread-specific storage (mapped to POSIX keys),
> once support (mapped to POSIX once) in all configurations. The C11
> threads are mapped to POSIX threads (for simplicity, not a hard
> requirement).

Thank you and well done for all your efforts in this area. This is a 
really excellent place to be.
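
As a hedged illustration of two of the services listed above, once support
and thread-specific storage, here is a small portable sketch that uses only
standard <threads.h> calls; the function names are hypothetical. call_once()
creates the tss_t key exactly once, and each thread then gets its own
counter, which free() releases when a non-main thread exits.

```c
#include <assert.h>
#include <stdlib.h>
#include <threads.h>

static once_flag counter_once = ONCE_FLAG_INIT;
static tss_t counter_key;
static int counter_key_ok;

/* Runs exactly once, whichever thread gets there first. */
static void counter_key_create(void)
{
  /* free() is the destructor for a thread's non-NULL value at thread exit. */
  counter_key_ok = (tss_create(&counter_key, free) == thrd_success);
}

/* Increments and returns the calling thread's private counter; -1 on error. */
int per_thread_count(void)
{
  int *value;

  call_once(&counter_once, counter_key_create);
  if (!counter_key_ok)
    return -1;

  value = tss_get(counter_key);
  if (value == NULL) {
    value = calloc(1, sizeof(*value));
    if (value == NULL || tss_set(counter_key, value) != thrd_success) {
      free(value);
      return -1;
    }
  }
  return ++*value;
}

static int count_in_new_thread(void *arg)
{
  (void)arg;
  return per_thread_count(); /* a fresh thread starts from zero */
}

/* Runs per_thread_count() once in a new thread and returns its result. */
int per_thread_count_in_new_thread(void)
{
  thrd_t t;
  int result = -1;

  if (thrd_create(&t, count_in_new_thread, NULL) != thrd_success)
    return -1;
  thrd_join(t, &result);
  return result;
}
```

Calling per_thread_count() twice in one thread yields 1 and then 2, while a
new thread starts again at 1 and leaves the first thread's counter untouched.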

>
>>
>> Are there tests in the RTEMS testsuite for the C11 threading services?
>
> https://git.rtems.org/rtems/tree/testsuites/sptests/spstdthreads01/init.c
>

Nice.
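
A host-side check in the same spirit could look like the following sketch.
It is not taken from spstdthreads01, and trylock_result() and
try_from_other_thread() are hypothetical names: a second thread calls
mtx_trylock() and should see thrd_busy while the creating thread holds the
mutex, and thrd_success otherwise.

```c
#include <assert.h>
#include <threads.h>

/* Thread body: try the mutex once, release it if we got it. */
static int try_from_other_thread(void *arg)
{
  mtx_t *m = arg;
  int rc = mtx_trylock(m);

  if (rc == thrd_success)
    mtx_unlock(m);
  return rc;
}

/* Returns what mtx_trylock() reported in a second thread, with the
 * mutex held by the caller if hold != 0; thrd_error on setup failure. */
int trylock_result(int hold)
{
  mtx_t m;
  thrd_t t;
  int rc = thrd_error;

  if (mtx_init(&m, mtx_plain) != thrd_success)
    return thrd_error;
  if (hold)
    mtx_lock(&m);
  if (thrd_create(&t, try_from_other_thread, &m) == thrd_success)
    thrd_join(t, &rc);
  if (hold)
    mtx_unlock(&m);
  mtx_destroy(&m);
  return rc;
}
```
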

>>
>> What target resources are used to support this API, ie code and RAM
>> usage?
>
> On a 32-bit target:
>
> (gdb) p sizeof(Semaphore_Control)
> $1 = 72
> (gdb) p sizeof(mtx_t)
> $2 = 20
>
> With Thumb-2 instruction set:
>
> size ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-mutex.o \
>     ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-condition.o \
>     ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-*.o
>     text    data     bss     dec     hex filename
>      704       0       0     704     2c0 ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-mutex.o
>      536       0       0     536     218 ./arm-rtems4.12/c/atsamv/cpukit/score/src/libscore_a-condition.o
>        4       0       0       4       4 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-call_once.o
>      100       0       0     100      64 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-cnd.o
>      104       0       0     104      68 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-mtx.o
>      156       0       0     156      9c ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-thrd.o
>       40       0       0      40      28 ./arm-rtems4.12/c/atsamv/cpukit/libstdthreads/libstdthreads_a-tss.o
>
> size ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-sem*
>     text    data     bss     dec     hex filename
>      496       0       0     496     1f0 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semcreate.o
>      152       0       0     152      98 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semdelete.o
>       68       0       0      68      44 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semflush.o
>       28       0       0      28      1c ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semident.o
>       48       0       0      48      30 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-sem.o
>      428       0       0     428     1ac ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semobtain.o
>      464       0       0     464     1d0 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semrelease.o
>      312       0       0     312     138 ./arm-rtems4.12/c/atsamv/cpukit/rtems/src/librtems_a-semsetpriority.o
>

Nice.
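
To turn the size output above into totals, the text sizes can simply be
summed (all data and bss columns are 0). This sketch uses only the quoted
numbers and compares only the listed object files. It ignores shared score
code that is pulled in as well, for example the thread queue and, for the
Classic API, the object administration, and libscore_a-mutex.o holds several
mutex variants, so treat the result as a rough comparison.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* "text" column of the quoted size output (Thumb-2, atsamv BSP). */
static const unsigned int c11_text[] = {
  704, /* libscore_a-mutex.o */
  536, /* libscore_a-condition.o */
  4,   /* libstdthreads_a-call_once.o */
  100, /* libstdthreads_a-cnd.o */
  104, /* libstdthreads_a-mtx.o */
  156, /* libstdthreads_a-thrd.o */
  40   /* libstdthreads_a-tss.o */
};

static const unsigned int classic_sem_text[] = {
  496, /* librtems_a-semcreate.o */
  152, /* librtems_a-semdelete.o */
  68,  /* librtems_a-semflush.o */
  28,  /* librtems_a-semident.o */
  48,  /* librtems_a-sem.o */
  428, /* librtems_a-semobtain.o */
  464, /* librtems_a-semrelease.o */
  312  /* librtems_a-semsetpriority.o */
};

static unsigned int sum_text(const unsigned int *v, size_t n)
{
  unsigned int total = 0;

  for (size_t i = 0; i < n; ++i)
    total += v[i];
  return total;
}

unsigned int c11_total(void)
{
  return sum_text(c11_text, ARRAY_COUNT(c11_text));
}

unsigned int classic_sem_total(void)
{
  return sum_text(classic_sem_text, ARRAY_COUNT(classic_sem_text));
}

/* Mutex-only subset: score mutex code plus the mtx_* wrappers. */
unsigned int c11_mutex_only_total(void)
{
  return c11_text[0] + c11_text[4];
}

/* RAM for n control blocks, given the gdb sizeof() value per object. */
unsigned long control_ram(unsigned long n, unsigned long size_per_object)
{
  return n * size_per_object;
}
```

This gives 1644 bytes for all listed C11 objects, 808 bytes for the mutex
subset alone and 1996 bytes for the Classic semaphore objects. For RAM, 16
objects cost 16 * 72 = 1152 bytes as Semaphore_Control versus 16 * 20 = 320
bytes as mtx_t.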

> The libscore_a-mutex.o contains more than one function. For example we
> have (Cortex-M7 target):
>
> 7000c5f0 <_Mutex_recursive_Acquire>:
> 7000c5f0:       2380            movs    r3, #128        ; 0x80
> 7000c5f2:       f3ef 8111       mrs     r1, BASEPRI
> 7000c5f6:       f383 8812       msr     BASEPRI_MAX, r3
> 7000c5fa:       4a12            ldr     r2, [pc, #72]   ; (7000c644 <_Mutex_recursive_Acquire+0x54>)
> 7000c5fc:       68c3            ldr     r3, [r0, #12]
> 7000c5fe:       6912            ldr     r2, [r2, #16]
> 7000c600:       b91b            cbnz    r3, 7000c60a <_Mutex_recursive_Acquire+0x1a>
> 7000c602:       60c2            str     r2, [r0, #12]
> 7000c604:       f381 8811       msr     BASEPRI, r1
> 7000c608:       4770            bx      lr
>
> Only the above 10 instructions need to be executed in case the mutex is
> available. Below is the part that is executed in case the thread needs
> to block.
>
> 7000c60a:       4293            cmp     r3, r2
> 7000c60c:       d014            beq.n   7000c638 <_Mutex_recursive_Acquire+0x48>
> 7000c60e:       3008            adds    r0, #8
> 7000c610:       b5f0            push    {r4, r5, r6, r7, lr}
> 7000c612:       b08d            sub     sp, #52 ; 0x34
> 7000c614:       2700            movs    r7, #0
> 7000c616:       f04f 7600       mov.w   r6, #33554432   ; 0x2000000
> 7000c61a:       4d0b            ldr     r5, [pc, #44]   ; (7000c648 <_Mutex_recursive_Acquire+0x58>)
> 7000c61c:       ab0c            add     r3, sp, #48     ; 0x30
> 7000c61e:       4c0b            ldr     r4, [pc, #44]   ; (7000c64c <_Mutex_recursive_Acquire+0x5c>)
> 7000c620:       f88d 700c       strb.w  r7, [sp, #12]
> 7000c624:       f843 1d30       str.w   r1, [r3, #-48]!
> 7000c628:       4909            ldr     r1, [pc, #36]   ; (7000c650 <_Mutex_recursive_Acquire+0x60>)
> 7000c62a:       9601            str     r6, [sp, #4]
> 7000c62c:       9502            str     r5, [sp, #8]
> 7000c62e:       940a            str     r4, [sp, #40]   ; 0x28
> 7000c630:       f7fd fb8e       bl      70009d50 <_Thread_queue_Enqueue>
> 7000c634:       b00d            add     sp, #52 ; 0x34
> 7000c636:       bdf0            pop     {r4, r5, r6, r7, pc}
> 7000c638:       6903            ldr     r3, [r0, #16]
> 7000c63a:       3301            adds    r3, #1
> 7000c63c:       6103            str     r3, [r0, #16]
> 7000c63e:       f381 8811       msr     BASEPRI, r1
> 7000c642:       4770            bx      lr
> 7000c644:       70016980        .word   0x70016980
> 7000c648:       70009d3d        .word   0x70009d3d
> 7000c64c:       70009d49        .word   0x70009d49
> 7000c650:       70013c24        .word   0x70013c24
>

Nice.
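
The fast path above can be paraphrased in C. This is a single-threaded model
written by reading the disassembly, not the RTEMS source: the struct and
function names are hypothetical, [r0, #12] is read as the owner and
[r0, #16] as the nest level, and [r2, #16] appears to yield the executing
thread. Comments mark where the real code masks interrupts via BASEPRI and
where it calls _Thread_queue_Enqueue().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the RTEMS/Newlib type definitions. */
struct thread_sketch {
  int id;
};

struct recursive_mutex_sketch {
  struct thread_sketch *owner;      /* modelled on [r0, #12] */
  unsigned int          nest_level; /* modelled on [r0, #16] */
};

enum acquire_result { ACQUIRED, NESTED, MUST_BLOCK };

enum acquire_result recursive_acquire_sketch(struct recursive_mutex_sketch *m,
                                             struct thread_sketch *executing)
{
  /* Real code: mrs/msr raise BASEPRI to mask interrupts here. */
  if (m->owner == NULL) {       /* cbnz r3 not taken */
    m->owner = executing;       /* str r2, [r0, #12] */
    /* Real code: msr BASEPRI, r1 restores the mask; bx lr returns. */
    return ACQUIRED;
  }
  if (m->owner == executing) {  /* cmp r3, r2; beq.n */
    ++m->nest_level;            /* ldr/adds/str on [r0, #16] */
    return NESTED;
  }
  /* Real code: build a queue context on the stack and call
   * _Thread_queue_Enqueue(); the thread blocks until ownership passes. */
  return MUST_BLOCK;
}
```

In this model the uncontended case touches only the owner field, which is
why the real fast path fits in the 10 instructions shown.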

>>
>> Would the "tiny" footprint be smaller if all internal services
>> including compiler thread support are made C11? Could this actually be
>> done? Parts of POSIX have been creeping in over time, so the position is
>> a little confused at the moment. I am not sure about a piecemeal
>> approach; maybe a full switch should be made.
>
> Yes, the footprint would be smaller. If we provide self-contained
> threads, then the footprint would be much smaller, e.g. no object
> administration, no heap.

Great. This is a powerful reason to look at moving in this direction and 
removing the remaining POSIX usage in libstdthreads.

A brief audit of rtems.git shows the change is possible: there are fewer
than 30 Classic task creates and a similar number of semaphore creates,
so a full change looks reachable, which is nice.

Should we look at moving all internal services to C11 and standardise 
it? I think there is value in doing this. It can be a post 4.12 branch 
activity.
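
To give an idea of what converting one of those internal services might
look like, here is a sketch of a worker thread fed through a mutex and a
condition variable, written against standard <threads.h> only. It is
hypothetical: service_sketch and its functions stand in for a Classic task
plus semaphore pair and do not correspond to any existing RTEMS service.

```c
#include <assert.h>
#include <stdbool.h>
#include <threads.h>

/* Hypothetical internal service: one worker drains a request counter. */
struct service_sketch {
  mtx_t  mutex;     /* protects every field below */
  cnd_t  work;      /* signalled when pending or stop changes */
  int    pending;
  int    processed;
  bool   stop;
  thrd_t worker;
};

static int service_worker(void *arg)
{
  struct service_sketch *s = arg;

  mtx_lock(&s->mutex);
  for (;;) {
    /* Loop on the predicate, so spurious wakeups are harmless. */
    while (s->pending == 0 && !s->stop)
      cnd_wait(&s->work, &s->mutex);
    if (s->pending == 0)
      break;              /* stop requested and nothing left to do */
    --s->pending;
    ++s->processed;       /* real work would go here */
  }
  mtx_unlock(&s->mutex);
  return 0;
}

/* Returns 0 on success, -1 if any resource could not be created. */
int service_start(struct service_sketch *s)
{
  s->pending = 0;
  s->processed = 0;
  s->stop = false;
  if (mtx_init(&s->mutex, mtx_plain) != thrd_success)
    return -1;
  if (cnd_init(&s->work) != thrd_success) {
    mtx_destroy(&s->mutex);
    return -1;
  }
  if (thrd_create(&s->worker, service_worker, s) != thrd_success) {
    cnd_destroy(&s->work);
    mtx_destroy(&s->mutex);
    return -1;
  }
  return 0;
}

/* Queues one request for the worker. */
void service_post(struct service_sketch *s)
{
  mtx_lock(&s->mutex);
  ++s->pending;
  cnd_signal(&s->work);
  mtx_unlock(&s->mutex);
}

/* Lets the worker drain pending requests, joins it and returns the count. */
int service_stop(struct service_sketch *s)
{
  int processed;

  mtx_lock(&s->mutex);
  s->stop = true;
  cnd_signal(&s->work);
  mtx_unlock(&s->mutex);
  thrd_join(s->worker, NULL);
  processed = s->processed;   /* safe: the worker has exited */
  cnd_destroy(&s->work);
  mtx_destroy(&s->mutex);
  return processed;
}
```

Typical use is service_start(), any number of service_post() calls from any
thread, then service_stop(), which returns how many requests were handled.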

>
>>
>> Does C11 work on LLVM (I hear support is close)?
>>
>> Where is the C11 API implemented? Is the threading code outside the
>> RTEMS source tree and what effect does that have on those looking to
>> certify RTEMS?
>
>
> The C11 support is not a compiler issue. The <threads.h> is a part of
> the C standard library and for RTEMS this header file is provided by
> Newlib:
>
> https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/include/threads.h;h=9fb08b03d1eb20024c0d680a7924336ec7ea57bb;hb=HEAD
>
>
> This header file is compatible with C89 (with the next Newlib release;
> currently it requires C99 due to the use of inline in <sys/lock.h>). I
> imported several parts of the FreeBSD <sys/cdefs.h> for this purpose.
>
> The C11 <threads.h> provided functions are implemented in RTEMS:
>
> https://git.rtems.org/rtems/tree/cpukit/libstdthreads
>

Thanks.

>>
>> Does a change like this require a coding standard update?
>
> Currently
>
> https://devel.rtems.org/wiki/Developer/Coding/Conventions
>
> gives no advice to use specific API X or Y.
>

Yes, I knew the answer to this one. :)

Thank you for the detailed and excellent review and analysis of the C11 
support. I have no problem with the change and C11 being used internally.

Chris


