[rtems-docs commit] c-user: Add SMP low-level synchronization

Sebastian Huber sebh at rtems.org
Thu Feb 2 13:08:27 UTC 2017


Module:    rtems-docs
Branch:    master
Commit:    785c02f7f8fc61e2137679438bfe7abe090ae519
Changeset: http://git.rtems.org/rtems-docs/commit/?id=785c02f7f8fc61e2137679438bfe7abe090ae519

Author:    Sebastian Huber <sebastian.huber at embedded-brains.de>
Date:      Thu Feb  2 14:07:53 2017 +0100

c-user: Add SMP low-level synchronization

---

 c-user/glossary.rst                           |  12 +++
 c-user/symmetric_multiprocessing_services.rst | 122 +++++++++++++++++---------
 images/c_user/smplock01fair-t4240.pdf         | Bin 0 -> 33624 bytes
 images/c_user/smplock01fair-t4240.png         | Bin 0 -> 39680 bytes
 images/c_user/smplock01perf-t4240.pdf         | Bin 0 -> 34148 bytes
 images/c_user/smplock01perf-t4240.png         | Bin 0 -> 35093 bytes
 6 files changed, 93 insertions(+), 41 deletions(-)

diff --git a/c-user/glossary.rst b/c-user/glossary.rst
index f3e784e..aab9601 100644
--- a/c-user/glossary.rst
+++ b/c-user/glossary.rst
@@ -14,6 +14,9 @@ Glossary
       A task which must execute only at irregular intervals and has only a soft
       deadline.
 
+   API
+      An acronym for Application Programming Interface.
+
    application
       In this document, software which makes use of RTEMS.
 
@@ -314,6 +317,9 @@ Glossary
       A group of related RTEMS' directives which provide access and control
       over resources.
 
+   MCS
+      An acronym for Mellor-Crummey Scott.
+
    memory pool
       Used interchangeably with heap.
 
@@ -379,6 +385,9 @@ Glossary
    non-existent
       The state occupied by an uncreated or deleted task.
 
+   NUMA
+      An acronym for Non-Uniform Memory Access.
+
    numeric coprocessor
       A component used in computer systems to enhance performance in
       mathematically intensive situations.  It is typically viewed as a logical
@@ -614,6 +623,9 @@ Glossary
    SMCB
       An acronym for Semaphore Control Block.
 
+   SMP
+      An acronym for Symmetric Multiprocessing.
+
    SMP locks
       The SMP locks ensure mutual exclusion on the lowest level and are a
       replacement for the sections of disabled interrupts.  Interrupts are
diff --git a/c-user/symmetric_multiprocessing_services.rst b/c-user/symmetric_multiprocessing_services.rst
index 6d39944..4baf244 100644
--- a/c-user/symmetric_multiprocessing_services.rst
+++ b/c-user/symmetric_multiprocessing_services.rst
@@ -524,47 +524,6 @@ on a suitable platform, e.g. QorIQ T4240.  High-performance SMP applications
 need full control of the object storage :cite:`Drepper:2007:Memory`.
 Therefore, self-contained synchronization objects are now available for RTEMS.
 
-Implementation Details
-======================
-
-Thread Dispatch Details
------------------------
-
-This section gives background information to developers interested in the
-interrupt latencies introduced by thread dispatching.  A thread dispatch
-consists of all work which must be done to stop the currently executing thread
-on a processor and hand over this processor to an heir thread.
-
-In SMP systems, scheduling decisions on one processor must be propagated
-to other processors through inter-processor interrupts.  A thread dispatch
-which must be carried out on another processor does not happen instantaneously.
-Thus, several thread dispatch requests might be in the air and it is possible
-that some of them may be out of date before the corresponding processor has
-time to deal with them.  The thread dispatch mechanism uses three per-processor
-variables,
-
-- the executing thread,
-
-- the heir thread, and
-
-- a boolean flag indicating if a thread dispatch is necessary or not.
-
-Updates of the heir thread are done via a normal store operation.  The thread
-dispatch necessary indicator of another processor is set as a side-effect of an
-inter-processor interrupt.  So, this change notification works without the use
-of locks.  The thread context is protected by a TTAS lock embedded in the
-context to ensure that it is used on at most one processor at a time.
-Normally, only thread-specific or per-processor locks are used during a thread
-dispatch.  This implementation turned out to be quite efficient and no lock
-contention was observed in the testsuite.  The heavy-weight thread dispatch
-sequence is only entered in case the thread dispatch indicator is set.
-
-The context-switch is performed with interrupts enabled.  During the transition
-from the executing to the heir thread neither the stack of the executing nor
-the heir thread must be used during interrupt processing.  For this purpose a
-temporary per-processor stack is set up which may be used by the interrupt
-prologue before the stack is switched to the interrupt stack.
-
 Directives
 ==========
 
@@ -633,3 +592,84 @@ DESCRIPTION:
 
 NOTES:
     None.
+
+Implementation Details
+======================
+
+This section covers some implementation details of the RTEMS SMP support.
+
+Low-Level Synchronization
+-------------------------
+
+All low-level synchronization primitives are implemented using :term:`C11`
+atomic operations, so no target-specific hand-written assembler code is
+necessary.  Four synchronization primitives are currently available:
+
+* ticket locks (mutual exclusion),
+
+* :term:`MCS` locks (mutual exclusion),
+
+* barriers, implemented as a sense barrier, and
+
+* sequence locks :cite:`Boehm:2012:Seqlock`.
+
+A vital requirement for low-level mutual exclusion is :term:`FIFO` fairness
+since we are interested in a predictable system and not maximum throughput.
+With this requirement, there are only a few options to choose from.  For
+reasons of simplicity, the ticket lock algorithm was chosen to implement the
+SMP locks.  However, the API is capable of supporting MCS locks, which may be
+interesting in the future for systems with a processor count in the range of 32
+or more, e.g. :term:`NUMA`, many-core systems.
+
+The test program `SMPLOCK 1
+<https://git.rtems.org/rtems/tree/testsuites/smptests/smplock01>`_ can be used
+to gather performance and fairness data for several scenarios.  The SMP lock
+performance and fairness measured on the QorIQ T4240 follows as an example.
+This chip contains three L2 caches.  Each L2 cache is shared by eight
+processors.
+
+.. image:: ../images/c_user/smplock01perf-t4240.*
+   :width: 400
+   :align: center
+
+.. image:: ../images/c_user/smplock01fair-t4240.*
+   :width: 400
+   :align: center
+
+Thread Dispatch Details
+-----------------------
+
+This section gives background information to developers interested in the
+interrupt latencies introduced by thread dispatching.  A thread dispatch
+consists of all work which must be done to stop the currently executing thread
+on a processor and hand over this processor to an heir thread.
+
+In SMP systems, scheduling decisions on one processor must be propagated
+to other processors through inter-processor interrupts.  A thread dispatch
+which must be carried out on another processor does not happen instantaneously.
+Thus, several thread dispatch requests might be in the air and it is possible
+that some of them may be out of date before the corresponding processor has
+time to deal with them.  The thread dispatch mechanism uses three per-processor
+variables,
+
+- the executing thread,
+
+- the heir thread, and
+
+- a boolean flag indicating if a thread dispatch is necessary or not.
+
+Updates of the heir thread are done via a normal store operation.  The thread
+dispatch necessary indicator of another processor is set as a side-effect of an
+inter-processor interrupt.  So, this change notification works without the use
+of locks.  The thread context is protected by a TTAS lock embedded in the
+context to ensure that it is used on at most one processor at a time.
+Normally, only thread-specific or per-processor locks are used during a thread
+dispatch.  This implementation turned out to be quite efficient and no lock
+contention was observed in the testsuite.  The heavy-weight thread dispatch
+sequence is only entered in case the thread dispatch indicator is set.
+
+The context-switch is performed with interrupts enabled.  During the transition
+from the executing to the heir thread neither the stack of the executing nor
+the heir thread must be used during interrupt processing.  For this purpose a
+temporary per-processor stack is set up which may be used by the interrupt
+prologue before the stack is switched to the interrupt stack.
diff --git a/images/c_user/smplock01fair-t4240.pdf b/images/c_user/smplock01fair-t4240.pdf
new file mode 100644
index 0000000..f7d1b2e
Binary files /dev/null and b/images/c_user/smplock01fair-t4240.pdf differ
diff --git a/images/c_user/smplock01fair-t4240.png b/images/c_user/smplock01fair-t4240.png
new file mode 100644
index 0000000..ce36e84
Binary files /dev/null and b/images/c_user/smplock01fair-t4240.png differ
diff --git a/images/c_user/smplock01perf-t4240.pdf b/images/c_user/smplock01perf-t4240.pdf
new file mode 100644
index 0000000..b3eee7a
Binary files /dev/null and b/images/c_user/smplock01perf-t4240.pdf differ
diff --git a/images/c_user/smplock01perf-t4240.png b/images/c_user/smplock01perf-t4240.png
new file mode 100644
index 0000000..219eba7
Binary files /dev/null and b/images/c_user/smplock01perf-t4240.png differ
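
Illustration (not part of the commit): the new "Low-Level Synchronization"
section says the SMP locks use a ticket lock built on C11 atomic operations
to obtain FIFO fairness.  The following is a minimal sketch of that general
technique, not the RTEMS implementation.  The type and function names
(ticket_lock, ticket_lock_init, ticket_lock_acquire, ticket_lock_release,
ticket_lock_self_check) are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical names for illustration; this is not the RTEMS SMP lock API. */
typedef struct {
  atomic_uint next_ticket; /* next ticket to hand out to an arriving waiter */
  atomic_uint now_serving; /* ticket that currently owns the lock */
} ticket_lock;

static inline void ticket_lock_init(ticket_lock *lock)
{
  atomic_init(&lock->next_ticket, 0);
  atomic_init(&lock->now_serving, 0);
}

/* Returns the ticket drawn, so that callers can observe the FIFO order. */
static inline unsigned int ticket_lock_acquire(ticket_lock *lock)
{
  /* Draw a ticket.  Relaxed order is enough here, since the critical
     section is ordered through now_serving. */
  unsigned int my_ticket =
    atomic_fetch_add_explicit(&lock->next_ticket, 1, memory_order_relaxed);

  /* Spin until our ticket is served.  The acquire load pairs with the
     release store in ticket_lock_release(). */
  while (atomic_load_explicit(&lock->now_serving, memory_order_acquire)
         != my_ticket) {
    /* busy wait */
  }

  return my_ticket;
}

static inline void ticket_lock_release(ticket_lock *lock)
{
  /* Only the current owner writes now_serving, so a plain load followed by
     a release store is sufficient. */
  unsigned int current =
    atomic_load_explicit(&lock->now_serving, memory_order_relaxed);

  atomic_store_explicit(&lock->now_serving, current + 1, memory_order_release);
}

/* Single-threaded check: tickets are handed out and served in order. */
static inline void ticket_lock_self_check(void)
{
  ticket_lock lock;
  unsigned int first;
  unsigned int second;

  ticket_lock_init(&lock);
  first = ticket_lock_acquire(&lock);
  ticket_lock_release(&lock);
  second = ticket_lock_acquire(&lock);
  ticket_lock_release(&lock);
  assert(first == 0);
  assert(second == 1);
}
```

Waiters are served strictly in the order in which they drew their tickets,
which is the FIFO property the section asks for.  The cost is that all
waiters spin on the same now_serving variable.  An MCS lock avoids this by
letting each waiter spin on its own queue node, which is presumably why the
text mentions it for systems with many processors.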


