[PATCH 2/4] score: SMP initialization and shutdown changes

Sebastian Huber sebastian.huber at embedded-brains.de
Thu Feb 27 09:09:48 UTC 2014


On 2014-02-27 09:16, Chris Johns wrote:
> On 26/02/2014 7:43 pm, Sebastian Huber wrote:
>> First some words to this patch set.  It is a gradual improvement of the
>> existing SMP low-level initialization and shutdown procedure.
>>
>> The existing solution was able to start an SMP system.  Only atomic
>> reads/writes were used.  It is impossible to implement a controlled
>> shutdown only with atomic reads/writes.
>
> I cannot comment as I have no considered it in detail. It is a big statement to
> make. Maybe this is a suitable GSoC project.
>
>> See also "The Art of
>> Multiprocessor Programming", chapter 5, "The Relative Power of Primitive
>> Synchronization Operations".
>
> I will take a look but it might be a while.
>
>> This patch introduces a statically initialized global SMP lock to manage
>> the per-CPU state changes (_Per_CPU_State_lock).  It changes also the
>> states and documents the state machine.
>>
>> It is now also possible to use fatal error handlers to control the
>> shutdown on the lowest-level.
>>
>> Without this patch on NGMP the situation is like this:
>>
>> ***  SMP03 TEST ***
>>    CPU 3 running task Init
>>    CPU 2 running task TA1
>>    CPU 1 running task TA2
>>    CPU 0 running task TA3
>>    CPU 0 running task TA4
>> *** END OF TEST SMP03 ***
>> CPU 0:  Unknown watchpoint hit
>>          0x0000b1c4: 30800000  ba,a  0x0000B1C4
>> <rtems_smp_process_interrupt+172>
>> CPU 1:  Unknown watchpoint hit
>>          0x0000b1c4: 30800000  ba,a  0x0000B1C4
>> <rtems_smp_process_interrupt+172>
>> CPU 2:  Unknown watchpoint hit
>>          0x0000b1c4: 30800000  ba,a  0x0000B1C4
>> <rtems_smp_process_interrupt+172>
>> CPU 3:  Program exited normally.
>>
>> Now I have this:
>>
>> ***  SMP03 TEST ***
>>    CPU 3 running task Init
>>    CPU 2 running task TA1
>>    CPU 1 running task TA2
>>    CPU 0 running task TA3
>>    CPU 0 running task TA4
>> *** END OF TEST SMP03 ***
>>
>>    CPU 0:  Program exited normally.
>>    CPU 1:  Power down mode
>>    CPU 2:  Power down mode
>>    CPU 3:  Power down mode
>>
>> I think we should try to discuss problems with a specific patch in one
>> thread and general problems in a separate thread in the future.
>
> Great. Please start the discussion. I am not doing the work or creating the
> patches and you are so it is up to you and in your interests to get this under
> way and discussed as soon as possible. Starting a discussion and general design
> and review process leaves this patch and any others in a difficult position to
> approve and it also effects your planned work. I am aware of this and sensitive
> to it however I do not agree with some parts of the presented design and I am
> concerned about the future parts that are known to be wrong and not covered in
> your current scope of work. We need an open discussion where the top level
> requirements need to be defined and the design path settled and without it I am
> limited to commenting in patches about general design issues.

The SMP initialization existed before my involvement in this area.  It is not 
my work and I only fix basic problems here.

I put a great deal of information into the wiki page

http://www.rtems.org/wiki/index.php?title=SMP

I also sent several e-mails to the list in the last couple of months will 
little response.

>
>>
>> On 2014-02-25 00:02, Chris Johns wrote:
>>> On 21/02/2014 6:48 pm, Sebastian Huber wrote:
>>>> On 2014-02-21 01:31, Chris Johns wrote:
>>>>> On 21/02/2014 3:31 am, Sebastian Huber wrote:
>>>>>> Hello Chris,
>>>>>>
>>>>>> On 2014-02-20 03:50, Chris Johns wrote:
>>>>>>> On 20/02/2014 12:42 am, Sebastian Huber wrote:
>>>>>>>> +/**
>>>>>>>> + * @brief State of a processor.
>>>>>>>> + *
>>>>>>>> + * @dot
>>>>>>>> + * digraph states {
>>>>>>>> + *   bi [label="PER_CPU_STATE_BEFORE_INITIALIZATION"];
>>>>>>>> + *   rsm [label="PER_CPU_STATE_READY_TO_START_MULTITASKING"];
>>>>>>>> + *   sm [label="PER_CPU_STATE_START_MULTITASKING"];
>>>>>>>> + *   ds [label="PER_CPU_STATE_DO_SHUTDOWN"];
>>>>>>>> + *   u [label="PER_CPU_STATE_UP"];
>>>>>>>> + *   s [label="PER_CPU_STATE_SHUTDOWN"];
>>>>>>>> + *   bi -> rsm [label="secondary processor\ncompleted
>>>>>>>> initialization"];
>>>>>>>> + *   bi -> u [label="main processor\nstarts multitasking"];
>>>>>>>> + *   rsm -> sm [label="main processor\ncompleted initialization"];
>>>>>>>> + *   rsm -> ds [label="a fatal error occurred"];
>>>>>>>> + *   ds -> s [label="do shutdown\nstate observed"];
>>>>>>>> + *   sm -> u [label="secondary processor\nstarts multitasking"];
>>>>>>>> + *   u -> s [label="shutdown initiated"];
>>>>>>>> + * }
>>>>>>>> + * @enddot
>>>>>>>
>>>>>>> I do not see sm to ds if the main core goes to s after going from bi
>>>>>>> to u ?
>>>>>>>
>>>>>>> I also see u to s and not to ds; why as the state is called "do
>>>>>>> shutdown" ?
>>>>>>> Should this state be PER_CPU_STATE_SHUTTING_DOWN ?
>>>>>>
>>>>>> I had to change the procedure considerably since I noticed unsolvable
>>>>>> problems with the previous approach based only on atomic read/write
>>>>>> operations. Attached is the new state diagram.
>>>>>>
>>>>>
>>>>> I do not follow this picture. I am assume there is one state machine
>>>>> per CPU
>>>>> and so I do not see where the state "starting multitasking" is. I see
>>>>> a state
>>>>> called "request start multitasking" then "up" and I am wondering what
>>>>> the state
>>>>> is after the request has happened and before up.
>>>>
>>>> The application defines a maximum count of processors (NP).  The
>>>> configuration will then define the _Per_CPU_Information[NP] table.  Each
>>>> table entry has one per-CPU state entry.
>>>
>>> Does the BSP define the max number of possible cores the hardware can
>>> support ?
>>
>> Yes, via the return value of _CPU_SMP_Initialize().
>
> This is what should be used to define the Per CPU area.
>
>> This patch changes nothing in this area.
>
> I have made my comments on this above.
>
>>
>>> I see later in this email you talk about AP being available processors
>>> and
>>> defined by the BSP. I think I am getting confused with the term
>>> "maximum count
>>> of processors". Does it mean the max count the application may require
>>> or the
>>> max count supported by the hardware ?
>>
>> It is a mix of both.  The application configures a desired maximum count
>> and the BSP may reduce it to match the actual hardware.
>
> This seems confused and wrong and why I feel a general lack of top level
> requirements is appearing in the design. We do not configure any other part of
> the configuration this way. We do not let applications set the memory size to
> 1T byte of RAM and the BSP lowers it to the actual amount nor do we have the
> application set the CPU clock to 1PetaHz and the BSP lower it to the hardware
> defined maximum rate. It seems counter-intuitive and goes against how
> everything else is configured.

We have this since day-one of RTEMS SMP.  It is not my invention.  It is good 
enough for me at the moment and I have more pressing problems.

>
>>
>>>
>>> There is another issue here I am struggling with and while
>>> specifically not on
>>> topic for this patch it relates to the configuration of processor
>>> count so
>>> please indulge me. The application having to define the number of
>>> cores means
>>> there is no default position and that means we have this ...
>>>
>>> http://git.rtems.org/rtems/tree/testsuites/smptests/smp01/system.h#n28
>>>
>>> which is based on this ...
>>>
>>> http://git.rtems.org/rtems/tree/cpukit/sapi/include/confdefs.h#n205
>>>
>>> A user builds RTEMS with --enable-smp and then has another gate in
>>> CONFIGURE_SMP_APPLICATION. This does not make sense. Why build for SMP
>>> and then
>>> never use it yet incur lots of overhead ? If I build RTEMS with SMP
>>> enabled for
>>> a BSP that supports SMP I would expect it to work with SMP and not
>>> default to
>>> some half way point. For example if you build the Zync A9 qemu BSP
>>> with SMP
>>> enabled and build all the tests then run them on a patched qemu with SMP
>>> support you happily see about 10 maybe 15 failures for over 480 tests
>>> however
>>> these result are meaningless because CONFIGURE_SMP_APPLICATION is only
>>> defined
>>> for a few SMP specific tests and incorrectly for the Zync because it
>>> only has 2
>>> cores and not 4. I feel we should not go to a release with the tests
>>> in this
>>> state even with clear documentation. If the tests fail with SMP we
>>> need to have
>>> the test results show this and document what the failures are so users
>>> are left
>>> with a clear picture of the state of SMP support. For example with the
>>> Zynq A9
>>> qemu example none of the block (libbd) tests should pass as libbd uses
>>> disable
>>> pre-emption and this should generate an error if used.
>>>
>>> The CONFIGURE_SMP_APPLICATION needs to go however there is no default
>>> core
>>> count defined by the BSP that can be used to allow this. IMO if the
>>> application
>>> does not define the core count it needs, ie less than the max
>>> supported by the
>>> hardware, the max possible should be used.
>>
>> I didn't invent this CONFIGURE_SMP_APPLICATION.
>
> I know. All I am trying to do is bring out into the open all the issues we have
> with SMP in RTEMS so the community knows the state and so we can look at the
> list of things that needs doing.
>
>> At the moment it is
>> useful since we lack important features like thread deletion.
>
> If the code is broken and tests fails we should have them fail.

I am not opposed to change this, but I will not do it.  I have project 
deadlines and I have to focus on things important for my work.

>
>> With the
>> introduction of clustered/partitioned scheduling the configuration will
>> completely change. So I would like to postpone this discussion a couple
>> of weeks.
>
> I cannot comment as I have no idea what this means. I did not even know
> "clustered/partitioned scheduling" was a requirement. We still cannot run a
> single real application under SMP (see the thread deletion comment above).

http://www.rtems.org/pipermail/rtems-devel/2013-November/004950.html
http://www.rtems.org/wiki/index.php?title=SMP#Clustered_Scheduling
http://www.rtems.org/wiki/index.php?title=SMP#Clustered_Scheduling_2

>
>>
>>>
>>>>  Since this table is in the BSS
>>>> section, all states start with PER_CPU_STATE_INITIAL.
>>>
>>> Who is clearing the BSS and how is this managed ? There seems to be some
>>> implicit requirements here. The requirement being "RTEMS requires all
>>> processors to run in an SMP environment must be started externally to
>>> RTEMS".
>>> Is this requirement what we really want ?
>>
>> This is not the case.  The low-level start-up is highly BSP specific.
>> The leon3 BSP for example starts secondary processors on its own.  The
>> QorIQ relies on U-Boot.  Our Altera Cyclone-V (ARM Cortex-A9 MPCore
>> based) will also start secondary processors on its own.
>
> Ok so we have a mix. I did not know this and when I raised this last year with
> you I was told it needs to be performed by the boot monitor. As a result I went
> down this path on the Zynq.

This is fine if you have a boot loader.  On the Cyclone-V the boot loader 
didn't start the second processor so we had to do it in the BSP.

>
>>
>>>
>>> On the ARM (well the zynq I have used) starting processors is easy and
>>> the code
>>> needed is tiny and could be made available in the BSP shared area,
>>> while I
>>> understand on the PPC it is complex. I suppose the question is adding
>>> this
>>> support to RTEMS what we want to have and worth it verses adding this
>>> code
>>> gives us complete control over initialisation and makes the boot
>>> monitor simpler.
>>>
>>> Given I currently only use the ARM which is easy and I will not ship
>>> uboot due
>>> to licensing reason I favour adding support to RTEMS. :)
>>>
>>>>
>>>> The boot processor calls boot_card() and all other processors do what
>>>> they want, but they must wait until the boot processor gives the go
>>>> (explained later).
>>>>
>>>> The boot processor calls eventually:
>>>>
>>>>    /**
>>>>     * @brief Performs CPU specific SMP initialization in the context of
>>>> the boot
>>>>     * processor.
>>>>     *
>>>>     * This function is invoked on the boot processor by RTEMS during
>>>>     * initialization.  All interrupt stacks are allocated at this point
>>>> in case
>>>>     * the CPU port allocates the interrupt stacks.
>>>>     *
>>>>     * The CPU port should start secondary processors now.
>>>
>>> What does this mean ? I am not sure if you mean the boot monitor does
>>> this or
>>> this is handled in RTEMS in the BSP or related CPU code.
>>>
>>>>     *
>>>>     * @param[in] configured_cpu_count The count of processors requested
>>>> by the
>>>>     * application configuration.
>>>>     *
>>>>     * @return The count of processors available for the application in
>>>> the system.
>>>>     * This value is less than or equal to the configured count of
>>>> processors.
>>>>     */
>>>>    uint32_t _CPU_SMP_Initialize( uint32_t configured_cpu_count );
>>>>
>>>> So here has the CPU port (or BSP in most cases) the chance to reduce the
>>>> configured count of processors to the actually available (AP).  We have
>>>> AP <= NP.  In case this function returns "you have three processors",
>>>> then these three processors MUST start properly or terminate the
>>>> system.  If some processors are not always available or otherwise
>>>> unreliable this must be dealt with in _CPU_SMP_Initialize().
>>>
>>> This is confusing me. The application controls the max number of cores
>>> (NP) a
>>> system can have and the BSP (or CPU port which knows the max that can
>>> exist)
>>> can configure that down to the actual number. Does this mean I could
>>> define 32
>>> cores at the application and Zynq BSP would say, sorry only 2 here so
>>> this is
>>> what you get ?
>>
>> Yes.  Its done in some tests, e.g.
>>
>> http://git.rtems.org/rtems/tree/testsuites/smptests/smplock01/init.c#n28
>>
>> Does this waste RAM ?
>>
>> Yes, but its not much, e.g. the _Per_CPU_Information table is a bit
>> bigger than necessary.
>>
>
> Lets agree to just stick to yes.
>
>>>
>>> What about an application defining 2 on a 4 core system ?
>>
>> It gets two.
>>
>
> Ok.
>
>>>
>>>>
>>>> System termination can happen at anytime on any processor.  So you can
>>>> change from every state into PER_CPU_STATE_SHUTDOWN.  This ability is
>>>> the core of this change set.  In the previous implementation this was
>>>> not guaranteed.
>>>>
>>>
>>> Did the state diagram show this ? I am sorry if I missed this.
>>
>> Yes, you have arrows from all other states to PER_CPU_STATE_SHUTDOWN.
>>
>>>
>>>> Now lets look at the normal start-up (no shutdown).  The next state
>>>> after PER_CPU_STATE_INITIAL is
>>>> PER_CPU_STATE_READY_TO_START_MULTITASKING.
>>>>
>>>>    /**
>>>>     * @brief Processor is ready to start multitasking.
>>>>     *
>>>>     * The secondary processor performed its basic initialization and is
>>>> ready to
>>>>     * receive inter-processor interrupts.  Interrupt delivery must be
>>>> disabled
>>>>     * in this state, but requested inter-processor interrupts must be
>>>> recorded
>>>>     * and must be delivered once the secondary processor enables
>>>> interrupts for
>>>>     * the first time.  The boot processor will wait for all secondary
>>>> processors
>>>>     * to change into this state.  In case a secondary processor does not
>>>> reach
>>>>     * this state the system will not start.  The secondary processors
>>>> wait now
>>>>     * for a change into the PER_CPU_STATE_REQUEST_START_MULTITASKING
>>>> state set
>>>>     * by the boot processor once all secondary processors reached the
>>>>     * PER_CPU_STATE_READY_TO_START_MULTITASKING state.
>>>>     */
>>>>    PER_CPU_STATE_READY_TO_START_MULTITASKING,
>>>>
>>>> The key point for a per-CPU state and not a global state is that every
>>>> processor must perform some initialization steps to set up the "I can
>>>> receive inter-processor interrupts (IPI)" state.  Before IPIs are
>>>> possible the only why are spin variables (e.g. this per-CPU state
>>>> variables).
>>>>
>>>> This PER_CPU_STATE_READY_TO_START_MULTITASKING is a synchronization
>>>> barrier.
>>>>
>>>> The boot processor will then set the state to
>>>> PER_CPU_STATE_REQUEST_START_MULTITASKING on all processors configured
>>>> and available (AP).  Once the secondary processor observes this state
>>>> change (it spins on its state variable), it will go into the
>>>> PER_CPU_STATE_UP state and perform a context switch to the first thread.
>>>>
>>>
>>> If this is about getting to a suitable IPI state on each core then why
>>> not have
>>> this in the states and functions rather then the global state type
>>> names of
>>> "start multitasking" etc.
>>
>> I don't think we should be too specific about the IPI here.  The basic
>> statement is that a processor is ready to start multitasking.
>>
>>>
>>> I see the issue of needing a clear path from no IPI to having IPI
>>> however I am
>>> still wondering why the need to have the synch point and all cores to
>>> be there
>>> before moving on. A new core in the system should be able to see the
>>> system
>>> state and IPIs are active and then enable its support then add itself
>>> to the
>>> system.
>>
>> Then you have to check each time which method you need to send a message
>> to another processor.  I don't see a benefit here.
>>
>
> It is just a check when booting and not used once using IPI.
>
>>>
>>> If there is a global state variable with suitable spinning locks and
>>> atomics
>>> that direct the per cpu paths taken I fail to see why we need this sync
>>> barrier. The first core to a given global state makes the transition
>>> and the
>>> other cores need to wait or move to a different state. This assumes
>>> the BSS
>>> init issue is resolved.
>>
>> I fail to see why this barrier is a problem.
>
> Because as I stated before this is a system design constraint and not something
> RTEMS should enforce. It could be optional but not mandatory and being optional
> then using it comes with some risks.
>
>> The processors in an SMP
>> system MUST work reliable if now or later it doesn't matter.
>
> When I used the existing system and the boot monitor did not start a processor
> the system locked up. There was no recovery with the current design and I
> consider this a flaw. A watchdog just looped.

This is a problem of the boot process on this particular BSP and not a general 
problem.  In _CPU_SMP_Initialize() the boot processor could test if the other 
processors are up and then reduce the CPU count accordingly.

>
>> Do you
>> really want to work an a system that works under this condition:
>> "Sometimes scheduling actions take place, but you cannot assume that a
>> thread will execute as planned.".
>
> Again a system issue. It is up to system designers on a specific project to
> determine the fail states like this and not operating system kernel designers.
> Allowing a degraded state provides opportunity for recover or fault reporting.
> It is not for everyone or every system.

In this case the operating system must detect the problems and notify the 
application about such error events.  I don't see any problems in adding this 
later, but this is not on my list.

>
>>
>>>
>>>>>
>>>>>>>
>>>>>>> I do not see why we have main and secondary processors ?
>>>>>>
>>>>>> I changed "main processor" into "boot processor" to highlight that
>>>>>> this
>>>>>> is only the case during system boot.
>>>>>>
>>>>>
>>>>> Is there any code checking the processor number ? I see it in the
>>>>>
>>>>>>> This is symmetric
>>>>>>> multiprocessing which means each core is the same therefore
>>>>>>> capable of
>>>>>>> completing any required task. I understand there are paths which
>>>>>>> need to
>>>>>>> complete so if we have states for these phases as gates then any
>>>>>>> processor that
>>>>>>> arrives should be able to enter the gate (spinning lock for those
>>>>>>> that
>>>>>>> need to
>>>>>>> wait) and complete the work. This means a degraded state can exist
>>>>>>> and
>>>>>>> things
>>>>>>> at least start. The application would need to detect and manage the
>>>>>>> degraded
>>>>>>> state and so RTEMS should not be concerned with this condition other
>>>>>>> than doing
>>>>>>> its best to run where possible.
>>>>>>
>>>>>> In case the boot procedure of the system is unstable then it makes no
>>>>>> sense to run an application.
>>>>>>
>>>>>
>>>>> This depends on the application and its system requirements and how the
>>>>> hardware is constructed. A boot monitor can see cores are not
>>>>> present and
>>>>> decide not to start RTEMS or it can decide to start a single core.
>>>>> This relates
>>>>> to the system's requirement and can vary and has little to do with
>>>>> RTEMS. If a
>>>>> boot monitor passes control to RTEMS then it should run and only fail
>>>>> if it's
>>>>> integrity it not correct. RTEMS does should not move into the area of
>>>>> system
>>>>> requirements or constraints. Adding constraints like this in the
>>>>> operating
>>>>> system does not seem a good idea to me.
>>>>
>>>> I see no problem here.  See _CPU_SMP_Initialize() above.
>>>>
>>>
>>> I do not understand. Are you saying the design will run with a core
>>> count that
>>> does not match NP or AP ?
>>
>> It is min { NP, AP }.
>>
>
> IMO an application core count higher than the available cores is a
> configuration error.

I think its quite handy to be able to configure systems that way.  You can 
simply use a bunch of processors for worker pools.

>
>>>
>>>>>
>>>>>>>
>>>>>>> I assume if we have n cores where n is 1..cpus available we enter the
>>>>>>> static
>>>>>>> constructors once and 'main' [1] once and this independent of the
>>>>>>> number of
>>>>>>> defined and/or operating cores. If an application's static
>>>>>>> constructor
>>>>>>> starts
>>>>>>> further threads it needs to manage the concurrency issues and main is
>>>>>>> only
>>>>>>> entered once.
>>>>>>
>>>>>> This change is about the low-level boot procedure.  High level
>>>>>> concepts
>>>>>> like threads are not an issue here.
>>>>>>
>>>>>
>>>>> They are if you remove the all cores needing to sync plus using cpu
>>>>> numbers to
>>>>> control which cores go through to the static constructors and main and
>>>>> which
>>>>> cores go direct to the score thread code.
>>>>
>>>> If you remove this synchronization barrier via
>>>> PER_CPU_STATE_READY_TO_START_MULTITASKING, then you mandate that there
>>>> is boot loader that does these steps.  With the current set-up this boot
>>>> loader is optional.
>>>>
>>>
>>> Or the boot loading is not involved.
>>>
>>>>> It is this sort of code that concerns
>>>>> me ...
>>>>>
>>>>> http://git.rtems.org/rtems/tree/c/src/lib/libbsp/arm/shared/include/arm-a9mpcore-start.h#n88
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> Yes, this is a dirty hack.  It works at the moment due to our immature
>>>> SMP state, but this must definitely change.
>>>>
>>>>>
>>>>>
>>>>> http://git.rtems.org/rtems/tree/c/src/lib/libbsp/sparc/shared/start/start.S#n226
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> Yes, exactly the same issue.  This processor index 0 == boot processor
>>>> is a bad assumption.
>>>>
>>>
>>> I feel the single major issue here is a lack of defined requirement
>>> for SMP.
>>> The RTEMS Project and its community need to define the top level
>>> requirements
>>> and along with that tests that check those requirements. For me reviewing
>>> patches it is not clear what is a hack or temporary and what is long
>>> term and
>>> here for good. Questioning each patch or change to find this out is time
>>> consuming and not very productive.
>> [...]
>>
>> Yes, we have the wiki page for this:
>>
>> http://www.rtems.org/wiki/index.php?title=SMP
>>
>
> Yes and the section is empty.

What should I do now from your point of view?

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



More information about the devel mailing list