Multiprocessor problems
Joel Sherrill
joel.sherrill at OARcorp.com
Fri Jul 3 15:16:59 UTC 2009
Daniel Hellstrom wrote:
> Hi Joel,
>
> I have attached a patch which introduces a new "early" MP initialization
> routine. This works for SPARC/LEON3 MP.
>
>
Committed to the head. Does this also need to go on the 4.9 branch?
What were you testing with?
--joel
> Daniel
>
>
> Joel Sherrill wrote:
>
>
>> Can you provide a patch Daniel?
>>
>> Daniel Hellstrom wrote:
>>
>>
>>> Hi,
>>>
>>> The problem seems to be the initialization of _Objects_Local_node in
>>> multiprocessor-enabled kernels. Since _MPCI_Initialization() sets
>>> _Objects_Local_node only after the first semaphores and tasks have
>>> been created, the IDs assigned to those objects are incorrect.
>>>
>>> In single-processor systems _Objects_Local_node is a constant set
>>> to 1, but in multiprocessor systems it is initially zero and only
>>> set later by _MPCI_Initialization().
>>>
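>>> Roughly, the declarations behind this look as follows (a sketch;
>>> the exact guards and spelling in the score headers may differ):
>>>
>>> #if defined(RTEMS_MULTIPROCESSING)
>>>   /* MP build: lands in BSS as 0, set later by _MPCI_Initialization() */
>>>   SCORE_EXTERN uint16_t _Objects_Local_node;
>>> #else
>>>   /* single-processor build: a compile-time constant */
>>>   #define _Objects_Local_node ((uint16_t)1)
>>> #endif
>>>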
>>> The problem you experience is probably the same one I ran into this
>>> week when running on a dual-core SPARC/LEON3 system. Two tasks are
>>> created before the node number is set up correctly. See the GRMON
>>> printout below, taken after breaking at Init():
>>>
>>> grmon> thread info
>>>
>>>   Name | Type     | Id         | Prio | Time (h:m:s) | Entry point | PC                    | State
>>>  ------+----------+------------+------+--------------+-------------+-----------------------+-------
>>>   Int. | internal | 0x09000001 |  255 | 0.000000     | ??          | 0x0                   | READY
>>>   Int. | classic  | 0x09000002 |    0 | 0.000000     | ??          | 0x0                   | Wsem
>>> * UI1  | classic  | 0x0a010001 |    1 | 0.000000     | RAM_END     | 0x40001368 Init + 0x4 | READY
>>>
>>> As you can see, the node number in the Id field is 0 rather than 1 or 2.
>>>
>>> The bug appears when the first MPCI packet is received on the target
>>> node: the ISR calls _MPCI_Announce, which tries to release a
>>> semaphore, the blocked thread is thought to be global, and the
>>> system crashes. The function that decides whether an object is
>>> global or local simply checks that the nodes match; it never checks
>>> whether the node number is zero.
>>>
>>> RTEMS_INLINE_ROUTINE bool _Objects_Is_local_node(
>>>   uint32_t node
>>> )
>>> {
>>>   return ( node == _Objects_Local_node );
>>> }
>>>
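>>> For reference, the node number is carried in the Id itself; a
>>> minimal sketch of extracting it, assuming the classic 32-bit Id
>>> layout (class/API/node/index), with the helper name being mine:
>>>
>>> #include <stdint.h>
>>>
>>> /* bits 16..23 of a 32-bit object Id hold the node number */
>>> static inline uint32_t objects_id_get_node( uint32_t id )
>>> {
>>>   return ( id >> 16 ) & 0xff;
>>> }
>>>
>>> /* e.g. 0x09000001 above yields node 0 */
>>>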
>>> To test that this theory holds, I changed the declaration of
>>> _Objects_Local_node from SCORE_EXTERN to extern and defined it in my
>>> project, initially initialized to the node number. The LEON3
>>> dual-core system now works, and I have successfully had semaphores
>>> and tasks interacting between the two nodes.
>>>
>>> uint16_t _Objects_Local_node = CONFIGURE_MP_NODE_NUMBER;
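>>>
>>> (The header-side change, sketched; the declaration lives in the
>>> score object header in my tree, the exact path may differ:)
>>>
>>> -SCORE_EXTERN uint16_t _Objects_Local_node;
>>> +extern uint16_t _Objects_Local_node;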
>>>
>>> I suggest that the initialization of _Objects_Local_node be moved
>>> earlier in the startup sequence.
>>>
>>> Regards,
>>> Daniel Hellstrom
>>>
>>> Joel Sherrill wrote:
>>>
>>>> Roger Dahlkvist wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm using a timer-ISR polling method to check for new messages
>>>>> from other nodes (roughly as sketched below). Unfortunately, the
>>>>> system crashes as soon as rtems_multiprocessing_announce is called.
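>>>>>
>>>>> (A minimal sketch of that polling hook, assuming a classic timer;
>>>>> rtems_multiprocessing_announce() and rtems_timer_reset() are the
>>>>> real directives, the rest is illustrative:)
>>>>>
>>>>> #include <rtems.h>
>>>>>
>>>>> /* fired periodically; drains pending packets from other nodes */
>>>>> rtems_timer_service_routine mpci_poll( rtems_id timer, void *arg )
>>>>> {
>>>>>   rtems_multiprocessing_announce();
>>>>>   rtems_timer_reset( timer );  /* re-arm for the next poll */
>>>>> }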
>>>>>
>>>> There are no interrupts enabled until the initialization task is
>>>> switched in.
>>>>
>>>> I have wondered if it wouldn't make sense to have the MP
>>>> initialization synchronization done either explicitly by the
>>>> application (like TCP/IP initialization) or implicitly by the init
>>>> thread (like C++ global constructors).
>>>>
>>>> You can try moving this code from exinit.c to threadhandler.c and
>>>> protecting it somehow from being executed more than once (see the
>>>> sketch after the code below).
>>>>
>>>> #if defined(RTEMS_MULTIPROCESSING)
>>>>   if ( _System_state_Is_multiprocessing ) {
>>>>     _MPCI_Initialization();
>>>>     _MPCI_Internal_packets_Send_process_packet(
>>>>       MPCI_PACKETS_SYSTEM_VERIFY
>>>>     );
>>>>   }
>>>> #endif
>>>>
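>>>> One way to guard it, as a sketch; the flag and its placement are
>>>> illustrative, not the actual fix:
>>>>
>>>> #if defined(RTEMS_MULTIPROCESSING)
>>>>   /* hypothetical once-guard; assumes <stdbool.h> */
>>>>   static bool mpci_started = false;
>>>>
>>>>   if ( _System_state_Is_multiprocessing && !mpci_started ) {
>>>>     mpci_started = true;
>>>>     _MPCI_Initialization();
>>>>     _MPCI_Internal_packets_Send_process_packet(
>>>>       MPCI_PACKETS_SYSTEM_VERIFY
>>>>     );
>>>>   }
>>>> #endif
>>>>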
>>>> Then you will at least be able to get your interrupts enabled and
>>>> call MP announce to complete system initialization.
>>>>
>>>>> However, rtems_multiprocessing_announce works just fine if it's
>>>>> called just after the initialization phase, before the
>>>>> initialization task is started. That's really strange.
>>>>>
>>>>> So for example, if I make one node initialize and start faster
>>>>> than the other node (using fewer drivers, etc.), I'll be able to
>>>>> create global objects. And as long as the other node has not
>>>>> started its initialization task, the message is received and the
>>>>> global objects table is updated, so the objects can be identified
>>>>> later on. But I can't use them, since further calls to
>>>>> rtems_multiprocessing_announce will fail.
>>>>>
>>>>> At this point I feel like I have tested just about everything, with
>>>>> no luck. It's urgent that I get MP to work properly. I'm using Nios
>>>>> II processors and I have defined my own MPCI routines. I'm
>>>>> confident that they work properly and I have verified that the
>>>>> system crashes before they are even invoked.
>>>>>
>>>>> Is there anyone with MP experience who might have a clue of what's
>>>>> causing my problems? Any help is MUCH appreciated.
>>>>>
>>>>> //Roger
>>>>>
--
Joel Sherrill, Ph.D.              Director of Research & Development
joel.sherrill at OARcorp.com       On-Line Applications Research
Ask me about RTEMS: a free RTOS   Huntsville AL 35805
Support Available                 (256) 722-9985