Multiprocessor problems
Daniel Hellstrom
daniel at gaisler.com
Sat Jul 4 14:58:38 UTC 2009
Joel Sherrill wrote:
> Daniel Hellstrom wrote:
>
>> Hi Joel,
>>
>> I have attached a patch which introduces a new "early" MP
>> initialization routine. This works for SPARC/LEON3 MP.
>>
>>
>
> Committed to the head. Does this also need to go on the 4.9 branch?
I have not investigated this; I am on summer holidays now until the 1st of August.
>
> What were you testing with?
SPARC/LEON3 Dual Core, 256MB SDRAM, 5 Timers, 2 UARTs, PCI and Ethernet.
Booting from RAM using GRMON, and from FLASH using mkprom.
Daniel
>
> --joel
>
>> Daniel
>>
>>
>> Joel Sherrill wrote:
>>
>>> Can you provide a patch, Daniel?
>>>
>>> Daniel Hellstrom wrote:
>>>
>>>> Hi,
>>>>
>>>> The problem seems to be the initialization of _Objects_Local_node
>>>> in multiprocessor-enabled kernels. Since _MPCI_Initialization()
>>>> initializes _Objects_Local_node after the first semaphores and
>>>> tasks have been created, the IDs assigned to those objects are
>>>> incorrect.
>>>>
>>>> In single-processor systems _Objects_Local_node is a constant set
>>>> to 1, but in multiprocessor systems it is initially zero and is
>>>> only set by _MPCI_Initialization().
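>>>>
>>>> For reference, the declaration looks roughly like the sketch below
>>>> (paraphrased from cpukit/score/include/rtems/score/object.h; the
>>>> exact form may differ between RTEMS versions):
>>>>
>>>> #if defined(RTEMS_MULTIPROCESSING)
>>>>   /* MP build: starts at zero, set later by _MPCI_Initialization() */
>>>>   SCORE_EXTERN uint16_t _Objects_Local_node;
>>>> #else
>>>>   /* single-processor build: compile-time constant */
>>>>   #define _Objects_Local_node 1
>>>> #endif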
>>>>
>>>> The problem you experience is probably the same problem I ran into
>>>> this week when running on a dual-core SPARC/LEON3 system. Two tasks
>>>> are created before the node number is set up correctly. See the
>>>> GRMON print-out below, taken after breaking at Init():
>>>>
>>>> grmon> thread info
>>>>
>>>>   Name | Type     | Id         | Prio | Time (h:m:s) | Entry point | PC                     | State
>>>>   -----------------------------------------------------------------------------------------------
>>>>   Int. | internal | 0x09000001 |  255 | 0.000000     | ??          | 0x0                    | READY
>>>>   Int. | classic  | 0x09000002 |    0 | 0.000000     | ??          | 0x0                    | Wsem
>>>> * UI1  | classic  | 0x0a010001 |    1 | 0.000000     | RAM_END     | 0x40001368 Init + 0x4  | READY
>>>>
>>>> As you can see, the node number in the ID field is 0 rather than
>>>> 1 or 2.
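>>>>
>>>> (For illustration: assuming the classic 32-bit object ID layout
>>>> with the node number in bits 23-16, the node can be read out as
>>>> below; the helper name is mine, not an RTEMS API.)
>>>>
>>>> #include <stdint.h>
>>>>
>>>> /* extract the node field of an object ID, e.g. 0x09000002 -> 0 */
>>>> static inline uint32_t id_get_node( uint32_t id )
>>>> {
>>>>   return ( id >> 16 ) & 0xff;
>>>> }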
>>>>
>>>> The bug appears when the first MPCI packet is received on the
>>>> target node: the ISR calls _MPCI_Announce, which tries to release
>>>> a semaphore, the blocked thread is thought to be global, and the
>>>> system crashes. The function deciding whether an object is global
>>>> or local simply checks that the nodes are the same; it does not
>>>> check whether the node number is zero.
>>>>
>>>> RTEMS_INLINE_ROUTINE bool _Objects_Is_local_node(
>>>>   uint32_t node
>>>> )
>>>> {
>>>>   return ( node == _Objects_Local_node );
>>>> }
>>>>
>>>> To test this theory I changed the declaration of
>>>> _Objects_Local_node from SCORE_EXTERN to extern, and defined it in
>>>> my project, initialized to the node number. The LEON3 dual-core
>>>> system now works, and I have successfully had semaphores and tasks
>>>> interacting between the two nodes.
>>>>
>>>> uint16_t _Objects_Local_node = CONFIGURE_MP_NODE_NUMBER;
>>>>
>>>> I suggest that the initialization of _Objects_Local_node be moved
>>>> earlier in the startup sequence.
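>>>>
>>>> Something along the lines of this sketch (the name is illustrative
>>>> only, not taken from an actual patch):
>>>>
>>>> /* run very early in _Initialize_executive, before the first
>>>>  * objects are created; node comes from the MP configuration table */
>>>> void _Objects_Set_local_node( uint16_t node )
>>>> {
>>>>   _Objects_Local_node = node;
>>>> }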
>>>>
>>>> Regards,
>>>> Daniel Hellstrom
>>>>
>>>> Joel Sherrill wrote:
>>>>
>>>>> Roger Dahlkvist wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm using a timer-ISR polling method to check for new messages
>>>>>> from other nodes. Unfortunately the system crashes as soon as
>>>>>> rtems_multiprocessing_announce is called.
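>>>>>>
>>>>>> (The polling routine is essentially the sketch below;
>>>>>> mailbox_has_packet() stands in for my board-specific check.)
>>>>>>
>>>>>> #include <rtems.h>
>>>>>> #include <stdbool.h>
>>>>>>
>>>>>> extern bool mailbox_has_packet( void );  /* board-specific */
>>>>>>
>>>>>> /* run from the timer ISR: if another node left a packet,
>>>>>>  * tell RTEMS so the MPCI receive server processes it */
>>>>>> static void mp_poll( void )
>>>>>> {
>>>>>>   if ( mailbox_has_packet() )
>>>>>>     rtems_multiprocessing_announce();
>>>>>> }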
>>>>>>
>>>>>>
>>>>>
>>>>> There are no interrupts enabled until the initialization task is
>>>>> switched in.
>>>>>
>>>>> I have wondered if it wouldn't make sense to have the MP
>>>>> initialization synchronization done either explicitly by the
>>>>> application (like the initialization of TCP/IP) or implicitly by
>>>>> the init thread (like C++ global constructors).
>>>>>
>>>>> You can try moving this code from exinit.c to threadhandler.c and
>>>>> protecting it somehow from being executed more than once (a sketch
>>>>> of such a guard follows the code below).
>>>>>
>>>>> #if defined(RTEMS_MULTIPROCESSING)
>>>>>   if ( _System_state_Is_multiprocessing ) {
>>>>>     _MPCI_Initialization();
>>>>>     _MPCI_Internal_packets_Send_process_packet(
>>>>>       MPCI_PACKETS_SYSTEM_VERIFY
>>>>>     );
>>>>>   }
>>>>> #endif
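>>>>>
>>>>> The guard could be as simple as this (untested sketch):
>>>>>
>>>>> #if defined(RTEMS_MULTIPROCESSING)
>>>>>   static bool mp_initialized = false;  /* once-only guard */
>>>>>
>>>>>   if ( !mp_initialized && _System_state_Is_multiprocessing ) {
>>>>>     mp_initialized = true;
>>>>>     _MPCI_Initialization();
>>>>>     _MPCI_Internal_packets_Send_process_packet(
>>>>>       MPCI_PACKETS_SYSTEM_VERIFY
>>>>>     );
>>>>>   }
>>>>> #endif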
>>>>>
>>>>> Then you will at least be able to get your interrupts enabled and
>>>>> call MP announce to complete system initialization.
>>>>>
>>>>>> However, rtems_multiprocessing_announce works just fine if it's
>>>>>> called right after the initialization phase, before the
>>>>>> initialization task is started. That's really strange.
>>>>>>
>>>>>> So for example, if I make one node initialize and start faster
>>>>>> than the other node (using fewer drivers etc.), I'll be able to
>>>>>> create global objects. As long as the other node has not started
>>>>>> its initialization task, the message is received and the global
>>>>>> objects table is updated, so the objects can be identified later
>>>>>> on. But I can't use them, since further calls to
>>>>>> rtems_multiprocessing_announce will fail.
>>>>>>
>>>>>> At this point I feel like I have tested just about everything,
>>>>>> with no luck. It's urgent that I get MP to work properly. I'm
>>>>>> using Nios II processors and I have defined my own MPCI routines.
>>>>>> I'm confident that they work properly and I have verified that
>>>>>> the system crashes before they are even invoked.
>>>>>>
>>>>>> Is there anyone with MP experience who might have a clue of
>>>>>> what's causing my problems? Any help is MUCH appreciated.
>>>>>>
>>>>>> //Roger
>>>>>>