Multiprocessor problems

Daniel Hellstrom daniel at gaisler.com
Sat Jul 4 14:58:38 UTC 2009


Joel Sherrill wrote:

> Daniel Hellstrom wrote:
>
>> Hi Joel,
>>
>> I have attached a patch which introduces a new "early" MP 
>> initialization routine. This works for SPARC/LEON3 MP.
>>
>
> Committed to the head.  Does this also need to go on the 4.9 branch?

Have not investigated this; I am on summer holidays now until the 1st of August.

>
> What were you testing with?

SPARC/LEON3 Dual Core, 256MB SDRAM, 5 Timers, 2 UARTs, PCI and Ethernet. 
Booting from RAM using GRMON, and from FLASH using mkprom.

Daniel



>
> --joel
>
>> Daniel
>>
>>
>> Joel Sherrill wrote:
>>
>>> Can you provide a patch Daniel?
>>>
>>> Daniel Hellstrom wrote:
>>>
>>>> Hi,
>>>>
>>>> The problem seems to be the initialization of _Objects_Local_node in
>>>> multiprocessor-enabled kernels. _MPCI_Initialization() sets
>>>> _Objects_Local_node only after the first semaphores and tasks have
>>>> been created, so the IDs assigned to those objects carry the wrong
>>>> node number.
>>>>
>>>> In single-processor systems _Objects_Local_node is a constant set to
>>>> 1, but in multiprocessor systems it starts out as zero and only gets
>>>> its real value from _MPCI_Initialization().
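>>>>
>>>> For reference, the relevant declarations in score/object.h look
>>>> roughly like this (paraphrased from memory, so treat it as a sketch
>>>> rather than the exact source):
>>>>
>>>> #if defined(RTEMS_MULTIPROCESSING)
>>>>   /* MP build: a real variable, zero until _MPCI_Initialization() runs */
>>>>   SCORE_EXTERN uint16_t _Objects_Local_node;
>>>> #else
>>>>   /* single-processor build: the local node is always 1 */
>>>>   #define _Objects_Local_node ((uint16_t)1)
>>>> #endif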
>>>>
>>>> The problem you are experiencing is probably the same one I ran into
>>>> this week on a dual-core SPARC/LEON3 system. Two tasks are created
>>>> before the node number is set up correctly. See the GRMON print-out
>>>> below, taken after breaking at Init():
>>>>
>>>> grmon> thread info
>>>>
>>>>    Name | Type     | Id         | Prio | Time (h:m:s) | Entry point | PC                     | State
>>>>   ------+----------+------------+------+--------------+-------------+------------------------+------
>>>>    Int. | internal | 0x09000001 |  255 |     0.000000 | ??          | 0x0                    | READY
>>>>    Int. | classic  | 0x09000002 |    0 |     0.000000 | ??          | 0x0                    | Wsem
>>>>  * UI1  | classic  | 0x0a010001 |    1 |     0.000000 | RAM_END     | 0x40001368 Init + 0x4  | READY
>>>>
>>>> As you can see, the node number in the Id field is 0 rather than
>>>> 1 or 2.
>>>>
>>>> The bug shows up when the first MPCI packet is received on the
>>>> target node: the ISR calls _MPCI_Announce, which tries to release a
>>>> semaphore, the blocked thread is mistaken for a global object, and
>>>> the system crashes. The routine that decides whether an object is
>>>> global or local simply checks whether the object's node matches the
>>>> local node; it never considers that the node number may still be
>>>> zero.
>>>>
>>>> RTEMS_INLINE_ROUTINE bool _Objects_Is_local_node(
>>>>   uint32_t   node
>>>> )
>>>> {
>>>>   return ( node == _Objects_Local_node );
>>>> }
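>>>>
>>>> To make the failure mode concrete, here is a small standalone sketch
>>>> (not RTEMS code; the node field layout is simplified) of what happens
>>>> when an ID is built while the node number is still zero:
>>>>
>>>> #include <stdbool.h>
>>>> #include <stdint.h>
>>>> #include <stdio.h>
>>>>
>>>> #define NODE_SHIFT 24   /* pretend the node lives in the top bits */
>>>>
>>>> static uint16_t local_node = 0;   /* not yet set by MPCI init */
>>>>
>>>> static uint32_t build_id( uint32_t node, uint32_t index )
>>>> {
>>>>   return ( node << NODE_SHIFT ) | index;
>>>> }
>>>>
>>>> static bool is_local( uint32_t id )
>>>> {
>>>>   return ( id >> NODE_SHIFT ) == local_node;
>>>> }
>>>>
>>>> int main( void )
>>>> {
>>>>   uint32_t sem_id = build_id( local_node, 1 ); /* created with node 0 */
>>>>
>>>>   local_node = 1;                  /* MPCI init runs much later */
>>>>
>>>>   /* the early-created semaphore now looks remote, not local */
>>>>   printf( "is_local: %d\n", is_local( sem_id ) );  /* prints 0 */
>>>>   return 0;
>>>> }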
>>>>
>>>> To test this theory I changed the declaration of _Objects_Local_node
>>>> from SCORE_EXTERN to extern and defined it in my project, initialized
>>>> to the node number. The LEON3 dual-core system now works, and I have
>>>> successfully had semaphores and tasks interacting between the two
>>>> nodes.
>>>>
>>>> uint16_t _Objects_Local_node = CONFIGURE_MP_NODE_NUMBER;
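>>>>
>>>> In full, the temporary workaround amounts to something like this
>>>> (a local hack only, not a proposed patch):
>>>>
>>>> /* in score/object.h -- local change: */
>>>> extern uint16_t _Objects_Local_node; /* was: SCORE_EXTERN uint16_t ... */
>>>>
>>>> /* in the application, next to the usual confdefs.h configuration: */
>>>> uint16_t _Objects_Local_node = CONFIGURE_MP_NODE_NUMBER;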
>>>>
>>>>
>>>>
>>>> I suggest that the initialization of _Objects_Local_node be moved
>>>> earlier in the startup sequence, along the lines of the sketch below.
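>>>>
>>>> A minimal sketch of what I mean, assuming the MP configuration table
>>>> is already available at that point (the routine name is just a
>>>> placeholder, not an existing RTEMS function):
>>>>
>>>> /* called early in executive initialization, before any objects exist */
>>>> void _MPCI_Early_initialization( void )
>>>> {
>>>>   /*
>>>>    * Set the local node number up front so every object created
>>>>    * during the rest of initialization gets a correct ID.
>>>>    */
>>>>   _Objects_Local_node = _Configuration_MP_table->node;
>>>> }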
>>>>
>>>> Regards,
>>>> Daniel Hellstrom
>>>>
>>>>
>>>>
>>>> Joel Sherrill wrote:
>>>>
>>>>> Roger Dahlkvist wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm using a timer-ISR polling method to check for new messages
>>>>>> from other nodes. Unfortunately the system crashes as soon as
>>>>>> rtems_multiprocessing_announce is called.
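>>>>>>
>>>>>> (For context, the polling looks roughly like this;
>>>>>> check_for_incoming_packets() is a placeholder for my own
>>>>>> shared-memory check:)
>>>>>>
>>>>>> #include <rtems.h>
>>>>>> #include <stdbool.h>
>>>>>>
>>>>>> /* placeholder for whatever inspects the shared-memory queue */
>>>>>> extern bool check_for_incoming_packets( void );
>>>>>>
>>>>>> /* fired periodically from a classic API timer */
>>>>>> rtems_timer_service_routine poll_mpci(
>>>>>>   rtems_id   timer,
>>>>>>   void      *ignored
>>>>>> )
>>>>>> {
>>>>>>   (void) ignored;
>>>>>>
>>>>>>   if ( check_for_incoming_packets() )
>>>>>>     rtems_multiprocessing_announce();
>>>>>>
>>>>>>   /* re-arm the timer for the next poll interval */
>>>>>>   (void) rtems_timer_reset( timer );
>>>>>> }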
>>>>>
>>>>> There are no interrupts enabled until the initialization task is 
>>>>> switched
>>>>> in.
>>>>>
>>>>> I have wondered whether it wouldn't make sense to have the MP
>>>>> initialization synchronization done either explicitly by the
>>>>> application (like TCP/IP initialization) or implicitly by the init
>>>>> thread, like C++ global constructors.
>>>>>
>>>>> You can try moving this code from exinit.c to threadhandler.c and
>>>>> protecting it somehow from being executed more than once.
>>>>>
>>>>>  #if defined(RTEMS_MULTIPROCESSING)
>>>>>    if ( _System_state_Is_multiprocessing ) {
>>>>>      _MPCI_Initialization();
>>>>>      _MPCI_Internal_packets_Send_process_packet(
>>>>>        MPCI_PACKETS_SYSTEM_VERIFY
>>>>>      );
>>>>>    }
>>>>>  #endif
>>>>>
>>>>> Then you will at least be able to get your interrupts and call MP 
>>>>> announce
>>>>> to complete system initialization.
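>>>>>
>>>>> A minimal sketch of that guard, assuming the block ends up near the
>>>>> top of _Thread_Handler() in threadhandler.c:
>>>>>
>>>>>  #if defined(RTEMS_MULTIPROCESSING)
>>>>>    /* run the MP synchronization exactly once, from the first thread */
>>>>>    static bool _MP_initialization_done = false;
>>>>>
>>>>>    if ( _System_state_Is_multiprocessing && !_MP_initialization_done ) {
>>>>>      _MP_initialization_done = true;
>>>>>      _MPCI_Initialization();
>>>>>      _MPCI_Internal_packets_Send_process_packet(
>>>>>        MPCI_PACKETS_SYSTEM_VERIFY
>>>>>      );
>>>>>    }
>>>>>  #endif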
>>>>>
>>>>>> However, rtems_multiprocessing_announce works just fine if it's
>>>>>> called right after the initialization phase, before the
>>>>>> initialization task is started. That's really strange.
>>>>>>
>>>>>> So for example, if I make one node initialize and start faster
>>>>>> than the other node (using fewer drivers, etc.), I'll be able to
>>>>>> create global objects. As long as the other node has not started
>>>>>> its initialization task, the message is received and the global
>>>>>> objects table is updated, so the objects can be identified later
>>>>>> on. But I can't use them, since further calls to
>>>>>> rtems_multiprocessing_announce will fail.
>>>>>>
>>>>>> At this point I feel like I have tested just about everything, 
>>>>>> with no luck. It's urgent that I get MP to work properly. I'm 
>>>>>> using Nios II processors and I have defined my own MPCI routines. 
>>>>>> I'm confident that they work properly and I have verified that 
>>>>>> the system crashes before they are even invoked.
>>>>>>
>>>>>> Is there anyone with MP experience who might have a clue of 
>>>>>> what's causing my problems? Any help is MUCH appreciated.
>>>>>>
>>>>>> //Roger
>>>>>>