Multiprocessor problems
Daniel Hellstrom
daniel at gaisler.com
Wed Jul 1 13:08:39 UTC 2009
Hi Joel,
I have attached a patch which introduces a new "early" MP initialization
routine. This works for SPARC/LEON3 MP.
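For illustration only (the routine and table names below are mine and may
differ from what the patch actually adds), the idea is to set the node number
from the multiprocessing configuration before any objects are created, instead
of waiting for _MPCI_Initialization():

/* hypothetical sketch: called early in rtems_initialize_data_structures(),
 * before the first semaphores and tasks are created */
void _Objects_MP_Early_initialization( void )
{
  _Objects_Local_node    = _Configuration_MP_table->node;
  _Objects_Maximum_nodes = _Configuration_MP_table->maximum_nodes;
}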
Daniel
Joel Sherrill wrote:
> Can you provide a patch Daniel?
>
> Daniel Hellstrom wrote:
>
>> Hi,
>>
>> The problem seems to be the initialization of _Objects_Local_node in
>> multiprocessor-enabled kernels. Since _MPCI_Initialization() sets
>> _Objects_Local_node only after the first semaphores and tasks have been
>> created, the IDs assigned to those objects are incorrect.
>>
>> In single-processor systems _Objects_Local_node is a constant set to 1,
>> but in multiprocessor systems it starts out as zero and is only later
>> initialized by _MPCI_Initialization().
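>>
>> For illustration, the two declarations look roughly like this (a sketch
>> based on score/object.h; the exact form may differ between RTEMS versions):
>>
>> #if defined(RTEMS_MULTIPROCESSING)
>>   /* MP build: zero-initialized BSS variable, only set by _MPCI_Initialization() */
>>   SCORE_EXTERN uint16_t _Objects_Local_node;
>> #else
>>   /* single-processor build: effectively a compile-time constant */
>>   #define _Objects_Local_node ((uint16_t)1)
>> #endif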
>>
>> The problem you experience is probably the same one I ran into this week
>> when running on a dual-core SPARC/LEON3 system. Two tasks are created
>> before the node number is set up correctly. See the GRMON printout below,
>> captured after breaking at Init():
>>
>> grmon> thread info
>>
>>   Name  | Type     | Id         | Prio | Time (h:m:s) | Entry point | PC                     | State
>>   ---------------------------------------------------------------------------------------------------
>>   Int.  | internal | 0x09000001 |  255 | 0.000000     | ??          | 0x0                    | READY
>>   Int.  | classic  | 0x09000002 |    0 | 0.000000     | ??          | 0x0                    | Wsem
>> * UI1   | classic  | 0x0a010001 |    1 | 0.000000     | RAM_END     | 0x40001368 Init + 0x4  | READY
>>
>> As you can see, the node number in the ID field is 0 rather than 1 or 2.
>>
>> The bug appears when the first MPCI packet is received on the target
>> node: the ISR calls _MPCI_Announce, which tries to release a semaphore,
>> the blocked thread is thought to be global, and the system crashes. The
>> function that decides whether an object is local or global simply checks
>> whether the nodes match; it does not consider that the node number may
>> still be zero.
>>
>> RTEMS_INLINE_ROUTINE bool _Objects_Is_local_node(
>>   uint32_t node
>> )
>> {
>>   return ( node == _Objects_Local_node );
>> }
>>
>> To test that this theory holds, I changed the declaration of
>> _Objects_Local_node from SCORE_EXTERN to extern and defined it in my
>> project, initialized to the node number. The LEON3 dual-core system now
>> works and I have successfully had semaphores and tasks interacting
>> between the two nodes.
>>
>> uint16_t _Objects_Local_node = CONFIGURE_MP_NODE_NUMBER;
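>>
>> For context, a minimal sketch of the application side (assuming the usual
>> confdefs.h style configuration; the node number 2 is just an example, and
>> the SCORE_EXTERN to extern change in the kernel header is still required):
>>
>> #include <rtems.h>
>>
>> #define CONFIGURE_MP_APPLICATION
>> #define CONFIGURE_MP_NODE_NUMBER 2   /* example node number */
>> /* ... other CONFIGURE_* settings ... */
>> #define CONFIGURE_INIT
>> #include <rtems/confdefs.h>
>>
>> /* definition supplied by the application so the value is already valid
>>  * before _MPCI_Initialization() runs */
>> uint16_t _Objects_Local_node = CONFIGURE_MP_NODE_NUMBER;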
>>
>>
>>
>> I suggest that the initialization of _Objects_Local_node be moved
>> earlier in the startup sequence.
>>
>> Regards,
>> Daniel Hellstrom
>>
>>
>>
>> Joel Sherrill wrote:
>>
>>
>>
>>> Roger Dahlkvist wrote:
>>>
>>>
>>>
>>>
>>>> Hi,
>>>>
>>>> I'm using a timer ISR polling method checking for new messages from
>>>> other nodes. Unfortunately the system crashes as soon as
>>>> rtems_multiprocessing_announce is called.
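>>>>
>>>> For reference, the polling hook is roughly of this shape (POLL_TICKS and
>>>> my_mpci_packet_pending() are placeholders for my BSP-specific code; only
>>>> rtems_multiprocessing_announce() and the timer directives are RTEMS calls):
>>>>
>>>> #include <rtems.h>
>>>>
>>>> #define POLL_TICKS 5                        /* polling period, example value */
>>>>
>>>> extern bool my_mpci_packet_pending( void ); /* BSP-specific check */
>>>>
>>>> rtems_timer_service_routine mp_poll_tsr( rtems_id timer, void *arg )
>>>> {
>>>>   /* a packet from the other node is pending: let RTEMS process it */
>>>>   if ( my_mpci_packet_pending() )
>>>>     rtems_multiprocessing_announce();
>>>>
>>>>   /* re-arm the periodic poll */
>>>>   rtems_timer_fire_after( timer, POLL_TICKS, mp_poll_tsr, NULL );
>>>> }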
>>>>
>>>>
>>>>
>>>
>>> There are no interrupts enabled until the initialization task is
>>> switched in.
>>>
>>> I have wondered if it wouldn't make sense to have the MP initialization
>>> synchronization done either explicitly by the application (like the
>>> initialization of TCP/IP) or implicitly by the init thread, like C++
>>> global constructors.
>>>
>>> You can try moving this code from exinit.c to threadhandler.c and
>>> protecting it somehow from being executed more than once (one possible
>>> guard is sketched after the snippet).
>>>
>>> #if defined(RTEMS_MULTIPROCESSING)
>>>   if ( _System_state_Is_multiprocessing ) {
>>>     _MPCI_Initialization();
>>>     _MPCI_Internal_packets_Send_process_packet(
>>>       MPCI_PACKETS_SYSTEM_VERIFY
>>>     );
>>>   }
>>> #endif
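>>>
>>> A rough sketch of the guarded version in _Thread_Handler() (the flag name
>>> is mine, purely to illustrate the once-only protection; since the init
>>> task is the first thread to run, a simple flag is sufficient here):
>>>
>>> #if defined(RTEMS_MULTIPROCESSING)
>>>   static bool _MPCI_Synchronization_done = false;
>>>
>>>   if ( _System_state_Is_multiprocessing && !_MPCI_Synchronization_done ) {
>>>     _MPCI_Synchronization_done = true;
>>>     _MPCI_Initialization();
>>>     _MPCI_Internal_packets_Send_process_packet(
>>>       MPCI_PACKETS_SYSTEM_VERIFY
>>>     );
>>>   }
>>> #endif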
>>>
>>> Then you will at least be able to get your interrupts enabled and call
>>> MP announce to complete the system initialization.
>>>
>>>
>>>
>>>
>>>> However, rtems_multiprocessing_announce works just fine if it's called
>>>> right after the initialization phase, before the initialization task is
>>>> started. That's really strange.
>>>>
>>>> So for example, if I make one node initialize and start faster than the
>>>> other node (using fewer drivers, etc.), I'll be able to create global
>>>> objects. And as long as the other node has not started its
>>>> initialization task, the message is received and the global objects
>>>> table is updated, so the objects can be identified later on. But I
>>>> can't use them, since further calls to rtems_multiprocessing_announce
>>>> will fail.
>>>>
>>>> At this point I feel like I have tested just about everything, with
>>>> no luck. It's urgent that I get MP to work properly. I'm using Nios
>>>> II processors and I have defined my own MPCI routines. I'm
>>>> confident that they work properly and I have verified that the
>>>> system crashes before they are even invoked.
>>>>
>>>> Is there anyone with MP experience who might have a clue of what's
>>>> causing my problems? Any help is MUCH appreciated.
>>>>
>>>> //Roger
>>>>
>>>> _______________________________________________
>>>> rtems-users mailing list
>>>> rtems-users at rtems.org
>>>> http://www.rtems.org/mailman/listinfo/rtems-users
>>>>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Object_Local_node.patch
URL: <http://lists.rtems.org/pipermail/users/attachments/20090701/9fc48587/attachment.ksh>