Multiprocessor problems

Daniel Hellstrom daniel at gaisler.com
Wed Jul 1 13:14:03 UTC 2009


Hello,

The init task name is defined by confdefs.h; it is possible to set the 
init task name to a custom name using the CONFIGURE_INIT_TASK_NAME 
define, for example:

#define CONFIGURE_INIT_TASK_NAME \
        rtems_build_name('U', 'I', '0' + CONFIGURE_MP_NODE_NUMBER, ' ')
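
For completeness, a fuller per-node configuration could look roughly 
like this (a sketch only; it assumes the node number is supplied per 
node at build time, e.g. -DCONFIGURE_MP_NODE_NUMBER=1 for node 1):

#define CONFIGURE_MP_APPLICATION
#define CONFIGURE_MP_MAXIMUM_NODES        2

#define CONFIGURE_INIT_TASK_NAME \
        rtems_build_name('U', 'I', '0' + CONFIGURE_MP_NODE_NUMBER, ' ')
#define CONFIGURE_INIT_TASK_ATTRIBUTES    RTEMS_GLOBAL

#define CONFIGURE_INIT
#include <rtems/confdefs.h>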

Perhaps a patch is not needed in this case?

Daniel



Joel Sherrill wrote:

> Provide a patch and I will merge it. :)
>
> --joel
>
> Daniel Hellstrom wrote:
>
>> Hi,
>>
>> On a similar MP topic, all Init tasks have the same name "UI1" 
>> regardless of CPU node. I have seen in the mptests that 
>> CONFIGURE_INIT_TASK_ATTRIBUTES is set to RTEMS_GLOBAL, but since 
>> every node uses the same name, rtems_task_ident() cannot be used to 
>> look up the ID of the remote node's Init task. Perhaps the Init task 
>> name could be {'U','I','0'+nodeid,'\0'} instead?
>>
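>> As an illustration, with per-node names the remote node's Init task 
>> could then be looked up with rtems_task_ident() (sketch only; the 
>> fourth name character just has to match whatever is actually 
>> configured, a space here):
>>
>>   rtems_id          remote_init;
>>   rtems_status_code sc;
>>   uint32_t          remote_node = 2;  /* node whose Init task we want */
>>
>>   sc = rtems_task_ident(
>>     rtems_build_name( 'U', 'I', '0' + remote_node, ' ' ),
>>     RTEMS_SEARCH_ALL_NODES,
>>     &remote_init
>>   );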
>>
>> GRMON thread info output from the two LEON3 CPUs, CPU0 
>> [0x40000000-0x43FFFFFF] and CPU1 [0x44000000-0x47FFFFFF]:
>>
>> grlib> sym rtems-mp1
>> read 1456 symbols
>> entry point: 0x40000000
>> grlib> thread info
>>
>>   Name | Type     | Id         | Prio | Time (h:m:s)  | Entry point             | PC                                      | State
>> ----------------------------------------------------------------------------------------------------------------------------------
>>   Int. | internal | 0x09010001 |  255 |      0.000000 | _BSP_Thread_Idle_body   | 0x400030a4 _BSP_Thread_Idle_body + 0x0  | READY
>>   Int. | classic  | 0x09010002 |    0 |      0.005648 | _MPCI_Receive_server    | 0x4000c66c _Thread_Dispatch + 0xd8      | Wsem
>> * UI1  | classic  | 0x0a010001 |    1 |      0.000000 | Init                    | 0x40001368 Init + 0x4                   | READY
>>
>>
>> grlib> sym rtems-mp2
>> read 1456 symbols
>> entry point: 0x44000000
>> grlib> thread info
>>
>>   Name | Type     | Id         | Prio | Time (h:m:s)  | Entry point             | PC                                      | State
>> ----------------------------------------------------------------------------------------------------------------------------------
>>   Int. | internal | 0x09020001 |  255 |      0.000000 | _BSP_Thread_Idle_body   | 0x440030a4 _BSP_Thread_Idle_body + 0x0  | READY
>>   Int. | classic  | 0x09020002 |    0 |      0.005661 | _MPCI_Receive_server    | 0x4400c66c _Thread_Dispatch + 0xd8      | Wsem
>> * UI1  | classic  | 0x0a020001 |    1 |      0.000000 | Init                    | 0x40001368 _RAM_SIZE + 0x3c00136c       | READY
>>
>>
>>
>> Daniel
>>
>>
>>
>> Daniel Hellstrom wrote:
>>
>>> Hi,
>>>
>>> The problem seems to be the initialization of _Objects_Local_node in 
>>> multiprocessor enabled kernels. Since the _MPCI_Initialization() 
>>> initializes _Objects_Local_node later than the first semaphores and 
>>> tasks are created, this makes the IDs assigned to created objects 
>>> incorrect.
>>>
>>> In single processor systems the _Objects_Local_node is a constant 
>>> set to 1, but in multiprocessor systems it is initially set to zero 
>>> and then initialized by _MPCI_Initialization().
>>>
>>> The problem you experience is probably the same problem I ran into 
>>> this week when running on a dual core SPARC/LEON3 system. Two tasks 
>>> are created before the node number is set up correctly. See the 
>>> print out below from GRMON after breaking at Init():
>>>
>>> grmon> thread info
>>>
>>>  Name | Type     | Id         | Prio | Time (h:m:s)  | Entry point | PC                    | State
>>> ---------------------------------------------------------------------------------------------------
>>>  Int. | internal | 0x09000001 |  255 |      0.000000 | ??          | 0x0                   | READY
>>>  Int. | classic  | 0x09000002 |    0 |      0.000000 | ??          | 0x0                   | Wsem
>>> * UI1  | classic  | 0x0a010001 |    1 |      0.000000 | RAM_END     | 0x40001368 Init + 0x4 | READY
>>>
>>>
>>> As you can see the node number is 0 rather than 1 or 2 in the ID field.
>>>
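>>> For reference, the node number can be read out of the byte at bits 
>>> 16-23 of the 32-bit object ID, which is how the IDs above decode; a 
>>> purely illustrative helper:
>>>
>>>   /* e.g. 0x09010001 -> 1, 0x09000001 -> 0, as in the listings above */
>>>   static uint32_t node_of_id( rtems_id id )
>>>   {
>>>     return ( id >> 16 ) & 0xff;
>>>   }
>>>
>>> rtems_object_id_get_node() does the same, if your RTEMS version 
>>> provides it.
>>>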
>>> The bug appears when the first MPCI packet is received on the target 
>>> node: the ISR calls _MPCI_Announce, which tries to release a 
>>> semaphore, the blocked thread is thought to be global, and the 
>>> system crashes. The function deciding whether an object is global or 
>>> local simply checks if it is on the same node; it does not check 
>>> whether the node number is zero.
>>>
>>> RTEMS_INLINE_ROUTINE bool _Objects_Is_local_node(
>>>  uint32_t   node
>>> )
>>> {
>>>  return ( node == _Objects_Local_node );
>>> }
>>>
>>> To test that this theory holds I changed the declaration of 
>>> _Objects_Local_node from SCORE_EXTERN to extern, and defined it in 
>>> my project, initially set to the node number. The LEON3 dual core 
>>> system now works and I have successfully managed to get semaphores 
>>> and tasks interacting between the two nodes.
>>>
>>> uint16_t _Objects_Local_node = CONFIGURE_MP_NODE_NUMBER;
>>>
>>>
>>>
>>> I suggest that _Objects_Local_node be initialized earlier in the 
>>> startup sequence.
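>>>
>>> For example (sketch only, untested; it assumes the multiprocessing 
>>> configuration table is already available at that point), the 
>>> assignment could be done early in the executive initialization in 
>>> exinit.c, before the first objects are created:
>>>
>>>   #if defined(RTEMS_MULTIPROCESSING)
>>>     /* set the local node as soon as the MP configuration is known */
>>>     if ( _Configuration_MP_table != NULL )
>>>       _Objects_Local_node = _Configuration_MP_table->node;
>>>   #endif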
>>>
>>> Regards,
>>> Daniel Hellstrom
>>>
>>>
>>>
>>> Joel Sherrill wrote:
>>>
>>>
>>>> Roger Dahlkvist wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm using a timer ISR polling method checking for new messages 
>>>>> from other nodes. Unfortunately the system crashes as soon as 
>>>>> rtems_multiprocessing_announce is called.
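>>>>>
>>>>> For reference, such a polling routine is essentially of this shape 
>>>>> (a sketch only; my_mpci_packet_pending() stands in for whatever 
>>>>> check the shared-memory driver provides):
>>>>>
>>>>>   rtems_timer_service_routine mpci_poll( rtems_id timer, void *arg )
>>>>>   {
>>>>>     if ( my_mpci_packet_pending() )   /* hypothetical driver hook */
>>>>>       rtems_multiprocessing_announce();
>>>>>
>>>>>     /* re-arm; a one-tick period is just an example */
>>>>>     rtems_timer_fire_after( timer, 1, mpci_poll, NULL );
>>>>>   }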
>>>>>
>>>>
>>>> There are no interrupts enabled until the initialization task is 
>>>> switched
>>>> in.
>>>>
>>>> I have wondered if it wouldn't make sense to have the MP 
>>>> initialization synchronization done either explicitly by the 
>>>> application (like the initialization of TCP/IP) or implicitly by 
>>>> the init thread, like C++ global constructors.
>>>>
>>>> You can try moving this code from exinit.c to threadhandler.c and 
>>>> protecting it somehow from being executed more than once (one 
>>>> possible guard is sketched below).
>>>>
>>>> #if defined(RTEMS_MULTIPROCESSING)
>>>>   if ( _System_state_Is_multiprocessing ) {
>>>>     _MPCI_Initialization();
>>>>     _MPCI_Internal_packets_Send_process_packet(
>>>>       MPCI_PACKETS_SYSTEM_VERIFY
>>>>     );
>>>>   }
>>>> #endif
>>>>
>>>> Then you will at least be able to get your interrupts and call MP 
>>>> announce
>>>> to complete system initialization.
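>>>>
>>>> A guard for that could be as simple as a static flag, for example 
>>>> (sketch only, not a tested patch):
>>>>
>>>> #if defined(RTEMS_MULTIPROCESSING)
>>>>   {
>>>>     /* run the MP initialization only once, from the first thread */
>>>>     static bool mp_initialized = false;
>>>>
>>>>     if ( !mp_initialized && _System_state_Is_multiprocessing ) {
>>>>       mp_initialized = true;
>>>>       _MPCI_Initialization();
>>>>       _MPCI_Internal_packets_Send_process_packet(
>>>>         MPCI_PACKETS_SYSTEM_VERIFY
>>>>       );
>>>>     }
>>>>   }
>>>> #endif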
>>>>
>>>>
>>>>> However, rtems_multiprocessing_announce works just fine if it's 
>>>>> called just after the initialization phase, before the 
>>>>> initialization task is started. That's really strange.
>>>>>
>>>>> So for example, if I make one node get initialized and started 
>>>>> faster than the other node (using fewer drivers etc.), I'll be 
>>>>> able to create global objects. And as long as the other node has 
>>>>> not started its initialization task, the message is received and 
>>>>> the global objects table is updated, so the objects can be 
>>>>> identified later on. But I can't use them, since further calls to 
>>>>> rtems_multiprocessing_announce will fail.
>>>>>
>>>>> At this point I feel like I have tested just about everything, 
>>>>> with no luck. It's urgent that I get MP to work properly. I'm 
>>>>> using Nios II processors and I have defined my own MPCI routines. 
>>>>> I'm confident that they work properly and I have verified that the 
>>>>> system crashes before they are even invoked.
>>>>>
>>>>> Is there anyone with MP experience who might have a clue of what's 
>>>>> causing my problems? Any help is MUCH appreciated.
>>>>>
>>>>> //Roger
>>>>>
>>>



