Multiprocessor problems
Joel Sherrill
joel.sherrill at OARcorp.com
Thu Jun 18 15:25:22 UTC 2009
Can you provide a patch, Daniel?
Daniel Hellstrom wrote:
> Hi,
>
> The problem seems to be the initialization of _Objects_Local_node in
> multiprocessor-enabled kernels. Since _MPCI_Initialization()
> initializes _Objects_Local_node only after the first semaphores and
> tasks have been created, the IDs assigned to those objects are incorrect.
>
> In single processor systems the _Objects_Local_node is a constant set to
> 1, but in multiprocessor systems it is initially set to zero and then
> initialized by _MPCI_Initialization().
>
> The problem you experience is probably the same one I ran into this
> week when running on a dual-core SPARC/LEON3 system. Two tasks are
> created before the node number is set up correctly. See the GRMON
> print-out below, taken after breaking at Init():
>
> grmon> thread info
>
> Name  | Type     | Id         | Prio | Time (h:m:s) | Entry point | PC                    | State
> ------+----------+------------+------+--------------+-------------+-----------------------+------
> Int.  | internal | 0x09000001 | 255  | 0.000000     | ??          | 0x0                   | READY
> Int.  | classic  | 0x09000002 | 0    | 0.000000     | ??          | 0x0                   | Wsem
> * UI1 | classic  | 0x0a010001 | 1    | 0.000000     | RAM_END     | 0x40001368 Init + 0x4 | READY
>
> As you can see the node number is 0 rather than 1 or 2 in the ID field.
>
> The bug appears when the first MPCI packet is received on the target
> node: the ISR calls _MPCI_Announce, which tries to release a semaphore,
> the blocked thread is thought to be global, and the system crashes. The
> function that decides whether an object is global or local simply checks
> whether the node numbers match, not whether the node number is zero.
>
> RTEMS_INLINE_ROUTINE bool _Objects_Is_local_node(
>   uint32_t node
> )
> {
>   return ( node == _Objects_Local_node );
> }
>
> To test that this theory holds, I changed the declaration of
> _Objects_Local_node from SCORE_EXTERN to extern and defined it in my
> project, initially set to the node number. The LEON3 dual-core system
> now works, and I have successfully had semaphores and tasks interacting
> between the two nodes.
>
> uint16_t _Objects_Local_node = CONFIGURE_MP_NODE_NUMBER;
>
>
>
> I suggest that _Objects_Local_node be initialized earlier.
>
> Regards,
> Daniel Hellstrom
>
>
>
> Joel Sherrill wrote:
>
>
>> Roger Dahlkvist wrote:
>>
>>
>>
>>> Hi,
>>>
>>> I'm using a timer ISR polling method checking for new messages from other nodes. Unfortunately the system crashes as soon as rtems_multiprocessing_announce is called.
>>>
>>>
>>>
>>>
>> There are no interrupts enabled until the initialization task is switched
>> in.
>>
>> I have wondered if it wouldn't make sense to have the MP initialization
>> synchronization done either explicitly by the application (like TCP/IP
>> initialization) or implicitly by the init thread, like C++ global
>> constructors.
>>
>> You can try moving this code from exinit.c to threadhandler.c and
>> protecting it somehow from being executed more than once.
>>
>> #if defined(RTEMS_MULTIPROCESSING)
>>   if ( _System_state_Is_multiprocessing ) {
>>     _MPCI_Initialization();
>>     _MPCI_Internal_packets_Send_process_packet(
>>       MPCI_PACKETS_SYSTEM_VERIFY
>>     );
>>   }
>> #endif
>>
>> Then you will at least be able to get your interrupts enabled and call
>> MP announce to complete system initialization.
>>
>>
>>
>>> However, rtems_multiprocessing_announce works just fine if it's called just after the initialization phase, before the initialization task is started. That's really strange.
>>>
>>> So, for example, if I make one node initialize and start faster than the other (by using fewer drivers etc.), I am able to create global objects. As long as the other node has not started its initialization task, the message is received and the global objects table is updated, so the objects can be identified later on. But I can't use them, since further calls to rtems_multiprocessing_announce will fail.
>>>
>>> At this point I feel like I have tested just about everything, with no luck. It's urgent that I get MP to work properly.
>>> I'm using Nios II processors and I have defined my own MPCI routines. I'm confident that they work properly and I have verified that the system crashes before they are even invoked.
>>>
>>> Is there anyone with MP experience who might have a clue of what's causing my problems? Any help is MUCH appreciated.
>>>
>>> //Roger
>>>
>>> _______________________________________________
>>> rtems-users mailing list
>>> rtems-users at rtems.org
>>> http://www.rtems.org/mailman/listinfo/rtems-users
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
--
Joel Sherrill, Ph.D. Director of Research & Development
joel.sherrill at OARcorp.com On-Line Applications Research
Ask me about RTEMS: a free RTOS Huntsville AL 35805
Support Available (256) 722-9985