a bug

Victor V. Vengerov Victor.Vengerov at oktetlabs.ru
Wed Mar 22 20:47:59 UTC 2006


Frank, Joel,

Frank: Sorry for my silence. I can't say I'm very familiar with the MPCI 
code in RTEMS, but at first glance I doubt that your changes are 
correct.

I have looked through the MP code again and found that the variable 
_Thread_MP_Receive is never initialized. I have checked it in the 
debugger: yes, it is 0. _Thread_MP_Is_receive() uses the 
_Thread_MP_Receive variable to check whether a thread is the MPCI server 
thread. (This variable is also used in _Thread_MP_Allocate_proxy().)

_Thread_queue_Enqueue() checks whether the current thread is the MPCI 
server thread and, if so, substitutes an MP proxy control block for the 
normal thread control block. This never happens, because 
_Thread_MP_Receive is not initialized.
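
To illustrate the effect, here is a minimal sketch in plain C. It is not 
RTEMS code: Packet, Thread_Control, thread_mp_receive and run_server are 
simplified, hypothetical stand-ins, and the model tracks only one 
semaphore and one waiting task. It assumes the packet order from Frank's 
test (obtain from task 2, obtain from task 3, release from task 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { OBTAIN, RELEASE };

typedef struct { int kind; int task; } Packet;

/* Hypothetical, heavily simplified stand-in for a thread control block. */
typedef struct { const Packet *receive_packet; } Thread_Control;

/* Models _Thread_MP_Receive: a global that starts out as NULL (0). */
static Thread_Control *thread_mp_receive;

/* Models _Thread_MP_Is_receive(): a comparison against the global. */
static bool thread_mp_is_receive(const Thread_Control *t)
{
  return t == thread_mp_receive;
}

/*
 * Runs the receive server over the packets in arrival order.
 * Returns the number of packets handled before the server thread itself
 * blocked (n if it never blocked). *owner is the task holding the
 * semaphore afterwards (0 = free).
 */
static size_t run_server(Thread_Control *server, const Packet *p,
                         size_t n, int *owner)
{
  int waiting = 0;  /* task parked on a proxy, 0 = none */

  *owner = 0;
  for (size_t i = 0; i < n; i++) {
    server->receive_packet = &p[i];
    if (p[i].kind == OBTAIN) {
      if (*owner == 0) {
        *owner = p[i].task;          /* satisfied immediately */
      } else if (thread_mp_is_receive(server) &&
                 server->receive_packet != NULL) {
        waiting = p[i].task;         /* a proxy blocks, the server goes on */
      } else {
        return i;                    /* the server blocks: later packets
                                        are never read */
      }
    } else {                         /* RELEASE */
      *owner = waiting;              /* hand over to the waiter, if any */
      waiting = 0;
    }
  }
  return n;
}
```

With thread_mp_receive left at NULL, run_server() stops at the second 
packet and the release is never processed. Once the global points at the 
server thread, all three packets are handled and task 3 ends up owning 
the semaphore.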

I have added initialization of the _Thread_MP_Receive variable to mpci.c; 
see the attached thread-mp-receive.diff patch. Also, the 4.6.99.2 tree 
cannot be run on the psim simulator out of the box; see the psim.diff 
patch. (I can't say I'm happy with the existing approach to hooking timer 
interrupts, and I'm not sure all my changes are correct. Can somebody 
review my patch?)

BTW, there is another variable equivalent to _Thread_MP_Receive: 
_MPCI_Receive_server_tcb. Should we keep both of them? (As I 
understand it, they belong to different APIs, which may be the reason 
both exist. Is the proposed place for the _Thread_MP_Receive 
initialization correct?)

Frank, it looks like your test works fine with this change. Could you 
please check whether it works for you? If the problem still exists, could 
you explain the meaning of your changes?

Joel: what is the current status of gnats? May I open new PRs in gnats, 
or do you intend to freeze it and kill it?

Regards,
Victor

FRANK wrote:

>Victor,
>
>Now I have made some progress. I have modified some of the code, and it
>seems that the deadlock has been solved. Here are my modifications.
>
>1. In the function _Semaphore_MP_Process_packet in semmp.c, I changed
>"if ( ! _Thread_Is_proxy_blocking( the_packet->Prefix.return_code ) )"
>to "if ( ! _Thread_Is_proxy_blocking(
>_Thread_Executing->Wait.return_code ) )".
>
>2. In the function _CORE_semaphore_Seize_isr_disable in coresem.inl, I
>added "_Thread_MP_Receive=_Thread_Executing;" before
>"_Thread_queue_Enqueue( &the_semaphore->Wait_queue, timeout );".
>
>What do you think about these?
>
>3. In the function _Message_queue_MP_Process_packet in msgmp.c, I changed
>"if ( ! _Thread_Is_proxy_blocking( the_packet->Prefix.return_code ) )"
>to "if ( ! _Thread_Is_proxy_blocking(
>_Thread_Executing->Wait.return_code ) )".
>
>4. In the function _CORE_message_queue_Seize in coremsgseize.c, I added
>"_Thread_MP_Receive=_Thread_Executing;" before
>"_Thread_queue_Enqueue( &the_message_queue->Wait_queue, timeout );".
>
>In addition, I found that the value rtems_semaphore_flush returns to a
>remote node is not the same as the value it returns to a local
>node. I think that is not reasonable, so I modified it. Now the return
>value is the same for both the remote node and the local node:
>RTEMS_UNSATISFIED.
>
>FRANK
>
>
>2006/3/10, Victor V. Vengerov <Victor.Vengerov at oktetlabs.ru>:
>  
>
>>Frank,
>>
>>Just to let you know I'm still here. (I'm a little slow, but I have other
>>work to do, sorry.)
>>
>>I have finally managed to run the multiprocessing tests on the psim
>>simulator with RTEMS-4.99.2. I have integrated your test, and it looks
>>like I'm observing the behaviour you described. Now I'm trying to
>>investigate what is happening and why. I'll let you know when I have made
>>some progress.
>>
>>Victor
>>
>>FRANK wrote:
>>
>>    
>>
>>>Hello Victor,
>>>
>>>I am working on multiprocessor support for 4 processors on
>>>leon2. I found the problem when I tested the semaphore on 4
>>>processors. Then I reproduced it on 2 processors for
>>>confirmation.
>>>Here is my test code. base_mp_4.rar is for 4 processors, and
>>>base_mp_2 is for 2 processors.
>>>
>>>Frank
>>>
>>>
>>>
>>>2006/3/6, Victor V. Vengerov <Victor.Vengerov at oktetlabs.ru>:
>>>
>>>
>>>      
>>>
>>>>Frank,
>>>>
>>>>I'm trying to reproduce this situation. I have built RTEMS and the tools
>>>>targeting the powerpc psim simulator (this configuration, in theory,
>>>>allows running multiprocessor tests). It still has problems; I'm trying
>>>>to bring this configuration up.
>>>>
>>>>Could you send me the source code of your tests demonstrating the problem
>>>>you have described?
>>>>
>>>>Victor
>>>>
>>>>FRANK wrote:
>>>>
>>>>
>>>>
>>>>        
>>>>
>>>>>But something goes wrong when node1 processes the MPCI message
>>>>>SEMAPHORE_MP_OBTAIN_REQUEST from task3. I tested such a program
>>>>>recently, and it really does deadlock. When
>>>>>_MPCI_Receive_server on node1 receives the MPCI message
>>>>>SEMAPHORE_MP_OBTAIN_REQUEST from task3, it processes the request
>>>>>itself. If the request blocks, the server blocks
>>>>>too. Judging from the code, it is the thread
>>>>>_MPCI_Receive_server that performs the function.
>>>>>The function _Thread_queue_Enqueue contains this code:
>>>>>
>>>>>the_thread = _Thread_Executing;
>>>>>#if defined(RTEMS_MULTIPROCESSING)
>>>>>if ( _Thread_MP_Is_receive( the_thread ) && the_thread->receive_packet )
>>>>>  the_thread = _Thread_MP_Allocate_proxy( the_thread_queue->state );
>>>>>else
>>>>>#endif
>>>>>  _Thread_Set_state( the_thread, the_thread_queue->state );
>>>>>
>>>>>Here _MPCI_Receive_server can use a proxy to perform an operation
>>>>>that would block. But I find that the condition
>>>>>_Thread_MP_Is_receive( the_thread ) && the_thread->receive_packet
>>>>>is never true in my test program, so it deadlocks.
>>>>>
>>>>>Thread _MPCI_Receive_server(
>>>>>unsigned32 ignored
>>>>>)
>>>>>{
>>>>>
>>>>>MP_packet_Prefix         *the_packet;
>>>>>MPCI_Packet_processor     the_function;
>>>>>Thread_Control           *executing;
>>>>>
>>>>>executing = _Thread_Executing;
>>>>>
>>>>>for ( ; ; ) {
>>>>>
>>>>>  executing->receive_packet = NULL;
>>>>>
>>>>>  _Thread_Disable_dispatch();
>>>>>  _CORE_semaphore_Seize( &_MPCI_Semaphore, 0, TRUE, WATCHDOG_NO_TIMEOUT );
>>>>>  _Thread_Enable_dispatch();
>>>>>
>>>>>  for ( ; ; ) {
>>>>>    the_packet = _MPCI_Receive_packet();
>>>>>
>>>>>    if ( !the_packet )
>>>>>      break;
>>>>>
>>>>>    executing->receive_packet = the_packet;
>>>>>
>>>>>    if ( !_Mp_packet_Is_valid_packet_class ( the_packet->the_class ) )
>>>>>      break;
>>>>>
>>>>>    the_function = _MPCI_Packet_processors[ the_packet->the_class ];
>>>>>
>>>>>    if ( !the_function )
>>>>>      _Internal_error_Occurred(
>>>>>        INTERNAL_ERROR_CORE,
>>>>>        TRUE,
>>>>>        INTERNAL_ERROR_BAD_PACKET
>>>>>      );
>>>>>
>>>>>    (*the_function)( the_packet );
>>>>>  }
>>>>>}
>>>>>
>>>>>return 0;   /* unreached - only to remove warnings */
>>>>>}
>>>>>
>>>>>
>>>>>Frank
>>>>>
>>>>>
>>>>>2006/3/2, Victor V. Vengerov <Victor.Vengerov at oktetlabs.ru>:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>>>Frank,
>>>>>>
>>>>>>OK, I dealt with MPCI in RTEMS a long time ago, so it is possible I'm
>>>>>>wrong in the details.
>>>>>>
>>>>>>In my understanding, the following sequence of events happens:
>>>>>>1. task 1 creates the semaphore
>>>>>>- an MPCI SEMAPHORE_MP_ANNOUNCE_CREATE message is sent to node 2 to
>>>>>>announce the semaphore creation
>>>>>>2. the MPCI task at node 2 processes this message
>>>>>>3. task 2 obtains the semaphore
>>>>>>- an MPCI SEMAPHORE_MP_OBTAIN_REQUEST message is sent to node 1 to get
>>>>>>the semaphore
>>>>>>- task 2 blocks, waiting for the answer
>>>>>>4. the MPCI task at node 1 receives the obtain message and processes it.
>>>>>>As a result, it gets the semaphore and sends a
>>>>>>SEMAPHORE_MP_OBTAIN_RESPONSE message to node 2.
>>>>>>5. the MPCI task at node 2 receives the response message and unblocks
>>>>>>task 2. The semaphore is now owned by task 2.
>>>>>>6. task 3 obtains the semaphore
>>>>>>- an MPCI SEMAPHORE_MP_OBTAIN_REQUEST message is sent to node 1 to get
>>>>>>the semaphore
>>>>>>- task 3 blocks, waiting for the answer
>>>>>>7. task 2 releases the semaphore
>>>>>>- an MPCI SEMAPHORE_MP_RELEASE_REQUEST message is sent to node 1
>>>>>>- task 2 continues its execution
>>>>>>8. the MPCI task at node 1 processes SEMAPHORE_MP_RELEASE_REQUEST
>>>>>>- it sends SEMAPHORE_MP_RELEASE_RESPONSE to node 2
>>>>>>- because task 3 is waiting for the semaphore, it sends
>>>>>>SEMAPHORE_MP_OBTAIN_RESPONSE to node 2 to resume task 3.
>>>>>>9. the MPCI task at node 2 processes the SEMAPHORE_MP_OBTAIN_RESPONSE
>>>>>>message.
>>>>>>- task 3 is resumed and owns the semaphore.
>>>>>>
>>>>>>No deadlock should happen.
>>>>>>
>>>>>>Victor
>>>>>>
>>>>>>FRANK wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>Hi,
>>>>>>>I think there may be something wrong in the function
>>>>>>>_MPCI_Receive_server (in mpci.c). I have tested the following program.
>>>>>>>I made two nodes. Node1 creates one task, which creates a semaphore, and
>>>>>>>Node2 creates two tasks, task2 and task3. Task2 obtains the semaphore
>>>>>>>and then releases it. Before task2 releases the semaphore, task3 tries
>>>>>>>to obtain it. The result is a deadlock. The reason, I
>>>>>>>think, is that _MPCI_Receive_server never receives a new request
>>>>>>>before it has finished the previous one. In this test, the obtain
>>>>>>>request of task2 can be satisfied immediately, and the obtain request
>>>>>>>of task3 can be satisfied only after the release request of task2
>>>>>>>has been satisfied. But the obtain request of task3 arrives earlier than
>>>>>>>the release request of task2, so until _MPCI_Receive_server satisfies
>>>>>>>the obtain request of task3, it will never respond to the release
>>>>>>>request of task2, and this causes a deadlock.
>>>>>>>Am I right? I hope you can give me a prompt reply. Thanks a lot.
>>>>>>>
>>>>>>>Frank
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>--
>>>>>>Victor Vengerov
>>>>>>OKTET Labs, St.-Petersburg, Russia   Web: www.oktetlabs.ru
>>>>>>Phone +7 812 4286709(office) +7 812 9389372(mobile) +7 812 4281653(home)


-- 
Victor Vengerov
OKTET Labs, St.-Petersburg, Russia   Web: www.oktetlabs.ru
Phone +7 812 4286709(office) +7 812 9389372(mobile) +7 812 4281653(home)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: thread-mp-receive.diff
Type: text/x-patch
Size: 525 bytes
Desc: not available
URL: <http://lists.rtems.org/pipermail/users/attachments/20060322/19666fae/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: psim.diff
Type: text/x-patch
Size: 1610 bytes
Desc: not available
URL: <http://lists.rtems.org/pipermail/users/attachments/20060322/19666fae/attachment-0001.bin>

