a bug

FRANK frank1997 at gmail.com
Fri Mar 10 01:02:37 UTC 2006


Victor,

I have made some progress. I have modified some of the code, and it
seems that the deadlock has been solved. Here are my modifications.

1, In function _Semaphore_MP_Process_packet of semmp.c, I changed the code
"if ( ! _Thread_Is_proxy_blocking( the_packet->Prefix.return_code ) )"
to "if ( ! _Thread_Is_proxy_blocking(
_Thread_Executing->Wait.return_code ) )"

2, In function _CORE_semaphore_Seize_isr_disable of coresem.inl, I added
"_Thread_MP_Receive=_Thread_Executing;" before "_Thread_queue_Enqueue(
&the_semaphore->Wait_queue, timeout );"

What do you think of these changes?

3, In function _Message_queue_MP_Process_packet of msgmp.c, I changed the
code "if ( ! _Thread_Is_proxy_blocking( the_packet->Prefix.return_code
) )" to "if ( ! _Thread_Is_proxy_blocking(
_Thread_Executing->Wait.return_code ) )"

4, In function _CORE_message_queue_Seize of coremsgseize.c, I added
"_Thread_MP_Receive=_Thread_Executing;" before "_Thread_queue_Enqueue(
&the_message_queue->Wait_queue, timeout );"
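
To make the effect of changes 2 and 4 easier to follow, here is a minimal
toy model in C. It is only a sketch under stated assumptions, not RTEMS
code: every toy_* name is hypothetical, _Thread_MP_Is_receive is assumed
to be a plain pointer comparison against _Thread_MP_Receive, and the
starting state (the receive pointer not matching the server thread) is
assumed, since the thread does not say why the condition was false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the RTEMS definitions. */
typedef struct {
  const void *receive_packet;  /* non-NULL while an MPCI packet is processed */
} toy_thread;

typedef enum { TOY_BLOCKS_ITSELF, TOY_USES_PROXY } toy_outcome;

static toy_thread *toy_executing;   /* models _Thread_Executing */
static toy_thread *toy_mp_receive;  /* models _Thread_MP_Receive */

/* Models the proxy test quoted from _Thread_queue_Enqueue. */
static toy_outcome toy_enqueue(void)
{
  toy_thread *the_thread = toy_executing;

  /* Assumed equivalent of _Thread_MP_Is_receive(): a pointer comparison. */
  if (the_thread == toy_mp_receive && the_thread->receive_packet != NULL)
    return TOY_USES_PROXY;     /* a proxy waits, the caller keeps running */
  return TOY_BLOCKS_ITSELF;    /* the calling thread itself is blocked */
}

/* Models a seize that has to wait; apply_change selects changes 2 and 4. */
static toy_outcome toy_seize_and_wait(toy_thread *caller, bool apply_change)
{
  toy_executing = caller;
  if (apply_change)
    toy_mp_receive = toy_executing;  /* the assignment added before enqueue */
  return toy_enqueue();
}

/* Exercises the three cases discussed in the note below. */
static void toy_self_check(void)
{
  int packet = 0;
  toy_thread server = { &packet };   /* thread handling a remote request */
  toy_thread local_task = { NULL };  /* ordinary task, no packet in hand */

  toy_mp_receive = NULL;  /* assumed starting state: server not recognised */
  assert(toy_seize_and_wait(&server, false) == TOY_BLOCKS_ITSELF);
  assert(toy_seize_and_wait(&server, true) == TOY_USES_PROXY);
  assert(toy_seize_and_wait(&local_task, true) == TOY_BLOCKS_ITSELF);
}
```

In this model, calling toy_self_check() passes: without the added
assignment the server thread blocks itself, with it the proxy branch is
taken, and a task that holds no packet still blocks itself because the
second half of the condition stays false.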

In addition, I found that the return value that function
rtems_semaphore_flush gives to a remote node is not the same as the one
it gives to a local node. I think that is not reasonable, so I modified
it. Now the return value is the same for both the remote node and the
local node: RTEMS_UNSATISFIED.

FRANK


2006/3/10, Victor V. Vengerov <Victor.Vengerov at oktetlabs.ru>:
> Frank,
>
> Just to let you know I'm still here. (I'm a little slow, but I have other
> work to do, sorry.)
>
> I have finally managed to run the multiprocessing tests on the psim
> simulator with RTEMS-4.99.2. I have integrated your test and it looks like
> I'm observing the behaviour you described. Now I'm trying to investigate
> what is happening and why. I will let you know when I have some progress.
>
> Victor
>
> FRANK wrote:
>
> >Hello Victor,
> >
> >I am working on multiprocessor support for 4 processors under
> >leon2. I found the problem when I tested the semaphore on 4
> >processors. Then I tested the problem again on 2 processors for
> >confirmation.
> >Here is my test code. base_mp_4.rar is for 4 processors, and
> >base_mp_2 is for 2 processors.
> >
> >Frank
> >
> >
> >
> >2006/3/6, Victor V. Vengerov <Victor.Vengerov at oktetlabs.ru>:
> >
> >
> >>Frank,
> >>
> >>I'm trying to reproduce this situation. I have built RTEMS and tools
> >>targeted to the powerpc psim simulator (this configuration, in theory,
> >>allows multiprocessor tests to run). It still has problems - I'm trying to
> >>bring this configuration up.
> >>
> >>Could you send me the source code of your tests demonstrating the problem
> >>you have described?
> >>
> >>Victor
> >>
> >>FRANK wrote:
> >>
> >>
> >>
> >>>But something goes wrong when node1 processes the MPCI message
> >>>SEMAPHORE_MP_OBTAIN_REQUEST from task3. I tested such a program
> >>>recently, and it really does deadlock, because when
> >>>_MPCI_Receive_server on node1 receives the MPCI message
> >>>SEMAPHORE_MP_OBTAIN_REQUEST from task3, it processes the request
> >>>itself. If the request blocks, the server blocks
> >>>too. Judging from the code, it is the thread
> >>>_MPCI_Receive_server that performs the function.
> >>>The function _Thread_queue_Enqueue contains this code:
> >>>
> >>>the_thread = _Thread_Executing;
> >>>#if defined(RTEMS_MULTIPROCESSING)
> >>> if ( _Thread_MP_Is_receive( the_thread ) && the_thread->receive_packet )
> >>>   the_thread = _Thread_MP_Allocate_proxy( the_thread_queue->state );
> >>> else
> >>>#endif
> >>>   _Thread_Set_state( the_thread, the_thread_queue->state );
> >>>
> >>>Here _MPCI_Receive_server can use a proxy to perform an operation
> >>>that would otherwise block it. But I find that the condition
> >>>_Thread_MP_Is_receive( the_thread ) && the_thread->receive_packet
> >>>is never true in my test program, so it deadlocks.
> >>>
> >>>Thread _MPCI_Receive_server(
> >>> unsigned32 ignored
> >>>)
> >>>{
> >>>
> >>> MP_packet_Prefix         *the_packet;
> >>> MPCI_Packet_processor     the_function;
> >>> Thread_Control           *executing;
> >>>
> >>> executing = _Thread_Executing;
> >>>
> >>> for ( ; ; ) {
> >>>
> >>>   executing->receive_packet = NULL;
> >>>
> >>>   _Thread_Disable_dispatch();
> >>>   _CORE_semaphore_Seize( &_MPCI_Semaphore, 0, TRUE, WATCHDOG_NO_TIMEOUT );
> >>>   _Thread_Enable_dispatch();
> >>>
> >>>   for ( ; ; ) {
> >>>     the_packet = _MPCI_Receive_packet();
> >>>
> >>>     if ( !the_packet )
> >>>       break;
> >>>
> >>>     executing->receive_packet = the_packet;
> >>>
> >>>     if ( !_Mp_packet_Is_valid_packet_class ( the_packet->the_class ) )
> >>>       break;
> >>>
> >>>     the_function = _MPCI_Packet_processors[ the_packet->the_class ];
> >>>
> >>>     if ( !the_function )
> >>>       _Internal_error_Occurred(
> >>>         INTERNAL_ERROR_CORE,
> >>>         TRUE,
> >>>         INTERNAL_ERROR_BAD_PACKET
> >>>       );
> >>>
> >>>     (*the_function)( the_packet );
> >>>   }
> >>> }
> >>>
> >>> return 0;   /* unreached - only to remove warnings */
> >>>}
> >>>
> >>>
> >>>Frank
> >>>
> >>>
> >>>2006/3/2, Victor V. Vengerov <Victor.Vengerov at oktetlabs.ru>:
> >>>
> >>>
> >>>
> >>>
> >>>>Frank,
> >>>>
> >>>>OK, I dealt with MPCI in RTEMS a long time ago, so it is possible I'm
> >>>>wrong in the details.
> >>>>
> >>>>In my understanding, the following sequence of events happens:
> >>>>1. task 1 creates the semaphore
> >>>>- MPCI SEMAPHORE_MP_ANNOUNCE_CREATE message is sent to node 2 to announce
> >>>>semaphore creation
> >>>>2. MPCI task at node 2 processes this message
> >>>>3. task 2 obtains the semaphore
> >>>>- MPCI message SEMAPHORE_MP_OBTAIN_REQUEST is sent to node 1 to get the
> >>>>semaphore
> >>>>- task 2 blocks waiting for the answer
> >>>>4. MPCI task at node 1 receives the obtain message and processes it. As a
> >>>>result, it gets the semaphore and sends a SEMAPHORE_MP_OBTAIN_RESPONSE
> >>>>message to node 2.
> >>>>5. MPCI task at node 2 receives the response message and unblocks task 2.
> >>>>The semaphore is now owned by task 2.
> >>>>6. task 3 tries to obtain the semaphore
> >>>>- MPCI message SEMAPHORE_MP_OBTAIN_REQUEST is sent to node 1 to get the
> >>>>semaphore
> >>>>- task 3 blocks waiting for the answer
> >>>>7. task 2 releases the semaphore
> >>>>- MPCI message SEMAPHORE_MP_RELEASE_REQUEST is sent to node 1
> >>>>- task 2 continues its execution
> >>>>8. MPCI task at node 1 processes SEMAPHORE_MP_RELEASE_REQUEST
> >>>>- it sends SEMAPHORE_MP_RELEASE_RESPONSE to node 2
> >>>>- because task 3 is waiting for the semaphore, it sends
> >>>>SEMAPHORE_MP_OBTAIN_RESPONSE to node 2 to resume task 3.
> >>>>9. MPCI task at node 2 processes the SEMAPHORE_MP_OBTAIN_RESPONSE message.
> >>>>- task 3 is resumed and owns the semaphore.
> >>>>
> >>>>No deadlock should happen.
> >>>>
> >>>>Victor
> >>>>
> >>>>FRANK wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>Hi,
> >>>>>I think there may be something wrong in the function
> >>>>>_MPCI_Receive_server (in mpci.c). I have tested the following program. I
> >>>>>made two nodes. Node1 creates one task to create a semaphore, and
> >>>>>Node2 creates two tasks, task2 and task3. Task2 obtains the semaphore
> >>>>>and then releases it. Before task2 releases the semaphore, task3 tries to
> >>>>>obtain the semaphore. As a result, it causes a deadlock. The reason, I
> >>>>>think, is that until _MPCI_Receive_server has finished the latest
> >>>>>request, it never receives a new request. In this test, the obtain
> >>>>>request of task2 can be satisfied immediately, and the obtain request
> >>>>>of task3 can be satisfied only after the release request of task2
> >>>>>has been satisfied. But the obtain request of task3 arrives earlier than
> >>>>>the release request of task2, so until _MPCI_Receive_server satisfies
> >>>>>the obtain request of task3, it will never respond to the release
> >>>>>request of task2, and this causes a deadlock.
> >>>>>Am I right? I hope you can give me a prompt reply. Thanks a lot.
> >>>>>
> >>>>>Frank
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>--
> >>>>Victor Vengerov
> >>>>OKTET Labs, St.-Petersburg, Russia   Web: www.oktetlabs.ru
> >>>>Phone +7 812 4286709(office) +7 812 9389372(mobile) +7 812 4281653(home)
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >
>
>


