IMPORTANT WAS Re: FW: RTEMS event send/receive - events apparently lost.

Thu Mar 29 15:46:54 UTC 2001

I don't know how this one slipped through.  Zoltan's memory is correct.
We discussed this privately a long time ago.  Somehow the patch never
got applied.  

If you are using a snapshot prior to rtems-19990528 or release 
4.0 or earlier, the same modification will have to be made to the 
_Event_Surrender routine in the file event.c.  As of  rtems-19990528,
the file event.c was split. [If you are using code old enough to
precede the split of event.c, you probably should update for
general purposes anyway. :)]

Given that the network stack does use events, this patch should be
applied if you are using the network stack.  

--joel

Nick.SIMON at syntegra.bt.co.uk wrote:
> 
> Zoltan Kocsi kindly sent me a fix for the problem I had encountered with
> RTEMS events, where one was getting missed if two event sources fired
> close together in time.  Attached is a patch file for 4.5.0, and here FYI
> is Zoltan's email:
> 
> -- Nick Simon
> 
> -----Original Message-----
> From: Zoltan Kocsi [mailto:zoltan at bendor.com.au]
> Sent: 22 March 2001 03:29
> To: Nick.SIMON at syntegra.bt.co.uk
> Subject: RTEMS event send/receive - events apparently lost.
> 
> Hi,
> 
> I sent a bug report and a fix relating to lost events
> about a year ago (4.0.0). People in the know (including Joel)
> said that it was indeed a bug and that my solution was OK.
> I don't know whether the fix was actually applied in 4.5.0.
> 
> Nevertheless, here's my original mail, it might be relevant:
> 
> ---------------------------------
> It seems to me that there's a mutual-exclusivity bug in RTEMS events
> and I also think I know where and what and how to fix it.
> 
> The symptom
> ===========
> 
> I have a task which listens for 2 events. One is sent from
> another task, the other from an interrupt routine.
> 
> It seems that if the interrupt routine calls rtems_event_send() when
> the signalling task is also calling rtems_event_send(), then the
> task's signal gets lost. That is ( DEBUG( c ) is a very fast,
> uninterruptible log function that stores a single character in a
> buffer):
> 
> task1()
> {
>   rtems_event_set event;
> 
>   for ( ;; ) {
>    DEBUG( '?' );
>    /* wait for event 1 or event 2 (mask 3), whichever arrives first */
>    rtems_event_receive( 3, RTEMS_WAIT | RTEMS_EVENT_ANY,
>                         RTEMS_NO_TIMEOUT, &event );
>    DEBUG( event+'0' );
>   }
> }
> 
> task2()
> {
>   for (;;) {
>    sleep( 1 );
>    DEBUG( '[' );
>    rtems_event_send( task1_id, 2 );   /* task2 sends event 2 */
>    DEBUG( ']' );
>   }
> }
> 
> interrupt()
> {
>    <all sorts of interruptish things>
>    DEBUG( '*' );
>    rtems_event_send( task1_id, 1 );   /* the interrupt sends event 1 */
> }
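For readers who want to reproduce the trace, here is a minimal sketch of what a DEBUG() of this kind could look like. It is not Zoltan's implementation (the message does not show it). The names debug_put, debug_log, debug_count and DEBUG_LOG_SIZE are made up for illustration, and it assumes a C11 compiler with <stdatomic.h>. Instead of disabling interrupts, it reserves each slot with an atomic fetch-and-add so that a task and an interrupt handler never write to the same position.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEBUG_LOG_SIZE 4096u            /* hypothetical buffer size */

static char          debug_log[DEBUG_LOG_SIZE];
static atomic_size_t debug_next;        /* next free slot, starts at 0 */

/*
 * Store one character in the log.  The slot is reserved atomically, so a
 * caller that is interrupted between the reservation and the store cannot
 * have its slot taken by the interrupt handler.
 */
void debug_put(char c)
{
    size_t slot = atomic_fetch_add(&debug_next, 1);

    if (slot < DEBUG_LOG_SIZE)          /* silently drop once the log is full */
        debug_log[slot] = c;
}

/* Number of characters stored so far (capped at the buffer size). */
size_t debug_count(void)
{
    size_t n = atomic_load(&debug_next);

    return n < DEBUG_LOG_SIZE ? n : DEBUG_LOG_SIZE;
}

#define DEBUG(c) debug_put((char)(c))
```

On a target without C11 atomics, the same effect would need a short interrupt-disabled section around the index update.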
> 
> Thus, in the debug log, ?[]2 represents task1() going to wait, task2()
> sending its event, and task1() waking up with event 2. Similarly,
> ?*1 means task1() waiting, the interrupt signalling, and task1()
> waking up with event 1.
> 
> I have a log showing this:
> 
> ?[]2?*1?*1?*1?[]2?*1?[*]1?*1?*1
> 
> The log shows that one event is lost: see the sequence ?[*]1?*1.
> 
> Analysis
> ========
> 
> ?[*] means that the following happened:
> 
> - task1() goes to wait
> - task2() wakes up and starts to send event 2
> - before it finishes sending, an interrupt arrives and sends event 1
> - task2()'s event_send returns and task2() goes back to sleep.
> 
> At this moment, the system could be in the following 3 states:
> 
> 1) The interrupt came when task2() had already sent the signal to
>    task1() but task1() had not woken up yet.
>    In this case when task1() wakes up, it should receive both
>    signals. It would look like ?[*]3 in the log.
> 
> 2) The interrupt came when task1() had already been woken up by task2().
>    In this case task1() received event 2 and has a pending event 1,
>    which it will receive immediately the next time it goes to wait.
>    In the log it would be ?[*]2?1
> 
> 3) The interrupt came before task2() had a chance to send its event
>    and task1() is woken up by the interrupt. In this case task1()
>    receives event 1 and has a pending event 2. In the log it would be
>    ?[*]1?2
> 
> However, the log shows ?[*]1?*1. This means that task1() went to
> sleep, both task2() and the interrupt sent an event, task1() woke up,
> received one event and had *no* pending events (for if it had had, the
> next time around it would have woken up immediately, with nothing
> between its ? and 1 or 2 in the log).
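The reasoning above can be turned into a small host-side log checker. This is only a sketch: find_lost_event is a made-up helper, not part of RTEMS, and it assumes the log alphabet used in this message ('?' for task1 waiting, '[' and ']' around task2's send of event 2, '*' for the interrupt's send of event 1, and a digit for the event set task1 woke up with). It flags a wait cycle when an event was sent during the cycle, was not in the received set, and was not delivered immediately at the start of the next cycle.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Return the index of the '?' that starts the first wait cycle in which an
 * event appears to be lost, or -1 if no such cycle is found.
 */
int find_lost_event(const char *log)
{
    size_t i = 0;

    assert(log != NULL);

    while (log[i] != '\0') {
        if (log[i] != '?') {            /* skip to the start of a cycle */
            i++;
            continue;
        }

        size_t start = i++;
        int sent = 0;                   /* events sent during this cycle */

        while (log[i] != '\0' && !isdigit((unsigned char)log[i])) {
            if (log[i] == '[') sent |= 2;   /* task2 sends event 2 */
            if (log[i] == '*') sent |= 1;   /* interrupt sends event 1 */
            i++;
        }
        if (log[i] == '\0')
            break;                      /* log ends in the middle of a cycle */

        int received = log[i++] - '0';
        int leftover = sent & ~received;    /* sent but not yet received */

        if (leftover == 0)
            continue;                   /* everything sent was received */

        /* A pending event must show up as "?<digit>" with nothing between. */
        if (log[i] == '?' && isdigit((unsigned char)log[i + 1]) &&
            ((log[i + 1] - '0') & leftover) == leftover)
            continue;

        return (int)start;              /* sent, not received, not pending */
    }
    return -1;
}
```

Run on the log quoted above, the checker points at the ?[*]1 cycle. The three legitimate outcomes (?[*]3, ?[*]2?1 and ?[*]1?2) pass.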
> 
> Cause
> =====
> 
> Looking into the event_send() routine offers an explanation.
> The following is happening, IMHO:
> 
> rtems_event_send() does this:
> 
>   _Event_sets_Post( event_in, &api->pending_events );
>   _Event_Surrender( the_thread );
>   _Thread_Enable_dispatch();
> 
> task2 calls event_send( 2 ).
> 
> _Event_sets_Post() sets task1()'s pending_events to 2.
> 
> Now pending_events is 2 and event_condition is 3.
> 
> _Event_Surrender() will then:
> 
> - Disable interrupts
> - Task is waiting for an event ? YES
> - Task's wait mask and mode satisfied by pending_events ? YES
> - Delete seized event from pending list ==> pending_events is now 0
> - Set the return_argument to the seized event ==> return_argument is now 2
> - Briefly enable interrupts (_ISR_Flash)
> 
> If at this moment the interrupt arrives and its routine calls
> rtems_event_send( 1 ), it will re-enter _Event_Surrender():
> 
> - Task is waiting for an event ? YES AND THIS IS A BUG !!!
> - Task's wait mask and mode satisfied by pending_events ? YES
> - Delete seized event from pending list ==> pending_events is now 0
> - Set the return_argument to the seized event ==> return_argument is
>   now 1 WHICH IS WRONG !
> 
> That is, since _Thread_Unblock() has not been called yet by task2(),
> task1() is *still* in the waiting-for-event state when the interrupt
> comes, even though the event sent by task2() has already been
> delivered and removed from the pending list. Therefore, the interrupt
> routine's event will simply overwrite task2()'s.
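The interleaving described above can be replayed on a host with a simplified model. This is a sketch under stated assumptions, not RTEMS code: task_model, model_send_unfixed, model_unblock and replay_race_unfixed are invented names, only RTEMS_EVENT_ANY semantics are modelled, and the interrupt is simulated by calling the send routine a second time before the unblock step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the per-task state that the surrender step touches. */
typedef struct {
    uint32_t pending_events;    /* events posted but not yet received */
    uint32_t event_condition;   /* events the task waits for (Wait.count) */
    uint32_t return_argument;   /* what the receive call will hand back */
    bool     waiting;           /* still in the waiting-for-event state */
} task_model;

/* Post the event, then surrender it, as the unpatched code does. */
void model_send_unfixed(task_model *t, uint32_t event_in)
{
    t->pending_events |= event_in;

    uint32_t seized = t->pending_events & t->event_condition;

    if (seized != 0 && t->waiting) {
        t->pending_events &= ~seized;
        t->return_argument = seized;
        /* interrupts are briefly enabled here, before the task is unblocked */
    }
}

/* The unblock step that task2's send has not reached yet in the bad case. */
void model_unblock(task_model *t)
{
    t->waiting = false;
}

/* task2 sends 2, the interrupt sends 1 inside the window, then unblock. */
task_model replay_race_unfixed(void)
{
    task_model t = { 0, 3, 0, true };

    model_send_unfixed(&t, 2);          /* task2's send */
    assert(t.return_argument == 2);     /* event 2 has been delivered */

    model_send_unfixed(&t, 1);          /* interrupt, before the unblock */
    model_unblock(&t);
    return t;
}
```

After the replay, return_argument is 1 and pending_events is 0: event 2 is gone, which matches the ?[*]1?*1 trace.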
> 
> The fix
> =======
> 
> The solution seems to be relatively simple:
> 
> If an event was seized, then event condition should be cleared, that
> is, in event.c (from line 281):
> 
>   _ISR_Disable( level );
>   pending_events  = api->pending_events;
>   event_condition = (rtems_event_set) the_thread->Wait.count;
> 
>   seized_events = _Event_sets_Get( pending_events, event_condition );
> 
>   if ( !_Event_sets_Is_empty( seized_events ) ) {
>     if ( _States_Is_waiting_for_event( the_thread->current_state ) ) {
>       if ( seized_events == event_condition ||
>            _Options_Is_any( option_set ) ) {
>         api->pending_events =
>            _Event_sets_Clear( pending_events, seized_events );
>         *(rtems_event_set *)the_thread->Wait.return_argument =
>            seized_events;
> 
>         _ISR_Flash( level );
> 
> should be changed to:
> 
>   _ISR_Disable( level );
>   pending_events  = api->pending_events;
>   event_condition = (rtems_event_set) the_thread->Wait.count;
> 
>   seized_events = _Event_sets_Get( pending_events, event_condition );
> 
>   if ( !_Event_sets_Is_empty( seized_events ) ) {
>     if ( _States_Is_waiting_for_event( the_thread->current_state ) ) {
>       if ( seized_events == event_condition ||
>            _Options_Is_any( option_set ) ) {
>         api->pending_events =
>            _Event_sets_Clear( pending_events, seized_events );
>         *(rtems_event_set *)the_thread->Wait.return_argument =
>            seized_events;
>         the_thread->Wait.count = 0;  /* NEW CODE */
> 
>         _ISR_Flash( level );
> 
> This would ensure that, until the task's state changes to something
> other than waiting for events, no more events will be delivered
> (subsequent calls will find seized_events 0); all new events will be
> left pending.
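The effect of the one-line change can be checked with a simplified host-side model. Again this is a sketch and not RTEMS code: fixed_model, model_send_fixed, model_receive_fixed and replay_race_fixed are invented names, and only RTEMS_EVENT_ANY semantics are modelled. The only behavioural difference from the unpatched logic is that the wait condition is cleared once an event has been seized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the per-task state that the surrender step touches. */
typedef struct {
    uint32_t pending_events;    /* events posted but not yet received */
    uint32_t event_condition;   /* events the task waits for (Wait.count) */
    uint32_t return_argument;   /* what the receive call will hand back */
    bool     waiting;           /* still in the waiting-for-event state */
} fixed_model;

/* Post the event, then surrender it, with the proposed fix applied. */
void model_send_fixed(fixed_model *t, uint32_t event_in)
{
    t->pending_events |= event_in;

    uint32_t seized = t->pending_events & t->event_condition;

    if (seized != 0 && t->waiting) {
        t->pending_events &= ~seized;
        t->return_argument = seized;
        t->event_condition = 0;         /* the fix: nothing more can be seized */
    }
}

/* Non-blocking part of receive: take whatever matching event is pending. */
uint32_t model_receive_fixed(fixed_model *t, uint32_t condition)
{
    uint32_t seized = t->pending_events & condition;

    t->pending_events &= ~seized;
    return seized;                      /* 0 means the task would have to wait */
}

/* task2 sends 2, the interrupt sends 1 inside the window, then unblock. */
fixed_model replay_race_fixed(void)
{
    fixed_model t = { 0, 3, 0, true };

    model_send_fixed(&t, 2);            /* task2's send */
    assert(t.return_argument == 2);     /* event 2 has been delivered */

    model_send_fixed(&t, 1);            /* interrupt, before the unblock */
    t.waiting = false;                  /* the unblock step */
    return t;
}
```

With the fix, task1 wakes up with event 2 while event 1 stays pending and is returned by the next receive without waiting. That corresponds to ?[*]2?1 in the log notation.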
> 
> Regards,
> 
> Zoltan
> 
>   ------------------------------------------------------------------------
>                  Name: eventpatch
>    eventpatch    Type: unspecified type (application/octet-stream)
>              Encoding: quoted-printable

-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel at OARcorp.com                 On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
   Support Available             (256) 722-9985