Possible problem in rtems_event_send/receive

Nick.SIMON at syntegra.com Nick.SIMON at syntegra.com
Mon Apr 30 15:23:41 UTC 2001


This is indeed a bug, first discovered (AFAIK) by Zoltan Kocsi quite a while
back, and rediscovered by me earlier this year.  The following is Zoltan's
analysis of the problem, and a fix, which worked for me.

Regards,


-- Nick Simon 

-----Original Message-----
From: Zoltan Kocsi [mailto:zoltan at bendor.com.au] 
Sent: 22 March 2001 03:29
To: Nick.SIMON at syntegra.bt.co.uk
Subject: RTEMS event send/receive - events apparently lost.


Hi,

I have sent a bug report and a fix with relation to lost events 
about a year ago (4.0.0). People in the know (including Joel)
said that indeed it was a bug and that my solution was OK.
I don't know whether the fix was actually applied in 4.5.0.

Nevertheless, here's my original mail, it might be relevant:

---------------------------------
It seems to me that there's a mutual-exclusion bug in RTEMS events,
and I also think I know where it is and how to fix it.

The symptom
===========

I have a task which listens for 2 events. One is sent from
another task, the other from an interrupt routine.

It seems that if the interrupt routine calls rtems_event_send() when
the signalling task is also calling rtems_event_send(), then the
task's signal gets lost. That is ( DEBUG( c ) is a very fast,
uninterruptible log function that stores a single character in a 
buffer):

task1()
{
  for ( ;; ) { 
   DEBUG( '?' );
   rtems_event_receive( 3, RTEMS_WAIT | RTEMS_EVENT_ANY, 
                        RTEMS_NO_TIMEOUT, &event ); 
   DEBUG( event+'0' );
  }
}

task2()
{
  for (;;) { 
   sleep( 1 );
   DEBUG( '[' );
   rtems_event_send( task1_id, 2 );  /* task2 sends event 2, as in the log */
   DEBUG( ']' );
  }
}

interrupt()
{
   <all sorts of interruptish things>
   DEBUG( '*' );
   rtems_event_send( task1_id, 1 );  /* the interrupt sends event 1 */
}

Thus, in the debug log, ?[]2 represents task1() going to wait, task2()
signalling, and task1() waking up with event 2. Similarly,
?*1 means task1() waiting, the interrupt signalling, task1() waking up.

I have a log showing this:

?[]2?*1?*1?*1?[]2?*1?[*]1?*1?*1

The log shows that one event is lost: see the ?[*]1?*1 sequence.
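
The same conclusion can be reached by counting. The following is only a
rough sketch, not part of the original test program: tally_log() and
struct log_tally are hypothetical helpers, and they assume the log
convention above ('[' = task2 sent event 2, '*' = interrupt sent event 1,
each digit = the event set task1() woke up with).

```c
#include <assert.h>
#include <stddef.h>

/* Tally of events sent and delivered, as read from a DEBUG log. */
struct log_tally {
    unsigned sent1, sent2;  /* '*' = event 1 sent, '[' = event 2 sent */
    unsigned recv1, recv2;  /* bits of each digit logged by task1()   */
};

/* Walk the log once and count sends and deliveries per event. */
static struct log_tally tally_log(const char *log)
{
    struct log_tally t = {0, 0, 0, 0};

    assert(log != NULL);
    for (; *log != '\0'; ++log) {
        char c = *log;

        if (c == '*') {
            t.sent1++;
        } else if (c == '[') {
            t.sent2++;
        } else if (c >= '0' && c <= '3') {
            unsigned set = (unsigned)(c - '0');

            if (set & 1u) t.recv1++;
            if (set & 2u) t.recv2++;
        }
    }
    return t;
}
```

Fed the log above, this counts three sends of event 2 but only two
deliveries, while event 1 balances. A mismatch is only a hint: an event
may legitimately still be pending when the log ends, which is why the
analysis below looks at what task1() does next.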

Analysis
========

?[*] means that the following happened:

- task1() goes to wait
- task2() wakes up and goes to send event 2
- before it finishes sending, an interrupt comes and sends signal 1
- task2()'s event_send returns, task2() goes to sleep.

At this moment, the system could be in one of the following 3 states:

1) The interrupt came when task2() had already sent the signal to
   task1() but task1() had not woken up yet.
   In this case when task1() wakes up, it should receive both
   signals. It would look like ?[*]3 in the log.

2) The interrupt came when task1() had already been woken up by task2().
   In this case task1() received event 2 and has a pending event 1,
   which it will receive immediately the next time it goes to wait.
   In the log it would be ?[*]2?1

3) The interrupt came before task2() had a chance to send its event
   and task1() is woken up by the interrupt. In this case task1()
   receives event 1 and has a pending event 2. In the log it would be
   ?[*]1?2

However, the log shows ?[*]1?*1. This means that task1() went to
sleep, both task2() and the interrupt sent an event, task1() woke up,
received one event and had *no* pending events (for if it had had, the
next time around it would have woken up immediately, with nothing
between its ? and 1 or 2 in the log). 

Cause
=====

Looking into the event_send() routine offers an explanation. 
The following is happening, IMHO:

rtems_event_send() does this:

  _Event_sets_Post( event_in, &api->pending_events );
  _Event_Surrender( the_thread );
  _Thread_Enable_dispatch();

task2 calls event_send( 2 ).

_Event_sets_Post() sets task1()'s pending_events to 2.

Now: pending_events 2 and event_condition is 3.

_Event_Surrender() will then:

- Disable interrupts
- Task is waiting for an event ? YES
- Task's wait mask and mode satisfied by pending_events ? YES
- Delete seized event from pending list ==> pending_events is now 0
- Set the return_argument to the seized event ==> return_argument is now 2
- Enable interrupts

If at this moment the interrupt routine arrives and calls
rtems_event_send( 1 ), it will re-enter _Event_Surrender():

- Task is waiting for an event ? YES AND THIS IS A BUG !!!
- Task's wait mask and mode satisfied by pending_events ? YES
- Delete seized event from pending list ==> pending_events is now 0
- Set the return_argument to the seized event ==> return_argument is 
  now 1 WHICH IS WRONG !

That is, since _Thread_Unblock() has not been called yet by task2(), 
task1() is *still* in the waiting-for-event state when the interrupt
comes, even though the event sent by task2() has already been
delivered and removed from the pending list. Therefore, the interrupt
routine's event will simply overwrite task2()'s.
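
To make the sequence concrete, here is a small host-side model of it.
This is a sketch under assumptions, not the RTEMS source: struct waiter,
send_unpatched() and unblock() are hypothetical stand-ins that keep only
the fields named above and handle only the RTEMS_EVENT_ANY case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned event_set;

/* Hypothetical, simplified stand-in for the receiving task's state. */
struct waiter {
    bool      waiting;          /* still in the "waiting for event" state */
    event_set condition;        /* events waited for (Wait.count)         */
    event_set pending;          /* api->pending_events                    */
    event_set return_argument;  /* what rtems_event_receive() will report */
};

/* Models the unpatched post + surrender sequence (RTEMS_EVENT_ANY only). */
static void send_unpatched(struct waiter *w, event_set in)
{
    event_set seized;

    assert(w != NULL);
    w->pending |= in;                       /* _Event_sets_Post()      */
    seized = w->pending & w->condition;     /* _Event_sets_Get()       */
    if (seized != 0 && w->waiting) {
        w->pending &= ~seized;              /* _Event_sets_Clear()     */
        w->return_argument = seized;        /* overwrites earlier value */
        /* Interrupts are re-enabled here; w->waiting is still true.   */
    }
}

/* Models _Thread_Unblock(): only now does the task stop waiting. */
static void unblock(struct waiter *w)
{
    assert(w != NULL);
    w->waiting = false;
}
```

Calling send_unpatched( &w, 2 ) and then send_unpatched( &w, 1 ) before
unblock( &w ), on a waiter with condition 3, leaves return_argument at 1
and pending at 0, which is exactly the ?[*]1 followed by ?*1 seen in
the log.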

The fix
=======

The solution seems to be relatively simple:

If an event was seized, then the event condition should be cleared, that
is, in event.c (from line 281):

  _ISR_Disable( level );
  pending_events  = api->pending_events;
  event_condition = (rtems_event_set) the_thread->Wait.count;

  seized_events = _Event_sets_Get( pending_events, event_condition );

  if ( !_Event_sets_Is_empty( seized_events ) ) {
    if ( _States_Is_waiting_for_event( the_thread->current_state ) ) {
      if ( seized_events == event_condition || _Options_Is_any( option_set ) ) {
        api->pending_events =
           _Event_sets_Clear( pending_events, seized_events );
        *(rtems_event_set *)the_thread->Wait.return_argument = seized_events;

        _ISR_Flash( level );

should be changed to:

  _ISR_Disable( level );
  pending_events  = api->pending_events;
  event_condition = (rtems_event_set) the_thread->Wait.count;

  seized_events = _Event_sets_Get( pending_events, event_condition );

  if ( !_Event_sets_Is_empty( seized_events ) ) {
    if ( _States_Is_waiting_for_event( the_thread->current_state ) ) {
      if ( seized_events == event_condition || _Options_Is_any( option_set ) ) {
        api->pending_events =
           _Event_sets_Clear( pending_events, seized_events );
        *(rtems_event_set *)the_thread->Wait.return_argument = seized_events;
        the_thread->Wait.count = 0; /* NEW CODE */

        _ISR_Flash( level );

This would ensure that, until the task's state changes to something
other than waiting for events, no more events are delivered
(subsequent calls will find seized_events 0); all new events are
left pending.
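
The effect of the change can be checked on a small host-side model. As
before, this is only a sketch under assumptions: struct waiter,
begin_receive(), send_patched() and unblock() are hypothetical stand-ins,
not RTEMS functions, and only the RTEMS_EVENT_ANY case is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned event_set;

/* Hypothetical, simplified stand-in for the receiving task's state. */
struct waiter {
    bool      waiting;          /* still in the "waiting for event" state */
    event_set condition;        /* events waited for (Wait.count)         */
    event_set pending;          /* api->pending_events                    */
    event_set return_argument;  /* what rtems_event_receive() will report */
};

/* Models a receive with RTEMS_EVENT_ANY. Returns true if pending
   events satisfied the call at once, false if the task must block. */
static bool begin_receive(struct waiter *w, event_set condition)
{
    event_set seized;

    assert(w != NULL);
    seized = w->pending & condition;
    if (seized != 0) {
        w->pending &= ~seized;
        w->return_argument = seized;
        return true;
    }
    w->condition = condition;
    w->waiting = true;
    return false;
}

/* Models post + surrender with the proposed fix applied. */
static void send_patched(struct waiter *w, event_set in)
{
    event_set seized;

    assert(w != NULL);
    w->pending |= in;
    seized = w->pending & w->condition;
    if (seized != 0 && w->waiting) {
        w->pending &= ~seized;
        w->return_argument = seized;
        w->condition = 0;   /* NEW CODE: later senders seize nothing */
    }
}

/* Models _Thread_Unblock(). */
static void unblock(struct waiter *w)
{
    assert(w != NULL);
    w->waiting = false;
}
```

With the same interleaving as in the bug (send 2, then send 1, then
unblock), task1() wakes up with event 2 and event 1 stays pending, so the
next begin_receive() returns immediately with event 1. In the log that
would be ?[*]2?1, one of the three legitimate outcomes listed earlier.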

Regards,

Zoltan

