Critical Section: rtems_timer_fire_after, rtems_timer_server_fire_after

Joel Sherrill <joel@OARcorp.com> joel.sherrill at OARcorp.com
Thu Jun 16 18:42:04 UTC 2005


Can you provide the patch in diff format?

It would be safer to apply that way.

Thanks.

--joel

Andrew Sinclair wrote:
> Thanks Joel. I didn't expect a quick reply on this one. It took me some
> time to understand what was going on, and I appreciate everyone is busy.
> I hope you are feeling better.
> 
> I agree with all your discussion points. I agree that the higher
> interrupt is the one that should ultimately set the timer, and both
> attempts by the low and high priority tasks to set the timer should be
> classed as a success. 
> 
> 
> -----Original Message-----
> From: Joel Sherrill <joel at OARcorp.com>
> [mailto:joel.sherrill at OARcorp.com] 
> Sent: Saturday, 11 June 2005 4:15 AM
> To: Andrew Sinclair
> Cc: rtems-users at rtems.com
> Subject: Re: Critical Section: rtems_timer_fire_after,
> rtems_timer_server_fire_after
> 
> Andrew Sinclair wrote:
> 
>>Hello there,
>>
>>I believe I have encountered a critical section problem with 
>>rtems_timer_fire_after and rtems_timer_server_fire_after. According to
> 
> 
>>previous discussions on the mailing list it is acceptable to call 
>>these during interrupts. There is a case I have found where calling 
>>these will cause issues. I am using release 4.6.2.
>>
>>If a task or low priority interrupt modify the same timer as a high 
>>priority interrupt, then corruption of the timer chain can occur.
>>
>>For sake of simplicity, I have listed the general algorithm in RTEMS 
>>below.
>>
>>Consider a task/low priority interrupt calling rtems_timer_fire_after.
>>Inside the marked critical section, a high priority interrupt may 
>>occur and insert the same timer into the chain without the task/lower 
>>priority interrupt having knowledge. When the task/lower priority 
>>interrupt resumes, it blindly modifies watchdog->state back to 
>>inactive, and reinserts the timer. This usually results in the timer 
>>chain previous and next pointers forming a loop, and ultimately 
>>locking up the device when another timer must be inserted.
>>
>>Task
>> \-->rtems_timer_fire_after
>>      \-->Watchdog_Remove
>>           \--> disable interrrupts
>>           \--> remove timer if WATCHDOG_ACTIVE or stop if being 
>>inserted
>>           \--> watchdog->state=WATCHDOG_INACTIVE
>>           \--> enable interrupts
>>[critical section start]
>>      \-->Watchdog_Initialize
>>           \--> watchdog->state=WATCHDOG_INACTIVE
>>      \-->Watchdog_Insert_ticks
>>           \-->Watchdog_Insert
>>                \-->watchdog->state=WATCHDOG_BEING_INSERTED
>>[critical section end]
>>                \-->disable interrupts
>>                \-->while in timer chain 
>>                     \-->if at insert position exit
>>                     \-->flash interrupts
>>                     \-->check if timer has been touched
>>                \-->insert timer
>>                \-->enable interrupts
>>
>>My solution to this problem is listed below involves modification to 
>>rems_timer_fire_after, rtems_timer_fire_after, and Watchdog_Insert.
>>
>>Before watchdog->state is modified, I check disable interrupts and 
>>check if it has been modified. If it has been modified, then abort.
>>
>>I have listed relevant sections of code I hacked at below. Testing so 
>>far appears to have solved this issue. I see now that I probably 
>>should return RTEMS_INTERNAL_ERROR on abort.
>>Can anyone see problems with the changes I have made? 
> 
> 
> It took me some time to mentally digest this and I think your changes
> are OK.  Once the discussion is done, you need to file a PR with a
> patch.
> 
> The question is more of a desired behavior.
> 
> + Should the interrupted operation be thrown away?  Or should
>    the watchdog state be "initialzed atomically to being inserted"
>    and the interrupting operation return imemdiately.  Now the
>    interrupting operation trumps the interrupted one.  I think this
>    is probably OK logically because it should be logically the same
>    as if the 1st completed and the 2nd then ran to completion.
> 
> + Should the interrupted operation return successful or something
>    like invalid state?  Successful is OK if we take the "2nd operation
>    trumps" view.
> 
> + I don't like calling it abort just because it sounds like an error.
>    It could be called something like insert_interrupted.
> 
> + How about an early exit rather than a goto?
> 
> Sorry for the delay.  This really was hard to think about -- especially
> with the allergies I had going on last week.
> 
> 
>>Kind Regards
>> 
>>Andrew Sinclair
>>
>>------------------------------------------------------------
>>
>>rtems_status_code rtems_timer_fire_after(
>>  Objects_Id                         id,
>>  rtems_interval                     ticks,
>>  rtems_timer_service_routine_entry  routine,
>>  void                              *user_data
>>)
>>{
>>  Timer_Control      *the_timer;
>>  Objects_Locations   location;
>>  ISR_Level          	level; /*AS: line modified*/
>>
>>  if ( ticks == 0 )
>>    return RTEMS_INVALID_NUMBER;
>>
>>  if ( !routine )
>>    return RTEMS_INVALID_ADDRESS;
>>
>>  the_timer = _Timer_Get( id, &location );
>>  switch ( location ) {
>>    case OBJECTS_REMOTE:            /* should never return this */
>>      return RTEMS_INTERNAL_ERROR;
>>
>>    case OBJECTS_ERROR:
>>      return RTEMS_INVALID_ID;
>>
>>    case OBJECTS_LOCAL:
>>      (void) _Watchdog_Remove( &the_timer->Ticker );
>>/*AS: start, perform inline initialisation with interrupts disabled, 
>>abort if watchdog touched*/
>>      _ISR_Disable( level );
>>	  if (the_timer->Ticker.state != WATCHDOG_INACTIVE)
>>		goto timer_fire_after_abort_insert;
>>      the_timer->the_class = TIMER_INTERVAL;
>>      _Watchdog_Initialize( &the_timer->Ticker, routine, id, user_data
> 
> 
>>);
>>      _ISR_Enable( level );
>>/*AS: end*/
>>
>>      _Watchdog_Insert_ticks( &the_timer->Ticker, ticks );
>>      _Thread_Enable_dispatch();
>>      return RTEMS_SUCCESSFUL;
>>  }
>>
>>  return RTEMS_INTERNAL_ERROR;   /* unreached - only to remove
> 
> warnings
> 
>>*/
>>/*AS: start*/
>>timer_fire_after_abort_insert:
>>  _ISR_Enable( level );
>>  _Thread_Enable_dispatch();
>>  return RTEMS_SUCCESSFUL;
>>/*AS: end*/
>>}
>>
>>------------------------------------------------------------
>>
>>rtems_status_code rtems_timer_server_fire_after(
>>  Objects_Id                         id,
>>  rtems_interval                     ticks,
>>  rtems_timer_service_routine_entry  routine,
>>  void                              *user_data
>>)
>>{
>>  Timer_Control        *the_timer;
>>  Objects_Locations     location;
>>  extern Chain_Control  _Timer_Ticks_chain;
>>  ISR_Level          	level; /*AS: line modified*/
>>
>>  if ( !_Timer_Server )
>>    return RTEMS_INCORRECT_STATE;
>>
>>  if ( !routine )
>>    return RTEMS_INVALID_ADDRESS;
>>
>>  if ( ticks == 0 )
>>    return RTEMS_INVALID_NUMBER;
>>
>>  the_timer = _Timer_Get( id, &location );
>>  switch ( location ) {
>>    case OBJECTS_REMOTE:            /* should never return this */
>>      return RTEMS_INTERNAL_ERROR;
>>
>>    case OBJECTS_ERROR:
>>      return RTEMS_INVALID_ID;
>>
>>    case OBJECTS_LOCAL:
>>      (void) _Watchdog_Remove( &the_timer->Ticker );
>>
>>/*AS: start, perform inline initialisation with interrupts disabled, 
>>abort if watchdog touched*/
>>      _ISR_Disable( level );
>>	  if (the_timer->Ticker.state != WATCHDOG_INACTIVE)
>>		goto timer_server_fire_after_abort_insert;
>>      the_timer->the_class = TIMER_INTERVAL_ON_TASK;
>>      _Watchdog_Initialize( &the_timer->Ticker, routine, id, user_data
> 
> 
>>);
>>      the_timer->Ticker.initial = ticks;
>>      _ISR_Enable( level );
>>/*AS: end*/
>>      _Timer_Server_stop_ticks_timer();
>>      _Timer_Server_process_ticks_chain();
>>      _Watchdog_Insert( &_Timer_Ticks_chain, &the_timer->Ticker );
>>       _Timer_Server_reset_ticks_timer();
>>      _Thread_Enable_dispatch();
>>      return RTEMS_SUCCESSFUL;
>>  }
>>
>>  return RTEMS_INTERNAL_ERROR;   /* unreached - only to remove
> 
> warnings
> 
>>*/
>>/*AS: start*/
>>timer_server_fire_after_abort_insert:
>>  _ISR_Enable( level );
>>  _Thread_Enable_dispatch();
>>  return RTEMS_SUCCESSFUL;
>>/*AS: end*/
>>}
>>
>>------------------------------------------------------------
>>
>>void _Watchdog_Insert(
>>  Chain_Control         *header,
>>  Watchdog_Control      *the_watchdog
>>)
>>{
>>  ISR_Level          level;
>>  Watchdog_Control  *after;
>>  unsigned32         insert_isr_nest_level;
>>  Watchdog_Interval  delta_interval;
>>  
>>
>>  insert_isr_nest_level   = _ISR_Nest_level;
>>/*AS: start , check if watchdog has been touched, and abort if 
>>necessary*/
>>  _ISR_Disable( level ); 
>>  if (the_watchdog->state != WATCHDOG_INACTIVE)
>>	goto abort_insert;
>>/*AS: end*/
>>
>>  the_watchdog->state = WATCHDOG_BEING_INSERTED;
>>
>>
>>  _Watchdog_Sync_count++;
>>
>>restart:
>>  delta_interval = the_watchdog->initial;
>>
>>  /*
>>   * We CANT use _Watchdog_First() here, because a TICK interrupt
>>   * could modify the chain during the _ISR_Flash() below. Hence,
>>   * the header is pointing to volatile data. The _Watchdog_First()
>>   * INLINE routine (but not the macro - note the subtle difference)
>>   * casts away the 'volatile'...
>>   *
>>   * Also, this is only necessary because we call no other routine
>>   * from this piece of code, hence the compiler thinks it's safe to
>>   * cache *header!!
>>   *
>>   *  Till Straumann, 7/2003 (gcc-3.2.2 -O4 on powerpc)
>>   *
>>   */
>>  for ( after = (Watchdog_Control *) ((volatile Chain_Control 
>>*)header)->first ;
>>        ;
>>        after = _Watchdog_Next( after ) ) {
>>
>>     if ( delta_interval == 0 || !_Watchdog_Next( after ) )
>>       break;
>>
>>     if ( delta_interval < after->delta_interval ) {
>>       after->delta_interval -= delta_interval;
>>       break;
>>     }
>>
>>     delta_interval -= after->delta_interval;
>>
>>     /*
>>      *  If you experience problems comment out the _ISR_Flash line.  
>>      *  3.2.0 was the first release with this critical section 
>>redesigned.
>>      *  Under certain circumstances, the PREVIOUS critical section 
>>algorithm
>>      *  used around this flash point allowed interrupts to execute
>>      *  which violated the design assumptions.  The critical section 
>>      *  mechanism used here WAS redesigned to address this.
>>      */
>>
>>     _ISR_Flash( level );
>>
>>     if ( the_watchdog->state != WATCHDOG_BEING_INSERTED ) {
>>       goto exit_insert;
>>     }
>>
>>     if ( _Watchdog_Sync_level > insert_isr_nest_level ) {
>>       _Watchdog_Sync_level = insert_isr_nest_level;
>>       goto restart;
>>     }
>>  }
>>
>>  _Watchdog_Activate( the_watchdog );
>>
>>  the_watchdog->delta_interval = delta_interval;
>>
>>  _Chain_Insert_unprotected( after->Node.previous, &the_watchdog->Node
> 
> 
>>);
>>
>>  the_watchdog->start_time = _Watchdog_Ticks_since_boot;
>>
>>exit_insert:
>>  _Watchdog_Sync_level = insert_isr_nest_level;
>>  _Watchdog_Sync_count--;
>>abort_insert: /*AS: line modified*/
>>  _ISR_Enable( level );
>>}
>>
>>------------------------------------------------------------
> 
> 
> 


-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel at OARcorp.com                 On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
    Support Available             (256) 722-9985




More information about the users mailing list