Critical Section: rtems_timer_fire_after, rtems_timer_server_fire_after
Andrew Sinclair
Andrew.Sinclair at elprotech.com
Mon Jun 13 23:26:15 UTC 2005
Thanks Joel. I didn't expect a quick reply on this one. It took me some
time to understand what was going on, and I appreciate everyone is busy.
I hope you are feeling better.
I agree with all your discussion points. I agree that the higher
interrupt is the one that should ultimately set the timer, and both
attempts by the low and high priority tasks to set the timer should be
classed as a success.
-----Original Message-----
From: Joel Sherrill <joel at OARcorp.com>
[mailto:joel.sherrill at OARcorp.com]
Sent: Saturday, 11 June 2005 4:15 AM
To: Andrew Sinclair
Cc: rtems-users at rtems.com
Subject: Re: Critical Section: rtems_timer_fire_after,
rtems_timer_server_fire_after
Andrew Sinclair wrote:
> Hello there,
>
> I believe I have encountered a critical section problem with
> rtems_timer_fire_after and rtems_timer_server_fire_after. According to
> previous discussions on the mailing list it is acceptable to call
> these during interrupts. There is a case I have found where calling
> these will cause issues. I am using release 4.6.2.
>
> If a task or low priority interrupt modify the same timer as a high
> priority interrupt, then corruption of the timer chain can occur.
>
> For sake of simplicity, I have listed the general algorithm in RTEMS
> below.
>
> Consider a task/low priority interrupt calling rtems_timer_fire_after.
> Inside the marked critical section, a high priority interrupt may
> occur and insert the same timer into the chain without the task/lower
> priority interrupt having knowledge. When the task/lower priority
> interrupt resumes, it blindly modifies watchdog->state back to
> inactive, and reinserts the timer. This usually results in the timer
> chain previous and next pointers forming a loop, and ultimately
> locking up the device when another timer must be inserted.
>
> Task
> \-->rtems_timer_fire_after
> \-->Watchdog_Remove
> \--> disable interrrupts
> \--> remove timer if WATCHDOG_ACTIVE or stop if being
> inserted
> \--> watchdog->state=WATCHDOG_INACTIVE
> \--> enable interrupts
> [critical section start]
> \-->Watchdog_Initialize
> \--> watchdog->state=WATCHDOG_INACTIVE
> \-->Watchdog_Insert_ticks
> \-->Watchdog_Insert
> \-->watchdog->state=WATCHDOG_BEING_INSERTED
> [critical section end]
> \-->disable interrupts
> \-->while in timer chain
> \-->if at insert position exit
> \-->flash interrupts
> \-->check if timer has been touched
> \-->insert timer
> \-->enable interrupts
>
> My solution to this problem is listed below involves modification to
> rems_timer_fire_after, rtems_timer_fire_after, and Watchdog_Insert.
>
> Before watchdog->state is modified, I check disable interrupts and
> check if it has been modified. If it has been modified, then abort.
>
> I have listed relevant sections of code I hacked at below. Testing so
> far appears to have solved this issue. I see now that I probably
> should return RTEMS_INTERNAL_ERROR on abort.
> Can anyone see problems with the changes I have made?
It took me some time to mentally digest this and I think your changes
are OK. Once the discussion is done, you need to file a PR with a
patch.
The question is more of a desired behavior.
+ Should the interrupted operation be thrown away? Or should
the watchdog state be "initialzed atomically to being inserted"
and the interrupting operation return imemdiately. Now the
interrupting operation trumps the interrupted one. I think this
is probably OK logically because it should be logically the same
as if the 1st completed and the 2nd then ran to completion.
+ Should the interrupted operation return successful or something
like invalid state? Successful is OK if we take the "2nd operation
trumps" view.
+ I don't like calling it abort just because it sounds like an error.
It could be called something like insert_interrupted.
+ How about an early exit rather than a goto?
Sorry for the delay. This really was hard to think about -- especially
with the allergies I had going on last week.
> Kind Regards
>
> Andrew Sinclair
>
> ------------------------------------------------------------
>
> rtems_status_code rtems_timer_fire_after(
> Objects_Id id,
> rtems_interval ticks,
> rtems_timer_service_routine_entry routine,
> void *user_data
> )
> {
> Timer_Control *the_timer;
> Objects_Locations location;
> ISR_Level level; /*AS: line modified*/
>
> if ( ticks == 0 )
> return RTEMS_INVALID_NUMBER;
>
> if ( !routine )
> return RTEMS_INVALID_ADDRESS;
>
> the_timer = _Timer_Get( id, &location );
> switch ( location ) {
> case OBJECTS_REMOTE: /* should never return this */
> return RTEMS_INTERNAL_ERROR;
>
> case OBJECTS_ERROR:
> return RTEMS_INVALID_ID;
>
> case OBJECTS_LOCAL:
> (void) _Watchdog_Remove( &the_timer->Ticker );
> /*AS: start, perform inline initialisation with interrupts disabled,
> abort if watchdog touched*/
> _ISR_Disable( level );
> if (the_timer->Ticker.state != WATCHDOG_INACTIVE)
> goto timer_fire_after_abort_insert;
> the_timer->the_class = TIMER_INTERVAL;
> _Watchdog_Initialize( &the_timer->Ticker, routine, id, user_data
> );
> _ISR_Enable( level );
> /*AS: end*/
>
> _Watchdog_Insert_ticks( &the_timer->Ticker, ticks );
> _Thread_Enable_dispatch();
> return RTEMS_SUCCESSFUL;
> }
>
> return RTEMS_INTERNAL_ERROR; /* unreached - only to remove
warnings
> */
> /*AS: start*/
> timer_fire_after_abort_insert:
> _ISR_Enable( level );
> _Thread_Enable_dispatch();
> return RTEMS_SUCCESSFUL;
> /*AS: end*/
> }
>
> ------------------------------------------------------------
>
> rtems_status_code rtems_timer_server_fire_after(
> Objects_Id id,
> rtems_interval ticks,
> rtems_timer_service_routine_entry routine,
> void *user_data
> )
> {
> Timer_Control *the_timer;
> Objects_Locations location;
> extern Chain_Control _Timer_Ticks_chain;
> ISR_Level level; /*AS: line modified*/
>
> if ( !_Timer_Server )
> return RTEMS_INCORRECT_STATE;
>
> if ( !routine )
> return RTEMS_INVALID_ADDRESS;
>
> if ( ticks == 0 )
> return RTEMS_INVALID_NUMBER;
>
> the_timer = _Timer_Get( id, &location );
> switch ( location ) {
> case OBJECTS_REMOTE: /* should never return this */
> return RTEMS_INTERNAL_ERROR;
>
> case OBJECTS_ERROR:
> return RTEMS_INVALID_ID;
>
> case OBJECTS_LOCAL:
> (void) _Watchdog_Remove( &the_timer->Ticker );
>
> /*AS: start, perform inline initialisation with interrupts disabled,
> abort if watchdog touched*/
> _ISR_Disable( level );
> if (the_timer->Ticker.state != WATCHDOG_INACTIVE)
> goto timer_server_fire_after_abort_insert;
> the_timer->the_class = TIMER_INTERVAL_ON_TASK;
> _Watchdog_Initialize( &the_timer->Ticker, routine, id, user_data
> );
> the_timer->Ticker.initial = ticks;
> _ISR_Enable( level );
> /*AS: end*/
> _Timer_Server_stop_ticks_timer();
> _Timer_Server_process_ticks_chain();
> _Watchdog_Insert( &_Timer_Ticks_chain, &the_timer->Ticker );
> _Timer_Server_reset_ticks_timer();
> _Thread_Enable_dispatch();
> return RTEMS_SUCCESSFUL;
> }
>
> return RTEMS_INTERNAL_ERROR; /* unreached - only to remove
warnings
> */
> /*AS: start*/
> timer_server_fire_after_abort_insert:
> _ISR_Enable( level );
> _Thread_Enable_dispatch();
> return RTEMS_SUCCESSFUL;
> /*AS: end*/
> }
>
> ------------------------------------------------------------
>
> void _Watchdog_Insert(
> Chain_Control *header,
> Watchdog_Control *the_watchdog
> )
> {
> ISR_Level level;
> Watchdog_Control *after;
> unsigned32 insert_isr_nest_level;
> Watchdog_Interval delta_interval;
>
>
> insert_isr_nest_level = _ISR_Nest_level;
> /*AS: start , check if watchdog has been touched, and abort if
> necessary*/
> _ISR_Disable( level );
> if (the_watchdog->state != WATCHDOG_INACTIVE)
> goto abort_insert;
> /*AS: end*/
>
> the_watchdog->state = WATCHDOG_BEING_INSERTED;
>
>
> _Watchdog_Sync_count++;
>
> restart:
> delta_interval = the_watchdog->initial;
>
> /*
> * We CANT use _Watchdog_First() here, because a TICK interrupt
> * could modify the chain during the _ISR_Flash() below. Hence,
> * the header is pointing to volatile data. The _Watchdog_First()
> * INLINE routine (but not the macro - note the subtle difference)
> * casts away the 'volatile'...
> *
> * Also, this is only necessary because we call no other routine
> * from this piece of code, hence the compiler thinks it's safe to
> * cache *header!!
> *
> * Till Straumann, 7/2003 (gcc-3.2.2 -O4 on powerpc)
> *
> */
> for ( after = (Watchdog_Control *) ((volatile Chain_Control
> *)header)->first ;
> ;
> after = _Watchdog_Next( after ) ) {
>
> if ( delta_interval == 0 || !_Watchdog_Next( after ) )
> break;
>
> if ( delta_interval < after->delta_interval ) {
> after->delta_interval -= delta_interval;
> break;
> }
>
> delta_interval -= after->delta_interval;
>
> /*
> * If you experience problems comment out the _ISR_Flash line.
> * 3.2.0 was the first release with this critical section
> redesigned.
> * Under certain circumstances, the PREVIOUS critical section
> algorithm
> * used around this flash point allowed interrupts to execute
> * which violated the design assumptions. The critical section
> * mechanism used here WAS redesigned to address this.
> */
>
> _ISR_Flash( level );
>
> if ( the_watchdog->state != WATCHDOG_BEING_INSERTED ) {
> goto exit_insert;
> }
>
> if ( _Watchdog_Sync_level > insert_isr_nest_level ) {
> _Watchdog_Sync_level = insert_isr_nest_level;
> goto restart;
> }
> }
>
> _Watchdog_Activate( the_watchdog );
>
> the_watchdog->delta_interval = delta_interval;
>
> _Chain_Insert_unprotected( after->Node.previous, &the_watchdog->Node
> );
>
> the_watchdog->start_time = _Watchdog_Ticks_since_boot;
>
> exit_insert:
> _Watchdog_Sync_level = insert_isr_nest_level;
> _Watchdog_Sync_count--;
> abort_insert: /*AS: line modified*/
> _ISR_Enable( level );
> }
>
> ------------------------------------------------------------
--
Joel Sherrill, Ph.D. Director of Research & Development
joel at OARcorp.com On-Line Applications Research
Ask me about RTEMS: a free RTOS Huntsville AL 35805
Support Available (256) 722-9985
More information about the users
mailing list