Critical Section: rtems_timer_fire_after, rtems_timer_server_fire_after
Joel Sherrill <joel@OARcorp.com>
joel.sherrill at OARcorp.com
Thu Jun 16 18:42:04 UTC 2005
Can you provide the patch in diff format?
It would be safer to apply that way.
Thanks.
--joel
Andrew Sinclair wrote:
> Thanks Joel. I didn't expect a quick reply on this one. It took me some
> time to understand what was going on, and I appreciate everyone is busy.
> I hope you are feeling better.
>
> I agree with all your discussion points. I agree that the higher
> interrupt is the one that should ultimately set the timer, and both
> attempts by the low and high priority tasks to set the timer should be
> classed as a success.
>
>
> -----Original Message-----
> From: Joel Sherrill <joel at OARcorp.com>
> [mailto:joel.sherrill at OARcorp.com]
> Sent: Saturday, 11 June 2005 4:15 AM
> To: Andrew Sinclair
> Cc: rtems-users at rtems.com
> Subject: Re: Critical Section: rtems_timer_fire_after,
> rtems_timer_server_fire_after
>
> Andrew Sinclair wrote:
>
>>Hello there,
>>
>>I believe I have encountered a critical section problem with
>>rtems_timer_fire_after and rtems_timer_server_fire_after. According to
>
>
>>previous discussions on the mailing list it is acceptable to call
>>these during interrupts. There is a case I have found where calling
>>these will cause issues. I am using release 4.6.2.
>>
>>If a task or low priority interrupt modify the same timer as a high
>>priority interrupt, then corruption of the timer chain can occur.
>>
>>For sake of simplicity, I have listed the general algorithm in RTEMS
>>below.
>>
>>Consider a task/low priority interrupt calling rtems_timer_fire_after.
>>Inside the marked critical section, a high priority interrupt may
>>occur and insert the same timer into the chain without the task/lower
>>priority interrupt having knowledge. When the task/lower priority
>>interrupt resumes, it blindly modifies watchdog->state back to
>>inactive, and reinserts the timer. This usually results in the timer
>>chain previous and next pointers forming a loop, and ultimately
>>locking up the device when another timer must be inserted.
>>
>>Task
>> \-->rtems_timer_fire_after
>> \-->Watchdog_Remove
>> \--> disable interrrupts
>> \--> remove timer if WATCHDOG_ACTIVE or stop if being
>>inserted
>> \--> watchdog->state=WATCHDOG_INACTIVE
>> \--> enable interrupts
>>[critical section start]
>> \-->Watchdog_Initialize
>> \--> watchdog->state=WATCHDOG_INACTIVE
>> \-->Watchdog_Insert_ticks
>> \-->Watchdog_Insert
>> \-->watchdog->state=WATCHDOG_BEING_INSERTED
>>[critical section end]
>> \-->disable interrupts
>> \-->while in timer chain
>> \-->if at insert position exit
>> \-->flash interrupts
>> \-->check if timer has been touched
>> \-->insert timer
>> \-->enable interrupts
>>
>>My solution to this problem is listed below involves modification to
>>rems_timer_fire_after, rtems_timer_fire_after, and Watchdog_Insert.
>>
>>Before watchdog->state is modified, I check disable interrupts and
>>check if it has been modified. If it has been modified, then abort.
>>
>>I have listed relevant sections of code I hacked at below. Testing so
>>far appears to have solved this issue. I see now that I probably
>>should return RTEMS_INTERNAL_ERROR on abort.
>>Can anyone see problems with the changes I have made?
>
>
> It took me some time to mentally digest this and I think your changes
> are OK. Once the discussion is done, you need to file a PR with a
> patch.
>
> The question is more of a desired behavior.
>
> + Should the interrupted operation be thrown away? Or should
> the watchdog state be "initialzed atomically to being inserted"
> and the interrupting operation return imemdiately. Now the
> interrupting operation trumps the interrupted one. I think this
> is probably OK logically because it should be logically the same
> as if the 1st completed and the 2nd then ran to completion.
>
> + Should the interrupted operation return successful or something
> like invalid state? Successful is OK if we take the "2nd operation
> trumps" view.
>
> + I don't like calling it abort just because it sounds like an error.
> It could be called something like insert_interrupted.
>
> + How about an early exit rather than a goto?
>
> Sorry for the delay. This really was hard to think about -- especially
> with the allergies I had going on last week.
>
>
>>Kind Regards
>>
>>Andrew Sinclair
>>
>>------------------------------------------------------------
>>
>>rtems_status_code rtems_timer_fire_after(
>> Objects_Id id,
>> rtems_interval ticks,
>> rtems_timer_service_routine_entry routine,
>> void *user_data
>>)
>>{
>> Timer_Control *the_timer;
>> Objects_Locations location;
>> ISR_Level level; /*AS: line modified*/
>>
>> if ( ticks == 0 )
>> return RTEMS_INVALID_NUMBER;
>>
>> if ( !routine )
>> return RTEMS_INVALID_ADDRESS;
>>
>> the_timer = _Timer_Get( id, &location );
>> switch ( location ) {
>> case OBJECTS_REMOTE: /* should never return this */
>> return RTEMS_INTERNAL_ERROR;
>>
>> case OBJECTS_ERROR:
>> return RTEMS_INVALID_ID;
>>
>> case OBJECTS_LOCAL:
>> (void) _Watchdog_Remove( &the_timer->Ticker );
>>/*AS: start, perform inline initialisation with interrupts disabled,
>>abort if watchdog touched*/
>> _ISR_Disable( level );
>> if (the_timer->Ticker.state != WATCHDOG_INACTIVE)
>> goto timer_fire_after_abort_insert;
>> the_timer->the_class = TIMER_INTERVAL;
>> _Watchdog_Initialize( &the_timer->Ticker, routine, id, user_data
>
>
>>);
>> _ISR_Enable( level );
>>/*AS: end*/
>>
>> _Watchdog_Insert_ticks( &the_timer->Ticker, ticks );
>> _Thread_Enable_dispatch();
>> return RTEMS_SUCCESSFUL;
>> }
>>
>> return RTEMS_INTERNAL_ERROR; /* unreached - only to remove
>
> warnings
>
>>*/
>>/*AS: start*/
>>timer_fire_after_abort_insert:
>> _ISR_Enable( level );
>> _Thread_Enable_dispatch();
>> return RTEMS_SUCCESSFUL;
>>/*AS: end*/
>>}
>>
>>------------------------------------------------------------
>>
>>rtems_status_code rtems_timer_server_fire_after(
>> Objects_Id id,
>> rtems_interval ticks,
>> rtems_timer_service_routine_entry routine,
>> void *user_data
>>)
>>{
>> Timer_Control *the_timer;
>> Objects_Locations location;
>> extern Chain_Control _Timer_Ticks_chain;
>> ISR_Level level; /*AS: line modified*/
>>
>> if ( !_Timer_Server )
>> return RTEMS_INCORRECT_STATE;
>>
>> if ( !routine )
>> return RTEMS_INVALID_ADDRESS;
>>
>> if ( ticks == 0 )
>> return RTEMS_INVALID_NUMBER;
>>
>> the_timer = _Timer_Get( id, &location );
>> switch ( location ) {
>> case OBJECTS_REMOTE: /* should never return this */
>> return RTEMS_INTERNAL_ERROR;
>>
>> case OBJECTS_ERROR:
>> return RTEMS_INVALID_ID;
>>
>> case OBJECTS_LOCAL:
>> (void) _Watchdog_Remove( &the_timer->Ticker );
>>
>>/*AS: start, perform inline initialisation with interrupts disabled,
>>abort if watchdog touched*/
>> _ISR_Disable( level );
>> if (the_timer->Ticker.state != WATCHDOG_INACTIVE)
>> goto timer_server_fire_after_abort_insert;
>> the_timer->the_class = TIMER_INTERVAL_ON_TASK;
>> _Watchdog_Initialize( &the_timer->Ticker, routine, id, user_data
>
>
>>);
>> the_timer->Ticker.initial = ticks;
>> _ISR_Enable( level );
>>/*AS: end*/
>> _Timer_Server_stop_ticks_timer();
>> _Timer_Server_process_ticks_chain();
>> _Watchdog_Insert( &_Timer_Ticks_chain, &the_timer->Ticker );
>> _Timer_Server_reset_ticks_timer();
>> _Thread_Enable_dispatch();
>> return RTEMS_SUCCESSFUL;
>> }
>>
>> return RTEMS_INTERNAL_ERROR; /* unreached - only to remove
>
> warnings
>
>>*/
>>/*AS: start*/
>>timer_server_fire_after_abort_insert:
>> _ISR_Enable( level );
>> _Thread_Enable_dispatch();
>> return RTEMS_SUCCESSFUL;
>>/*AS: end*/
>>}
>>
>>------------------------------------------------------------
>>
>>void _Watchdog_Insert(
>> Chain_Control *header,
>> Watchdog_Control *the_watchdog
>>)
>>{
>> ISR_Level level;
>> Watchdog_Control *after;
>> unsigned32 insert_isr_nest_level;
>> Watchdog_Interval delta_interval;
>>
>>
>> insert_isr_nest_level = _ISR_Nest_level;
>>/*AS: start , check if watchdog has been touched, and abort if
>>necessary*/
>> _ISR_Disable( level );
>> if (the_watchdog->state != WATCHDOG_INACTIVE)
>> goto abort_insert;
>>/*AS: end*/
>>
>> the_watchdog->state = WATCHDOG_BEING_INSERTED;
>>
>>
>> _Watchdog_Sync_count++;
>>
>>restart:
>> delta_interval = the_watchdog->initial;
>>
>> /*
>> * We CANT use _Watchdog_First() here, because a TICK interrupt
>> * could modify the chain during the _ISR_Flash() below. Hence,
>> * the header is pointing to volatile data. The _Watchdog_First()
>> * INLINE routine (but not the macro - note the subtle difference)
>> * casts away the 'volatile'...
>> *
>> * Also, this is only necessary because we call no other routine
>> * from this piece of code, hence the compiler thinks it's safe to
>> * cache *header!!
>> *
>> * Till Straumann, 7/2003 (gcc-3.2.2 -O4 on powerpc)
>> *
>> */
>> for ( after = (Watchdog_Control *) ((volatile Chain_Control
>>*)header)->first ;
>> ;
>> after = _Watchdog_Next( after ) ) {
>>
>> if ( delta_interval == 0 || !_Watchdog_Next( after ) )
>> break;
>>
>> if ( delta_interval < after->delta_interval ) {
>> after->delta_interval -= delta_interval;
>> break;
>> }
>>
>> delta_interval -= after->delta_interval;
>>
>> /*
>> * If you experience problems comment out the _ISR_Flash line.
>> * 3.2.0 was the first release with this critical section
>>redesigned.
>> * Under certain circumstances, the PREVIOUS critical section
>>algorithm
>> * used around this flash point allowed interrupts to execute
>> * which violated the design assumptions. The critical section
>> * mechanism used here WAS redesigned to address this.
>> */
>>
>> _ISR_Flash( level );
>>
>> if ( the_watchdog->state != WATCHDOG_BEING_INSERTED ) {
>> goto exit_insert;
>> }
>>
>> if ( _Watchdog_Sync_level > insert_isr_nest_level ) {
>> _Watchdog_Sync_level = insert_isr_nest_level;
>> goto restart;
>> }
>> }
>>
>> _Watchdog_Activate( the_watchdog );
>>
>> the_watchdog->delta_interval = delta_interval;
>>
>> _Chain_Insert_unprotected( after->Node.previous, &the_watchdog->Node
>
>
>>);
>>
>> the_watchdog->start_time = _Watchdog_Ticks_since_boot;
>>
>>exit_insert:
>> _Watchdog_Sync_level = insert_isr_nest_level;
>> _Watchdog_Sync_count--;
>>abort_insert: /*AS: line modified*/
>> _ISR_Enable( level );
>>}
>>
>>------------------------------------------------------------
>
>
>
--
Joel Sherrill, Ph.D. Director of Research & Development
joel at OARcorp.com On-Line Applications Research
Ask me about RTEMS: a free RTOS Huntsville AL 35805
Support Available (256) 722-9985
More information about the users
mailing list