Scheduler bug?

Sun May 10 13:09:41 UTC 2009

Leon Pollak wrote:
> Hello, all.
>
>
> My customer reported that a very old unit, running an RTEMS version 
> from 2003, sometimes resets. They changed the CPU clock to a faster 
> one, and now the reset occurs.
>
>
> Although I can hardly believe it myself, my investigation suggests that 
> I have very probably hit a bug in the RTEMS scheduler (see below). The 
> question is whether the bug has been corrected in a newer version. I 
> mean, will the upgrade to 4.9 help?
>
>
> As the end customer (the customer of my customer) is a military 
> organization, they will be really upset when I say "upgrading the 
> RTOS"...:-) without being sure...
>
>
>
So they are OK with changing the hardware clock on a tested unit and
invalidating all testing, but not with upgrading the software?  Any
change on a validated system is a change.
> So, please, advise...:-)
We will have to use the collective RTEMS memory on this one.  I recall
a bug that does sound like this.

2005-08-17	Andrew Sinclair <Andrew.Sinclair at elprotech.com>

	PR 807/rtems
	* rtems/src/timerfireafter.c, rtems/src/timerserverfireafter.c,
	score/src/watchdoginsert.c: Tighten critical section checks on an ISR
	using the same timer being inserted by a lower priority ISR or
	interrupt task.

Does this sound like it?  It was fixed in 4.6.4 (the fix is not in 4.6.2).

http://www.rtems.org/cgi-bin/cvsweb.cgi/rtems/cpukit/score/src/watchdoginsert.c
is where I found the ChangeLog entry.

This only impacted 3 files, so it is no more of a change than increasing
the clock frequency.  How is it OK to (*&% with the hardware and not with
the software?  Change is (*^ change.

--joel
> Many thanks in advance.
>
>
> =============================================================================
>
>
> The problem description:
> -------------------------
> The application has only 2 tasks:
>
>
> - WD - priority 50, task to reset the HW watchdog: sleeps for 10 ticks, 
> drives the reset line high, sleeps for 10 clock ticks, drives the reset 
> line low;
>
>
> - MB - priority 100, task waits for event from interrupt with 1(!!!) 
> tick timeout. If timeout occurs, it refreshes some variables and waits 
> for event again. Interrupt frequency is about 50Hz (20ms).
>
>
>
> The WD task stops working rather quickly (after 30-50 s) when tick=2ms.
> Stepping with the debugger into the rtems_clock_tick() routine and then 
> into the _Watchdog_Tickle() routine shows that
> the_watchdog->Node.delta_interval=0xFFFFFeXX for the WD task.
>
>
> This obviously causes the HW watchdog to reset the system very soon...:-)
>
>
> -Increasing the MB task event waiting timeout even to 2 ticks seems to 
> eliminate the problem.
> -Masking the incoming interrupts seems to eliminate the problem.
> -Increasing the tick length significantly reduces the probability of the problem.
>
>
> Other changes seem not to influence the situation.
> -- 
> Leon
>
>