Event not timing out with a wait of 1 tick (and sometimes 2)

Fri Dec 23 19:36:39 UTC 2005

Hi,

We have finally been able to test Ian's fix, sorry for the delay. 

Following is a brief description of what we did, because it didnt work 
for us, but we also had to change the code a little to do the test which 
may have broken it if our understanding isnt correct.

We assume that Ian meant delta_interval not delta_count as there is no 
delta_count variable (at least not where the compiler could see).

The compiler got tricky and read a bit from the condition register and 
added it to r8 to perform the "if zero do a ++" code. So we changed it 
to "if zero then set to 1". (so we could put a breakpoint on it)

We also had code in watchdog_tickle.c, that went, "if 
the_watchdog->delta_interval == 0 then the_watchdog->delta_interval = 
1". A breakpoint on that "the_watchdog->delta_interval = 1" in 
watchdog_tickle.c got hit several times, but the one in watchdog_insert 
did not get hit.

Best Regards, and Merry Christmas,
Steven Johnson

Ian Caddy wrote:

> Hi,
>
> I have been following this thread with interest.  I was looking at the 
> watchdoginsert.c code (in 4.6.5) today and I can see what is causing 
> this issue, thanks to Steven's debugging effort.
>
> The only way to protect against this is to ensure that if you are 
> inserting the new watchdog at the head of the watchdog list, we need 
> to ensure that it does not have a zero delta_interval, this watchdog 
> is effectively inserted after the tick and so should have a count of 1 
> still to go which will be decremented at the next clock isr.
>
> The simple solution would be to check after the for loop if the 
> "after" watchdog is still active, and if not, then make sure the 
> delta_interval is greater than zero, in watchdoginsert.c:
>
>    } /* End of for loop */
>
>    _Watchdog_Activate(the_watchdog);
>
> +   /* Check that the after watchdog is still active
> +    * (not removed in the ISR_Flash via a clock isr) */
> +   if(after->state != WATCHDOG_ACTIVE)
> +   {
> +      /* Ensure we have a delta_count of greater than 0
> +       * since we haven't really started our timeout yet */
> +      if(delta_count == 0)
> +         detla_count++;
> +   }
>
>    the_watchdog->delta_interval = delta_interval;
>
> So, in my conclusion, I think Steven's initial change did not loose 
> any watchdog counts and in fact effectively provides the same result 
> as this code, but at least with this code, we know where it is coming 
> from rather than a solution to an unknown problem.
>
> I hope this is correct and makes sense to everyone.
>
> regards,
>
> Ian Caddy
>
>
>
> Steven Johnson wrote:
>
>> Hi Joel,
>>
>>>
>>> Since you are hacking on the source anyway, you could add a variable 
>>> which is set to note that you are at the flash and clear it when 
>>> interrupts are redisabled.  Then when the 0 at the head of the delta 
>>> chain occurs there is a bread crumb.  Several places could be marked 
>>> this way.
>>>
>>> When you get the fault, check _ISR_Nest_level and check it.
>>
>>
>>
>>
>> We did the following test:
>>
>> The ISR_Flash macro was modified to set "_ISR_In_Flash" on entry and 
>> clear it on exit.
>>
>> We added the following code to the watchdog tickle:
>>
>> if (_ISR_Nest_level > 1)
>>    _ISR_Nest_Count++;
>>
>> if (prev_in_flash)
>> {
>>    if (!the_watchdog->delta_interval)
>>    {
>>        the_watchdog->delta_interval = 1;
>>    }
>>    prev_in_flash = 0;
>> }
>>
>>
>> if (_ISR_In_Flash)
>>    prev_in_flash = 1;
>>
>> The results are these:
>>
>> 1. A breakpoint on the the _ISR_Nest_Count++ never triggered.
>> 2. A breakpoint on "prev_in_flash = 1" triggered every so often.
>> 3. A breakpoint on "the_watchdog->delta_interval = 1;" also triggered
>>
>> So this seems to prove that what is happening is that interrupts are 
>> being enabled after the delta_interval is being set to zero, but 
>> before it is added to the watchdog chain.  An ISR occurs (but not a 
>> nested one) that removes all the entries from the head of the 
>> watchdog chain.  When the ISR returns, the zero calculated delta is 
>> added, but instead of being after the entries in the chain it is 
>> calculated on it is added at the head because all of the entries it 
>> was calculated relative to have been removed. Which causes the next 
>> watchdog tickle (following the one that occured during the flash) to 
>> remove the head with 0 in it, decrement to 2^32-1, and exhibit our 
>> fault.
>>
>> I dont use simulators, but i imagine you could force this behaviour 
>> by careful control of the simulator to exhibit these characteristics 
>> for verification.
>>
>> To also answer Jennifer,
>>
>>> On most of the PowerPC boards using new interrupt processing, 
>>> interrupts
>>> are turned back on in C_dispatch_irq_handler allowing nested 
>>> interrupts. If this is occuring in your bsp, you may want to stub 
>>> out the sections
>>> that do this and see if the problem goes away.
>>>
>> We dont actually use either the old or new PowerPC interrupt 
>> processing in RTEMS, i did my own interrupt processing (loosely based 
>> on the old powerpc interrupt processing) for my target years ago, and 
>> have stuck with it cause its works fine for me and ive tuned it to my 
>> liking for the mpc860 irq's.  Which was why i was pretty sure we 
>> shouldnt have nested IRQ's.  The above test indicates that we arent 
>> getting nested IRQ's in this instance, as _ISR_Nest_Count is never 
>> incremented.
>>
>> Hope that helps.
>>
>> Steven Johnson
>>
>