Event not timing out with a wait of 1 tick (and sometimes 2)

Tue Jan 3 08:40:09 UTC 2006

Hi Steven,

Happy New Year everyone.

Looks like I made an error in the cut and paste ... ;-)  You are 
correct, it should have been delta_interval instead of delta_count.

I was sure this is where the problem was.  Did you try putting a 
breakpoint inside the "if(after->state != WATCHDOG_ACTIVE)" which would 
at least indicate that the next one up in the list has been removed?

Changing the code to make "the_watchdog->delta_interval = 1" instead of 
incrementing it should make no difference to how it worked.

The other thing to try for experimentation would be to comment out the 
ISR_Flash in the watchdog insert and see if the other referred symptom 
(in watchdogtickle.c) disappears to see if we are at least looking in 
the right area.

One other thing to check is are you setting the time from anywhere as 
this could be possibly causing the problems, although from a quick look 
it looks OK.

Stop everything!  I just had another look at the watchdoginsert.c and 
decided that I quite get it right the first time.  The after node, is 
actually going to be *after* the watchdog to be inserted, so my test is 
incorrect, we need to check whether we are at the head of the watchdog 
list with 0 in the delta_interval.  To do this, we are inserting the new 
watchdog into the chain between the after->Node.previous, (new 
watchdog), after.  Therefore we have to check whether the previous node 
is active.

So, the test should be:

    } /* End of for loop */

    _Watchdog_Activate(the_watchdog);

+   /* Check that the after watchdog is still active
+    * (not removed in the ISR_Flash via a clock isr) */
+   if( ((Watchdog_Control *)after->Node.previous)->state !=
+                  WATCHDOG_ACTIVE)
+   {
+      /* Ensure we have a delta_count of greater than 0
+       * since we haven't really started our timeout yet */
+      if(delta_interval == 0)
+         delta_interval = 1;
+   }

    the_watchdog->delta_interval = delta_interval;

If you get a chance it would be good to test if this fixes the problem 
instead.

regards,

Ian Caddy

Steven Johnson wrote:
> Hi,
> 
> We have finally been able to test Ian's fix, sorry for the delay.
> Following is a brief description of what we did, because it didnt work 
> for us, but we also had to change the code a little to do the test which 
> may have broken it if our understanding isnt correct.
> 
> We assume that Ian meant delta_interval not delta_count as there is no 
> delta_count variable (at least not where the compiler could see).
> 
> The compiler got tricky and read a bit from the condition register and 
> added it to r8 to perform the "if zero do a ++" code. So we changed it 
> to "if zero then set to 1". (so we could put a breakpoint on it)
> 
> We also had code in watchdog_tickle.c, that went, "if 
> the_watchdog->delta_interval == 0 then the_watchdog->delta_interval = 
> 1". A breakpoint on that "the_watchdog->delta_interval = 1" in 
> watchdog_tickle.c got hit several times, but the one in watchdog_insert 
> did not get hit.
> 
> Best Regards, and Merry Christmas,
> Steven Johnson
> 
> Ian Caddy wrote:
> 
>> Hi,
>>
>> I have been following this thread with interest.  I was looking at the 
>> watchdoginsert.c code (in 4.6.5) today and I can see what is causing 
>> this issue, thanks to Steven's debugging effort.
>>
>> The only way to protect against this is to ensure that if you are 
>> inserting the new watchdog at the head of the watchdog list, we need 
>> to ensure that it does not have a zero delta_interval, this watchdog 
>> is effectively inserted after the tick and so should have a count of 1 
>> still to go which will be decremented at the next clock isr.
>>
>> The simple solution would be to check after the for loop if the 
>> "after" watchdog is still active, and if not, then make sure the 
>> delta_interval is greater than zero, in watchdoginsert.c:
>>
>>    } /* End of for loop */
>>
>>    _Watchdog_Activate(the_watchdog);
>>
>> +   /* Check that the after watchdog is still active
>> +    * (not removed in the ISR_Flash via a clock isr) */
>> +   if(after->state != WATCHDOG_ACTIVE)
>> +   {
>> +      /* Ensure we have a delta_count of greater than 0
>> +       * since we haven't really started our timeout yet */
>> +      if(delta_count == 0)
>> +         detla_count++;
>> +   }
>>
>>    the_watchdog->delta_interval = delta_interval;
>>
>> So, in my conclusion, I think Steven's initial change did not loose 
>> any watchdog counts and in fact effectively provides the same result 
>> as this code, but at least with this code, we know where it is coming 
>> from rather than a solution to an unknown problem.
>>
>> I hope this is correct and makes sense to everyone.
>>
>> regards,
>>
>> Ian Caddy
>>
>>
>>
>> Steven Johnson wrote:
>>
>>> Hi Joel,
>>>
>>>>
>>>> Since you are hacking on the source anyway, you could add a variable 
>>>> which is set to note that you are at the flash and clear it when 
>>>> interrupts are redisabled.  Then when the 0 at the head of the delta 
>>>> chain occurs there is a bread crumb.  Several places could be marked 
>>>> this way.
>>>>
>>>> When you get the fault, check _ISR_Nest_level and check it.
>>>
>>>
>>>
>>>
>>>
>>> We did the following test:
>>>
>>> The ISR_Flash macro was modified to set "_ISR_In_Flash" on entry and 
>>> clear it on exit.
>>>
>>> We added the following code to the watchdog tickle:
>>>
>>> if (_ISR_Nest_level > 1)
>>>    _ISR_Nest_Count++;
>>>
>>> if (prev_in_flash)
>>> {
>>>    if (!the_watchdog->delta_interval)
>>>    {
>>>        the_watchdog->delta_interval = 1;
>>>    }
>>>    prev_in_flash = 0;
>>> }
>>>
>>>
>>> if (_ISR_In_Flash)
>>>    prev_in_flash = 1;
>>>
>>> The results are these:
>>>
>>> 1. A breakpoint on the the _ISR_Nest_Count++ never triggered.
>>> 2. A breakpoint on "prev_in_flash = 1" triggered every so often.
>>> 3. A breakpoint on "the_watchdog->delta_interval = 1;" also triggered
>>>
>>> So this seems to prove that what is happening is that interrupts are 
>>> being enabled after the delta_interval is being set to zero, but 
>>> before it is added to the watchdog chain.  An ISR occurs (but not a 
>>> nested one) that removes all the entries from the head of the 
>>> watchdog chain.  When the ISR returns, the zero calculated delta is 
>>> added, but instead of being after the entries in the chain it is 
>>> calculated on it is added at the head because all of the entries it 
>>> was calculated relative to have been removed. Which causes the next 
>>> watchdog tickle (following the one that occured during the flash) to 
>>> remove the head with 0 in it, decrement to 2^32-1, and exhibit our 
>>> fault.
>>>
>>> I dont use simulators, but i imagine you could force this behaviour 
>>> by careful control of the simulator to exhibit these characteristics 
>>> for verification.
>>>
>>> To also answer Jennifer,
>>>
>>>> On most of the PowerPC boards using new interrupt processing, 
>>>> interrupts
>>>> are turned back on in C_dispatch_irq_handler allowing nested 
>>>> interrupts. If this is occuring in your bsp, you may want to stub 
>>>> out the sections
>>>> that do this and see if the problem goes away.
>>>>
>>> We dont actually use either the old or new PowerPC interrupt 
>>> processing in RTEMS, i did my own interrupt processing (loosely based 
>>> on the old powerpc interrupt processing) for my target years ago, and 
>>> have stuck with it cause its works fine for me and ive tuned it to my 
>>> liking for the mpc860 irq's.  Which was why i was pretty sure we 
>>> shouldnt have nested IRQ's.  The above test indicates that we arent 
>>> getting nested IRQ's in this instance, as _ISR_Nest_Count is never 
>>> incremented.
>>>
>>> Hope that helps.
>>>
>>> Steven Johnson
>>>
>>
> 

-- 
Ian Caddy
Goanna Technologies Pty Ltd
+61 8 9221 1860