<div dir="ltr"><div class="gmail_default" style="font-size:large">Hi Andrew,</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">Thanks for your email. It helped me learn a little more. We have a bancomm635 timing board physically installed.</div><div class="gmail_default" style="font-size:large">However, we stopped initializing it in our startup some time ago, so the interrupt handler for its vector 0x40 (BC635VEC, see the grep output below) isn't configured, as you pointed out. I know what to do to prevent this from happening again. I think the consequence of this is extreme (our system stopped working).</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">I haven't tracked down vector 0x45 yet, but we also have 2 GreenSprings IPAC carriers for our canbus cards that can generate interrupts. These ISRs should all be configured. The canbus cards use the drvTip810 driver. <a href="https://github.com/epics-modules/ipac/tree/2.14">https://github.com/epics-modules/ipac/tree/2.14</a></div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">We can run t810Report 3 if we see this happen again.</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large">-Matt</div><div class="gmail_default" style="font-size:large"><br></div><div class="gmail_default" style="font-size:large"><span style="font-family:monospace"><span style="color:rgb(0,0,0)">$ egrep -rn --include="*.c" BC635VEC bancomm              </span><br><span style="color:rgb(178,24,178)">bancomm/bancommApp/src/drvBc635.c</span><span style="color:rgb(24,178,178)">:</span><span style="color:rgb(24,178,24)">121</span><span style="color:rgb(24,178,178)">:</span><span style="color:rgb(0,0,0)">#define </span><span style="font-weight:bold;color:rgb(255,84,84)">BC635VEC</span><span style="color:rgb(0,0,0)">    0x40

</span><br><span style="color:rgb(178,24,178)">bancomm/bancommApp/src/drvBc635.c</span><span style="color:rgb(24,178,178)">:</span><span style="color:rgb(24,178,24)">702</span><span style="color:rgb(24,178,178)">:</span><span style="color:rgb(0,0,0)">        if( (status = devConnectInterruptVME(</span><span style="font-weight:bold;color:rgb(255,84,84)">BC635VEC</span><span style="color:rgb(0,0,0)">, isr_bc635, NULL) ) != OK )

</span><br><span style="color:rgb(178,24,178)">bancomm/bancommApp/src/drvBc635.c</span><span style="color:rgb(24,178,178)">:</span><span style="color:rgb(24,178,24)">707</span><span style="color:rgb(24,178,178)">:</span><span style="color:rgb(0,0,0)">        pbc635->vector = </span><span style="font-weight:bold;color:rgb(255,84,84)">BC635VEC</span><span style="color:rgb(0,0,0)">;      /* Interrupt vector */</span><br>

<br></span></div><div class="gmail_default" style="font-size:large"><span style="font-family:monospace"><br></span></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 11, 2021 at 5:59 PM Johnson, Andrew N. <<a href="mailto:anj@anl.gov">anj@anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Matt,<br>

<br>

You appear to have had 2 separate unconfigured interrupts, which triggered about 5 minutes apart. Editing the log slightly to make that clearer:<br>

<br>

> <8/3/2021 8:00:34 hst>vmeTsi148 ISR: ERROR: no handler registered (level 2) IACK 0x00000045 -- DISABLING level 2 <br>

> <8/3/2021 8:05:14 hst>vmeTsi148 ISR: ERROR: no handler registered (level 4) IACK 0x00000040 -- DISABLING level 4<br>

<br>

<br>

I would guess that those IACK numbers 0x45 and 0x40 are VME interrupt vectors, so they probably came from one or more VME cards in the crate. My VxWorks BSPs have routines that let me see the complete interrupt vector table, but I don’t know if the Beatnik BSP has anything similar. You should have a record of what vectors are used for each VME card that can generate interrupts, so check whether those interrupts should have registered handlers or not.<br>

<br>

It’s also possible that these interrupts were generated by the Tsi148 chip itself. In many VxWorks VME BSPs the on-board drivers route their interrupts through the same vector table even though there’s no specific need to do it that way. For example, the Tempe (Tsi148) chip can generate VMEbus SYSFail and PowerFail interrupts, although none of the chip interrupts should have been enabled without registering a handler for them first. SYSFail can be asserted by another VME board, so that wouldn’t give you a definitive answer if it turned out to be that.<br>

<br>

Hope this gives you some clues,<br>

<br>

- Andrew<br>

<br>

<br>

<br>

On Mar 11, 2021, at 9:35 PM, Matt Rippa via Tech-talk <<a href="mailto:tech-talk@aps.anl.gov" target="_blank">tech-talk@aps.anl.gov</a>> wrote:<br>

> <br>

> It looks like the IRQEntry vector triggering this interrupt was null. I'm not sure what condition would lead to this.<br>

> Maybe memory corruption or a bus error.<br>

> <br>

> <a href="https://git.rtems.org/rtems/tree/c/src/lib/libbsp/shared/vmeUniverse/vmeTsi148.c?h=4.10.2#n1562" rel="noreferrer" target="_blank">https://git.rtems.org/rtems/tree/c/src/lib/libbsp/shared/vmeUniverse/vmeTsi148.c?h=4.10.2#n1562</a><br>

> <br>

> On Thu, Mar 11, 2021 at 2:42 PM Matt Rippa <<a href="mailto:mrippa@gemini.edu" target="_blank">mrippa@gemini.edu</a>> wrote:<br>

> Hello,<br>

> <br>

> Our Primary Mirror control system runs the beatnik BSP on an MVME6100 with RTEMS 4.10.2 and EPICS 3.14.12.8 (which understandably is not actively supported). We've experienced a single event that we've never seen before: vmeTsi148 ISR: ERROR: no handler <br>

> <br>

> After this it appears all interrupts were disabled and the VME bus communications are down. This system is normally well behaved with uptimes measured in months. In 2020 we had an uptime of 301 days!<br>

> <br>

> Has anyone seen an error like this on the beatnik bsp?<br>

> <br>

> Following the error, which coincided closely with a casr command, we see dbScan warnings in our logs. These recurred several times per minute until the system was rebooted.<br>

> <br>

> Thanks for any insight.<br>

> <br>

> -Matt Rippa <br>

> <br>

> <br>

> <8/3/2021 8:00:32 hst>pcs-mk-ioc> casr <br>

> <8/3/2021 8:00:33 hst>Channel Access Server V4.13 <br>

> <8/3/2021 8:00:33 hst>Connected circuits: <br>

> <8/3/2021 8:00:33 hst>TCP xxxxxxxxxxxxxxxxxxxx, V4.13, 6 Channels, Priority=0 <br>

> <8/3/2021 8:00:33 hst>TCP xxxxxxxxxxxxxxxxxxxx, V4.13, 134 Channels, Priority=80<br>

> <8/3/2021 8:00:33 hst>TCP , V4.13, 1 Channels, Priority=0 <br>

> <8/3/2021 8:00:33 hst>TCP , V4.13, 5 Channels, Priority=0 <br>

> <8/3/2021 8:00:33 hst>TCP , V4.13, 8 Channels, Priority=0 <br>

> <8/3/2021 8:00:33 hst>TCP , V4.13, 60 Channels, Priority=0 <br>

> <8/3/2021 8:00:33 hst>TCP , V4.13, 9 Channels, Priority=0 <br>

> <8/3/2021 8:00:34 hst>TCP , V4.13, 9 Channels, Priority=0 <br>

> <8/3/2021 8:00:34 hst>TCP , V4.13, 71 Channels, Priority=0 <br>

> <8/3/2021 8:00:34 hst>TCP , V4.13, 5 Channels, Priority=0 <br>

> <8/3/2021 8:00:34 hst>TCP , V4.13, 359 Channels, Priority=20 <br>

> <8/3/2021 8:00:34 hst>TCP , 741 Channels, Priority=20 <br>

> <8/3/2021 8:00:34 hst>TCP , V4.13, 5 Channels, Priority=20 <br>

> <8/3/2021 8:00:34 hst>pcs-mk-ioc> vmeTsi148 ISR: ERROR: no handler registered (level 2) IACK 0x00000045 -- DISABLING level 2 <br>

> <8/3/2021 8:05:14 hst>vmeTsi148 ISR: ERROR: no handler registered (level 4) IACK 0x00000040 -- DISABLING level 4<br>

> <8/3/2021 8:34:54 hst>dbScan warning from '.2 second' scan thread: <br>

> <8/3/2021 8:34:54 hst>        Scan processing averages 0.300 seconds (0.300 .. 0.300). <br>

> <8/3/2021 8:34:54 hst>        Over-runs have now happened 10 times in a row. <br>

> <8/3/2021 8:34:54 hst>        To fix this, move some records to a slower scan rate. <br>

> <8/3/2021 8:34:54 hst> <br>

> <8/3/2021 8:35:09 hst>dbScan warning from '.2 second' scan thread: <br>

> <8/3/2021 8:35:09 hst>        Scan processing averages 0.300 seconds (0.300 .. 0.300). <br>

> <8/3/2021 8:35:09 hst>        Over-runs have now happened 10 times in a row. <br>

> <8/3/2021 8:35:09 hst>        To fix this, move some records to a slower scan rate. <br>

> <8/3/2021 8:35:09 hst><br>

> ...<br>

> <br>

<br>

-- <br>

Complexity comes for free, simplicity you have to work for.<br>

<br>

</blockquote></div>