Complete stall: root causes and diagnosis
tgaggstatter at gemini.edu
Wed Mar 27 02:02:11 UTC 2019
We are running RTEMS 4.10.2 on an MVME2700 and on an MVME6100 CPU.
The system (we are using EPICS and several VME interface cards) runs for
long periods, up to 2 full months, with no faults, but then it suddenly
stalls completely: no error message, no stack trace, nothing. I cannot
even connect on the serial port. This happens at a very irregular rate,
sometimes once a month and sometimes 5 stalls in a couple of hours. The
only way to recover is to reset the CPU. This happens on both CPUs.
I have 2 questions about this issue:
A) Has something like this happened to any of you? What was the root cause
of the stalls and how did you figure it out?
B) Is there a way to break out of this state in order to diagnose it,
ideally by getting stack traces of the halted threads? Is there some kind
of non-maskable interrupt I can send to do a post-mortem diagnosis without
needing to reboot?
Any ideas are welcome!
With best regards,
Tim D. Gaggstatter
Gemini Observatory - AURA