EPICS ioc with RTEMS on Raspberry Pi

Miroslaw Dach miroslaw.dach at gmail.com
Thu Apr 23 03:27:50 UTC 2026


Hi John,

Thanks for your e-mail and your suggestion.

Follow-up to my earlier report about the DWC OTG USB hang on RPi 3B+
with RTEMS 6. I've done extensive instrumentation and applied BCM2835-
specific errata workarounds from the Linux dwc2 driver. Here are the
complete findings:

I added diagnostic counters (#ifdef __rtems__) to the FreeBSD dwc_otg
driver, printed every 1 second from the 10ms timer callback:

- yld:   times dwc_otg_interrupt_poll_locked() hit its 16-iteration cap
- afail: times dwc_otg_host_channel_alloc() failed (no free channels)
- hok:   successful halt completions processed
- rxd:   RX data discarded (no active endpoint listening)
- stuck: channels with wait_halted=1 but allocated=0 (leaked channels)
- filt:  filter ISR entry/exit count (to detect stuck ISR)
- thr:   thread handler entry/exit count
- gi:    last GINTSTS register value
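
For anyone who wants to reproduce the instrumentation, here is a minimal
sketch of the counter block and the once-per-second report. The struct,
function names and the assumption that t counts 10 ms timer callbacks
are illustrative only; they are not the actual patch to the dwc_otg
driver.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counter block; field names mirror the labels above. */
struct dwc_otg_diag {
    uint32_t yld;               /* poll loop hit its 16-iteration cap */
    uint32_t afail;             /* host channel allocation failures */
    uint32_t hok;               /* halt completions processed */
    uint32_t rxd;               /* RX data discarded */
    uint32_t stuck;             /* wait_halted=1 but allocated=0 */
    uint32_t filt_in, filt_out; /* filter ISR entries / exits */
    uint32_t thr_in, thr_out;   /* thread handler entries / exits */
    uint32_t gi;                /* last GINTSTS value */
};

#define DIAG_TICKS_PER_REPORT 100u /* 100 callbacks x 10 ms = 1 s */

/* Nonzero on every 100th invocation of the 10 ms timer callback. */
int diag_report_due(uint32_t tick)
{
    return tick != 0 && tick % DIAG_TICKS_PER_REPORT == 0;
}

/* Formats one report line into buf; returns what snprintf returns. */
int diag_format(const struct dwc_otg_diag *d, uint32_t tick,
                char *buf, size_t len)
{
    assert(d != NULL && buf != NULL);
    return snprintf(buf, len,
        "DWC_OTG t=%" PRIu32 ": yld=%" PRIu32 " afail=%" PRIu32
        " hok=%" PRIu32 " rxd=%" PRIu32 " stuck=%" PRIu32
        " filt=%" PRIu32 "/%" PRIu32 " thr=%" PRIu32 "/%" PRIu32
        " gi=0x%08" PRIx32,
        tick, d->yld, d->afail, d->hok, d->rxd, d->stuck,
        d->filt_in, d->filt_out, d->thr_in, d->thr_out, d->gi);
}
```

In the timer callback one would increment a tick count, call
diag_report_due() and print the formatted line only when it returns
nonzero, so the reporting cost stays out of the other 99 callbacks.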

Results: channel exhaustion theory DISPROVED
Across all tests, every counter remained perfectly healthy right up to
the instant of the freeze:

  DWC_OTG t=17500: yld=0 afail=0 hok=180038 rxd=0 ch=8/16
      alloc=0 wh=0 stuck=0 filt=738178/738178 thr=7280/7280
      gi=0x04000031

- yld=0:     the 16-iteration poll loop cap was NEVER hit
- afail=0:   channel allocation NEVER failed
- stuck=0:   NO channels ever leaked (wait_halted was always cleared)
- filt balanced: filter ISR entered and exited the same number of times
- thr balanced:  thread handler entered and exited the same number of times
- gi=0x04000031: benign (host mode + TX FIFOs empty + RX FIFO non-empty)
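
The gi decoding can be checked mechanically. The sketch below assumes
the usual DWC2 GINTSTS bit positions (bit 0 current mode, bit 4 RX FIFO
non-empty, bit 5 non-periodic TX FIFO empty, bit 26 periodic TX FIFO
empty); the macro and function names are local to this example and are
not taken from the driver headers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed GINTSTS bit positions; names are local to this sketch. */
#define GI_CURMOD   (UINT32_C(1) << 0)   /* 1 = host mode */
#define GI_RXFLVL   (UINT32_C(1) << 4)   /* RX FIFO non-empty */
#define GI_NPTXFEMP (UINT32_C(1) << 5)   /* non-periodic TX FIFO empty */
#define GI_PTXFEMP  (UINT32_C(1) << 26)  /* periodic TX FIFO empty */

/* Returns any bits outside the four flags considered benign here. */
uint32_t gintsts_unexpected(uint32_t gi)
{
    return gi & ~(GI_CURMOD | GI_RXFLVL | GI_NPTXFEMP | GI_PTXFEMP);
}

/* Prints which of the four flags are set, plus any leftover bits. */
void gintsts_print(uint32_t gi)
{
    printf("gi=0x%08" PRIx32 ":%s%s%s%s other=0x%08" PRIx32 "\n", gi,
           (gi & GI_CURMOD)   ? " host-mode"      : "",
           (gi & GI_RXFLVL)   ? " rx-non-empty"   : "",
           (gi & GI_NPTXFEMP) ? " np-tx-empty"    : "",
           (gi & GI_PTXFEMP)  ? " per-tx-empty"   : "",
           gintsts_unexpected(gi));
}
```

For 0x04000031 all four flags are set and no other bit remains, which
matches the reading given above.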

The ~1054 halt completions/second were steady with no degradation.

Results: total CPU freeze, not software deadlock

I also tried an independent RTEMS timer (rtems_timer_fire_after) as a
heartbeat, firing outside the USB bus lock. Both the USB timer and the
independent heartbeat stopped simultaneously, confirming the entire ARM
core freezes — not just the USB subsystem.

No RTEMS fatal error handler was triggered (I installed one via
CONFIGURE_INITIAL_EXTENSIONS). This means it is NOT a standard ARM
data abort — the CPU simply stops executing, likely due to an AHB bus
lockup caused by the DWC OTG controller hardware.

BCM2835 errata workarounds applied

Based on the Linux dwc2 driver (params.c, core.c) and the Ultibo
project documentation, I applied three workarounds:

1. FIFO size cap: BCM2835 hardware reports 4096 words but only 4080
   exist. Cap sc_fifo_size to 4080*4 bytes to prevent FIFO overrun.
   (In practice, the hardware on my board reported <= 4080, so this
   cap was NOT triggered.)

2. GAHBCFG AHB burst configuration: Broadcom redefined bits [4:1] of
   GAHBCFG for AXI burst control. Linux dwc2 sets ahbcfg=0x10 for
   BCM2835. Changed from GAHBCFG_GLBLINTRMSK (0x01) to 0x11.

3. AHB idle wait after core reset: Added a loop waiting for
   GRSTCTL_AHBIDLE after GRSTCTL_CSFTRST, plus 250ms settling delay
   (Linux dwc2 uses 100+ ms).
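
A rough sketch of the three workarounds, written so it can be compiled
and tested off-target. The register access is abstracted behind a
callback, and fifo_size_cap(), gahbcfg_value(), wait_ahb_idle() and the
fake reader are hypothetical helpers, not the functions in the driver.
The bit positions (GLBLINTRMSK at bit 0, AHBIDLE at bit 31) are my
understanding of the DWC2 layout and should be checked against the
driver's register header.

```c
#include <assert.h>
#include <stdint.h>

#define BCM2835_FIFO_WORDS_MAX 4080u              /* words that really exist */
#define GAHBCFG_GLBLINTRMSK    (UINT32_C(1) << 0)
#define BCM2835_AHBCFG_BURST   UINT32_C(0x10)     /* value Linux dwc2 uses */
#define GRSTCTL_AHBIDLE        (UINT32_C(1) << 31)

/* Workaround 1: cap the reported FIFO size (in bytes) to 4080 words. */
uint32_t fifo_size_cap(uint32_t reported_bytes)
{
    uint32_t max_bytes = BCM2835_FIFO_WORDS_MAX * 4u;
    return reported_bytes > max_bytes ? max_bytes : reported_bytes;
}

/* Workaround 2: global interrupt enable plus the BCM2835 burst setting. */
uint32_t gahbcfg_value(void)
{
    return GAHBCFG_GLBLINTRMSK | BCM2835_AHBCFG_BURST;   /* 0x11 */
}

/* Workaround 3: poll GRSTCTL until AHBIDLE is set or polls run out. */
typedef uint32_t (*reg_read_fn)(void *ctx);

int wait_ahb_idle(reg_read_fn read_grstctl, void *ctx, unsigned max_polls)
{
    assert(read_grstctl != 0);
    for (unsigned i = 0; i < max_polls; i++) {
        if (read_grstctl(ctx) & GRSTCTL_AHBIDLE)
            return 0;    /* bus reported idle */
    }
    return -1;           /* timed out; caller decides what to do */
}

/* Stand-in for the hardware: reports idle after *ctx busy reads. */
uint32_t fake_grstctl_read(void *ctx)
{
    unsigned *busy_reads = ctx;
    if (*busy_reads == 0)
        return GRSTCTL_AHBIDLE;
    (*busy_reads)--;
    return 0;
}
```

On the target, the callback would wrap the real register read and the
settling delay would follow a successful wait. Bounding the loop matters
here, since an unbounded wait on a wedged controller would just move the
hang into the reset path.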

Results with workarounds:
- Before: hangs after ~86 seconds (2 kHz tick)
- After:  hangs after ~175 seconds (2 kHz tick)

The workarounds approximately doubled the time-to-hang but did NOT
eliminate it. All software counters remained healthy throughout.

The hang is most probably caused by a combination of factors:

1. The DWC OTG controller on BCM2835/BCM2837 requires sub-125µs
   interrupt response times for USB split transaction phases (Start
   Split → Complete Split through the USB hub). The RPi 3B+ Ethernet
   goes through a USB hub (LAN7515), making every packet a split
   transaction.

2. The Linux kernel addresses this with a dedicated FIQ (Fast Interrupt
   Request) handler (dwc_otg_fiq_fsm.c) that executes complete split
   transactions in FIQ context, bypassing the normal interrupt stack.
   Without FIQ, "certain USB devices become completely unusable."

3. The FreeBSD dwc_otg driver used by RTEMS handles all split
   transactions in normal interrupt context. On RTEMS, the interrupt
   filter and thread handler run back-to-back in the interrupt server
   task (nexus_intr_with_filter in rtems-kernel-nexus.c), with no
   preemption point between them.

4. When the controller's split transaction timing is violated, it
   enters an unrecoverable state that locks the AHB bus, freezing the
   entire ARM core including UART and system timers.
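
To put the 125 µs figure next to the two tick settings used in this
thread, here is a small arithmetic sketch. It only relates the tick
period to the length of a high-speed USB microframe (125 µs); it says
nothing about the actual interrupt latency, which I have not measured.
The function name is made up for the example.

```c
#include <assert.h>

#define USB_HS_MICROFRAME_US 125u   /* one high-speed USB microframe */

/* Whole 125 us microframes that elapse during one system tick. */
unsigned microframes_per_tick(unsigned us_per_tick)
{
    assert(us_per_tick > 0);
    return us_per_tick / USB_HS_MICROFRAME_US;
}
```

With CONFIGURE_MICROSECONDS_PER_TICK=500 this gives 4 microframes per
tick, and with 10000 it gives 80, so the clock tick interrupt occurs 20
times more often relative to the split transaction window at 2 kHz.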

The correlation between time-to-hang and tick rate is consistent with
this: a higher tick rate means more frequent scheduling, more interrupt
latency jitter, and therefore an earlier timing violation.

A proper fix most probably requires implementing FIQ-based split
transaction handling in the RTEMS BSP for BCM2835/BCM2837, similar to
what Linux does in dwc_otg_fiq_fsm.c. This is a significant undertaking
but is essential for reliable USB operation on RPi 3B+ (and RPi Zero 2 W,
which uses the same SoC).

The GAHBCFG and reset sequence workarounds should also be applied as
they improve stability.

Please give me your thoughts on that. Maybe it is easier to finalise the
BSP for RPi 4 and 5?

Best Regards
Mirek

Wed, 22 Apr 2026 at 13:17 John Howard <echosoft.llc at gmail.com> wrote:

> Fascinating. Thanks for that detailed report.
>
> You indicated USB enumerating continues running in the background.
>
> I am educated-guessing that a counter maximum is reached, and then
> mistakenly breached. I would look for a test of that counter and correct it
> from allowing greater-than comparing.
>
> Let us know how it turns out.
>
> I am developing an app for Raspberry Pi Zero 2 W (stripped-down 3B+). I
> wasn't expecting any potential problem like this.
>
> -- John
>
> > On Apr 22, 2026, at 1:25 PM, Miroslaw Dach <miroslaw.dach at gmail.com> wrote:
> >
> > 
> > Hi All,
> >
> > I'm running an EPICS IOC server (EPICS 7.0.10) with RTEMS 6.2 on a
> > Raspberry Pi 3B+ with rtems-libbsd (6-freebsd-14) and encountering a
> > reproducible system hang after several minutes of operation.
> > Through systematic elimination testing I've narrowed the root cause
> > to the DWC OTG USB controller driver. I'd appreciate any
> > recommendations on how to address this. I could of course run the
> > EPICS IOC server under Linux on the RPi, but I wanted to try RTEMS
> > because, as a hard real-time system, it is much more deterministic.
> > The boot time for the RPi with EPICS/RTEMS is around 7 seconds, which
> > is extremely fast!
> >
> > Environment
> > -----------
> > - Board: Raspberry Pi 3B+ (BCM2837, boardrev a020d3)
> > - RTEMS: rtems-6.2 (ARM/ARMv4/raspberrypi2)
> > - Network stack: rtems-libbsd (RTEMS_BSD_CONFIG_BSP_CONFIG +
> >   RTEMS_BSD_CONFIG_INIT)
> > - Ethernet: LAN7515 USB Ethernet (muge driver, via DWC OTG)
> > - Application: EPICS IOC (but hang occurs with minimal/empty IOC as well)
> > - Console: UART serial (/dev/ttyS0)
> >
> > Symptom
> > -------
> > The system boots and runs normally, then the entire system freezes —
> > including the UART serial console (which is not USB-dependent). No crash
> > message, no stack dump — a hard hang requiring power cycle.
> >
> > The time-to-hang depends on the system tick rate:
> > - CONFIGURE_MICROSECONDS_PER_TICK=500  (2 kHz): hangs after ~2 minutes
> > - CONFIGURE_MICROSECONDS_PER_TICK=10000 (100 Hz): hangs after ~5-6 minutes
> >
> > During the hang, the UART becomes completely unresponsive, suggesting a
> > kernel-level deadlock or interrupt handler issue rather than an
> > application-level problem.
> >
> > Elimination testing performed
> > -----------------------------
> > I systematically disabled components to isolate the cause. All tests
> > below used CONFIGURE_MICROSECONDS_PER_TICK=10000 (100 Hz):
> >
> > 1. Disabled all EPICS database records (no record processing)
> >    -> still hangs
> > 2. Disabled periodic NTP sync (no socket operations) -> still hangs
> > 3. Disabled IP configuration (no ifconfig, no route, no network traffic,
> >    but USB/DWC OTG still initialised via RTEMS_BSD_CONFIG_BSP_CONFIG)
> >    -> still hangs (~5-6 min), USB hub enumeration continues in
> >    background:
> >       ugen1.2: <vendor 0x0424 product 0x2514> at usbus1
> >       uhub1 on uhub0
> >       ...
> >    Console output is garbled by concurrent USB enumeration messages,
> >    suggesting interrupt contention.
> > 4. Commented out RTEMS_BSD_CONFIG_BSP_CONFIG and the
> >    #include <bsp/nexus-devices.h> to prevent DWC OTG initialisation
> >    -> STABLE, ran for 14+ minutes with no hang (test stopped manually)
> >
> > The libbsd software stack (loopback, sockets, telnetd) continues to
> > function in test 4 — only the hardware BSP devices (DWC OTG, muge,
> > uhub, ukphy) are excluded.
> >
> > Minimal reproduction
> > --------------------
> > Build an RTEMS 6.2 application for raspberrypi2 BSP with:
> >
> >   #define RTEMS_BSD_CONFIG_BSP_CONFIG
> >   #define RTEMS_BSD_CONFIG_INIT
> >   #include <machine/rtems-bsd-config.h>
> >   #include <bsp/nexus-devices.h>     /* in one translation unit */
> >
> >   #define CONFIGURE_MICROSECONDS_PER_TICK 10000
> >
> > The application does not need to configure any network interface or
> > perform any USB transfers — the DWC OTG hub polling alone triggers
> > the hang after ~5-6 minutes.
> >
> > Commenting out RTEMS_BSD_CONFIG_BSP_CONFIG and the nexus-devices.h
> > include eliminates the hang.
> >
> > Boot log (abbreviated, from hanging configuration)
> > --------------------------------------------------
> > RTEMS RPi 3B+ 1.3 (1GB) [00a020d3]
> > nexus0: <RTEMS Nexus device>
> > dwcotg0: <DWC OTG 2.0 integrated USB controller> on nexus0
> > usbus1 on dwcotg0
> > usbus1: 480Mbps High Speed USB v2.0
> > ugen1.1: <DWCOTG OTG Root HUB> at usbus1
> > uhub0 on usbus1
> > uhub0: <DWCOTG OTG Root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
> > uhub0: 1 port with 1 removable, self powered
> > ugen1.2: <vendor 0x0424 product 0x2514> at usbus1
> > uhub1 on uhub0
> > uhub1: <vendor 0x0424 product 0x2514, class 9/0, rev 2.00/b.b3, addr 2> on usbus1
> > uhub1: 4 ports with 3 removable, self powered
> > ugen1.3: <vendor 0x0424 product 0x2514> at usbus1
> > uhub2 on uhub1
> > uhub2: <vendor 0x0424 product 0x2514, class 9/0, rev 2.00/b.b3, addr 3> on usbus1
> > uhub2: 3 ports with 2 removable, self powered
> > ugen1.4: <vendor 0x0424 product 0x7800> at usbus1
> > muge0: <vendor 0x0424 product 0x7800, rev 2.10/3.00, addr 4> on usbus1
> > muge0: Chip ID 0x7800 rev 0002
> > miibus0: <MII bus> on muge0
> > ukphy0: <Generic IEEE 802.3u media interface> PHY 1 on miibus0
> > info: ue0: <USB Ethernet> on muge0
> > [system hangs after ~5-6 minutes, UART unresponsive]
> >
> > Questions
> > ---------
> > 1. Is this a known issue with the DWC OTG driver on RPi 3B+?
> > 2. Are there any configuration options (hub polling interval, interrupt
> >    coalescing, DMA settings) that might work around the problem?
> > 3. Would a newer version of rtems-libbsd contain fixes for this?
> > 4. Is there an alternative Ethernet driver approach for RPi 3B+ that
> >    avoids the DWC OTG USB path?
> > 5. Is there any known project which uses RPi with RTEMS?
> > (It looks like running RTEMS on the RPi 4 or RPi 5 cannot be considered
> > yet, since the BSP in the RTEMS kernel is not finalised. The RPi 4 or
> > RPi 5 would be much better candidates than the RPi 3B+, since they use
> > a direct Ethernet connection instead of USB Ethernet.)
> >
> > Thank you for any guidance.
> >
> > Mirek