RTEMS | Deterministic Hedged Read Library (DHRL) for DRAM Tail Latency Mitigation (#5548)

Wayne Thornton (@wmthornton-dev) gitlab at rtems.org
Wed Apr 8 17:38:34 UTC 2026



Wayne Thornton created an issue: https://gitlab.rtems.org/rtems/rtos/rtems/-/issues/5548

Assignee: Wayne Thornton

## Summary
External Dynamic RAM (DRAM) acts as a hidden source of non-determinism. DRAM cells require periodic, hardware-mandated refresh cycles, issued on average once per refresh interval ($tREFI$), to prevent data loss. During a refresh cycle, the memory controller cannot service reads to the bank being refreshed.

If a high-priority RTEMS control thread (such as flight software hazard avoidance) suffers an L1/L2 cache miss, the CPU must fetch from main memory. If that fetch collides with a refresh cycle or a row-buffer conflict, the thread is effectively stalled by the hardware. A memory read that normally takes 40 ns can spike to over 300 ns. While acceptable in general-purpose OS environments, this "tail latency" undermines tight Worst-Case Execution Time (WCET) bounds in RTEMS. Developers must heavily over-pad their execution deadlines to account for worst-case hardware refresh alignments, wasting CPU cycles.

The Deterministic Hedged Read Library (DHRL) mitigates hardware latency spikes by trading available memory bus bandwidth and SMP parallel compute for time determinism. It relies on the statistical observation that two physically independent memory controllers are unlikely to execute refresh cycles at the same instant.
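How unlikely a simultaneous refresh is can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes each controller is busy for one refresh cycle time ($tRFC$, a parameter not named in this issue) out of every $tREFI$, and that the two controllers' refresh phases are independent. The function names are hypothetical, and the values 350 ns and 7800 ns are DDR4-order-of-magnitude placeholders; real values come from the device datasheet.

```c
#include <assert.h>

/* Fraction of time one controller spends refreshing: tRFC / tREFI. */
static double refresh_duty(double trfc_ns, double trefi_ns)
{
    assert(trfc_ns >= 0.0 && trefi_ns > 0.0 && trfc_ns <= trefi_ns);
    return trfc_ns / trefi_ns;
}

/*
 * Probability that BOTH controllers are refreshing at a random instant.
 * Assumption: the two refresh phases are statistically independent.
 */
static double both_refreshing(double trfc_ns, double trefi_ns)
{
    double p = refresh_duty(trfc_ns, trefi_ns);
    return p * p;
}
```

With the placeholder values, a single controller is refreshing roughly 4.5% of the time, and both at once roughly 0.2% of the time. The independence assumption matters: if both controllers start their refresh timers from the same reset and clock, their phases may be correlated, so the overlap would need to be measured or deliberately staggered on the target.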

## Execution Flow

Redundant Mapping: Critical read-only payloads are duplicated across two independent physical memory channels (for example, Bank A and Bank B).

SMP Thread Pinning: DHRL spawns two worker tasks and uses $rtems_task_set_affinity()$ to rigidly pin them to distinct physical CPU cores.

The Hedged Read ("Race" Condition): When the main application requires data, it fires $rtems_event_send()$ to wake both pinned workers. Both cores simultaneously force an AXI bus read to their respective memory controllers.

Lock-Free Resolution: Whichever memory controller is not currently refreshing returns the data first. That "winning" thread executes a C11 $atomic_compare_exchange_strong$ to claim a shared flag, which instantly wakes the main thread to hand over the pointer. The slower read is safely dropped.
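The steps above can be modeled on a host machine as follows. This is a minimal sketch under stated assumptions, not the proposed implementation: it uses POSIX threads in place of pinned RTEMS tasks, creates the workers on every call instead of waking them with $rtems_event_send()$, and joins both workers rather than having the winner wake the caller. The names $hedged_read$, $worker$, and $hedged_fetch$ are hypothetical. Only the claim step, a C11 $atomic_compare_exchange_strong$ on a shared flag, is meant to mirror the description.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical request: two replicas of one read-only payload, assumed to
 * sit behind different memory controllers on the real target. */
typedef struct {
    const volatile void *replica[2];
    atomic_int           winner;     /* -1 until a worker claims the result */
} hedged_read;

typedef struct {
    hedged_read *req;
    int          channel;            /* 0 or 1 */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *arg = p;
    const volatile unsigned char *src = arg->req->replica[arg->channel];

    /* Volatile access forces an actual load from this replica. */
    (void)src[0];

    /* The first worker to arrive swaps -1 for its channel number.
     * The slower worker's compare-exchange fails and its result is dropped. */
    int expected = -1;
    atomic_compare_exchange_strong(&arg->req->winner, &expected, arg->channel);
    return NULL;
}

/* Race two reads and return a pointer to whichever replica was claimed first. */
static const volatile void *hedged_fetch(const volatile void *a,
                                         const volatile void *b)
{
    hedged_read req = { .replica = { a, b } };
    worker_arg  args[2] = { { &req, 0 }, { &req, 1 } };
    pthread_t   tid[2];

    atomic_init(&req.winner, -1);

    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &args[i]);
        assert(rc == 0);
        (void)rc;
    }
    /* Simplification: wait for both workers. The design described above
     * would instead resume the caller as soon as one worker wins. */
    for (int i = 0; i < 2; i++) {
        pthread_join(tid[i], NULL);
    }
    return req.replica[atomic_load(&req.winner)];
}
```

Because both replicas hold the same bytes, the caller does not need to know which channel won; the flag only guarantees that exactly one worker hands over a result.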

## Trade-offs & Hardware Requirements

Cost: DHRL intentionally burns instantaneous memory bus bandwidth and requires dedicating parallel CPU cores to execute a single read operation.

Hardware Dependency: This software-level fix requires a target SoC with an RTEMS SMP BSP and at least two physically independent memory controllers. It provides zero benefit on single-channel architectures.

Benefit: Absolute bounds on memory fetch latency and "free" spatial fault tolerance against Single Event Functional Interrupts (SEFIs) on the memory controllers.

## Acceptance Criteria
[ ] $dhrl_init()$ correctly spawns and pins worker threads using the RTEMS Classic API.

[ ] $dhrl_fetch_data()$ correctly wakes workers, executes the race, and returns the valid pointer via C11 Atomics and RTEMS Event Sets without deadlocking.

[ ] API successfully handles arbitrary memory pointers (volatile void*).

[ ] Validation: Empirical tests on multi-channel hardware (or cycle-accurate simulators) demonstrate that DRAM fetch latency stays within a stated WCET bound, with no tail-latency outliers beyond it.

[ ] Library compiles cleanly via $waf$ and integrates into the $cpukit$ build structure gated by compiler flags.
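For the validation criterion, a test harness might reduce a set of measured fetch latencies to the worst observed value and the number of samples that exceed a latency budget. The sketch below is one possible shape for that check; $latency_summary$ and $summarize_latency$ are hypothetical names, and how the samples are captured (cycle counter, logic analyzer, simulator trace) is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t max_ns;        /* worst latency observed in the sample set */
    size_t   over_budget;   /* number of samples above the budget */
} latency_summary;

/* Scan measured latencies and compare each one against a budget. */
static latency_summary summarize_latency(const uint32_t *samples_ns,
                                         size_t n,
                                         uint32_t budget_ns)
{
    assert(samples_ns != NULL || n == 0);

    latency_summary s = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        if (samples_ns[i] > s.max_ns) {
            s.max_ns = samples_ns[i];
        }
        if (samples_ns[i] > budget_ns) {
            s.over_budget++;
        }
    }
    return s;
}
```

A run with zero samples over budget supports the bound but does not prove it, since measurement only covers the refresh alignments that happened to occur during the test.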

