File system deadlock troubleshooting
Chris Johns
chrisj at rtems.org
Tue Oct 8 04:34:00 UTC 2019
On 8/10/19 12:53 pm, Mathew Benson wrote:
> I'm using RTEMS 5 on a LEON3. I'm troubleshooting a failure condition that
> occurs when stress testing reads and writes to and from a RAM disk (RAM disk
> to RAM disk). When the condition is tripped, it appears that I have 4 tasks
> that are pending on conditions that just never happen.
Do you have a test case?
> The task command shows:
>
> ID       NAME SHED PRI STATE MODES  EVENTS WAITINFO
> ------------------------------------------------------------------------------
> 0a01000c TSKA UPD  135 MTX   P:T:nA NONE   RFS
> 0a01001f TSKB UPD  135 CV    P:T:nA NONE   bdbuf access
> 0a010020 TSKC UPD  150 MTX   P:T:nA NONE   RFS
> 0a010032 TSKD UPD  245 MTX   P:T:nA NONE   RFS
It looks like TSKA, TSKC and TSKD are waiting for the RFS lock and TSKB is
blocked in a bdbuf access. I wonder why that is blocked?
The RFS holds its lock over the bdbuf calls.
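
To illustrate why one task stuck below the file system can leave every other
task queued on the file system mutex, here is a minimal host-side sketch using
POSIX threads. It is only a model of the pattern, not the RFS or bdbuf code:
`fs_lock`, `probe_fs_lock()` and `probe_while_holder_blocked()` are
hypothetical names, and the blocked buffer call is represented by a comment.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for the file system lock; this is not the RFS implementation. */
static pthread_mutex_t fs_lock = PTHREAD_MUTEX_INITIALIZER;

/* A second task trying to enter the file system. It probes the lock
 * without blocking so the sketch always terminates. */
static void *probe_fs_lock(void *arg)
{
  int *result = arg;

  *result = pthread_mutex_trylock(&fs_lock);
  if (*result == 0)
    pthread_mutex_unlock(&fs_lock);
  return NULL;
}

/* Returns what a second thread sees from trylock while the caller still
 * holds fs_lock, as a file system would across a buffer call that has
 * not returned. */
static int probe_while_holder_blocked(void)
{
  pthread_t prober;
  int result = -1;

  pthread_mutex_lock(&fs_lock);   /* holder enters the file system */
  /* ... the holder would now be waiting in the buffer layer ... */
  if (pthread_create(&prober, NULL, probe_fs_lock, &result) == 0)
    pthread_join(prober, NULL);
  pthread_mutex_unlock(&fs_lock);
  return result;
}
```

In this model the prober gets `EBUSY` for as long as the holder keeps the
lock. With a blocking lock call instead of `pthread_mutex_trylock()` it would
wait forever, which would match the three tasks showing MTX/RFS if the task in
the bdbuf access is the one holding the RFS lock.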
>
> None of my tasks appear to have failed. Nobody is pending on anything noticeable
> except the 4 above. The conditional wait is a single shared resource so any
> attempt to access the file system after this happens results in yet another
> forever pended task.
>
> Digging into source code, it appears that the kernel is waiting for a specific
> response from a block device, but just didn't get what it's expecting. The next
> thing is to determine which block device the kernel is pending on, what the
> expected response is, and what the block device actually did. Can anybody shed
> some light on this or recommend some debugging steps? I'm trying to exhaust
> all I can do before I start manually decoding machine code.
The RFS has trace support you can access via `rtems/rfs/rtems-rfs-trace.h`. You
can set the trace mask in your code, or you can call
`rtems_rfs_trace_shell_command()` with suitable arguments or hook it to an
existing shell. There is a buffer trace flag that shows the release calls to bdbuf:
`RTEMS_RFS_TRACE_BUFFER_RELEASE`
There is no trace call to get or read. Maybe add a get/read trace as well.
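
A minimal sketch of setting the trace mask from code, assuming the header
declares `rtems_rfs_trace_set_mask()` and `rtems_rfs_trace_clear_mask()`, each
taking a `rtems_rfs_trace_mask`; check the header in your tree before relying
on this. `app_rfs_trace_buffer_release()` is a hypothetical helper name, and
the `#else` branch holds host-only stand-ins (placeholder flag value and
bodies) so the sketch compiles off-target.

```c
#include <stdint.h>

#ifdef __rtems__
#include <rtems/rfs/rtems-rfs-trace.h>
#else
/* Host-only stand-ins so the sketch builds without RTEMS. The flag value
 * and the function bodies are placeholders, not the RTEMS definitions. */
typedef uint64_t rtems_rfs_trace_mask;
#define RTEMS_RFS_TRACE_BUFFER_RELEASE ((rtems_rfs_trace_mask) 1 << 0)

static rtems_rfs_trace_mask host_trace_flags;

static rtems_rfs_trace_mask rtems_rfs_trace_set_mask(rtems_rfs_trace_mask mask)
{
  rtems_rfs_trace_mask previous = host_trace_flags;

  host_trace_flags |= mask;
  return previous;
}

static rtems_rfs_trace_mask rtems_rfs_trace_clear_mask(rtems_rfs_trace_mask mask)
{
  rtems_rfs_trace_mask previous = host_trace_flags;

  host_trace_flags &= ~mask;
  return previous;
}
#endif

/* Hypothetical helper: turn the bdbuf release trace on or off. */
static void app_rfs_trace_buffer_release(int enable)
{
  if (enable)
    (void) rtems_rfs_trace_set_mask(RTEMS_RFS_TRACE_BUFFER_RELEASE);
  else
    (void) rtems_rfs_trace_clear_mask(RTEMS_RFS_TRACE_BUFFER_RELEASE);
}
```

Calling `app_rfs_trace_buffer_release(1)` before the stress test starts and
`app_rfs_trace_buffer_release(0)` afterwards keeps the output limited to the
run you are interested in.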
The RAM disk also has trace code, which can be enabled by editing the source file.
Chris