File system deadlock troubleshooting

Chris Johns chrisj at rtems.org
Tue Oct 8 04:34:00 UTC 2019


On 8/10/19 12:53 pm, Mathew Benson wrote:
> I'm using RTEMS 5 on a LEON3.  I'm troubleshooting a failure condition that
> occurs when stress test reading and writing to and from RAM disk.  RAM disk to
> RAM disk.  When the condition is tripped, it appears that I have 4 tasks that
> are pending on conditions that just never happen.

Do you have a test case?

> The task command shows:
> 
> ID       NAME                 SHED PRI STATE  MODES    EVENTS WAITINFO
> ------------------------------------------------------------------------------
> 0a01000c TSKA                 UPD  135 MTX    P:T:nA   NONE   RFS
> 0a01001f TSKB                 UPD  135 CV     P:T:nA   NONE   bdbuf access
> 0a010020 TSKC                 UPD  150 MTX    P:T:nA   NONE   RFS
> 0a010032 TSKD                 UPD  245 MTX    P:T:nA   NONE   RFS

It looks like TSKA, TSKC and TSKD are waiting for the RFS lock and TSKB is
blocked in a bdbuf access. I wonder why that is blocked?

The RFS holds its lock over the bdbuf calls, so TSKB is probably the lock
holder and the other three tasks are queued behind it.
> 
> None of my tasks appear to be failed.  Nobody is pending on anything noticeable
> except the 4 above.  The conditional wait is a single shared resource so any
> attempt to access the file system after this happens results in yet another
> forever pended task.
> 
> Digging into source code, it appears that the kernel is waiting for a specific
> response from a block device, but just didn't get what it's expecting.  The next
> thing is to determine which block device the kernel is pending on, what the
> expected response is, and what the block device actually did.  Can anybody shed
> some light on this or recommend some debugging steps?   I'm trying to exhaust
> all I can do before I start manually decoding machine code.

The RFS has trace support you can access via `rtems/rfs/rtems-rfs-trace.h`. You
can set the trace mask in your code, or you can call
`rtems_rfs_trace_shell_command()` with suitable arguments or hook it to an
existing shell. There is a buffer trace flag that shows the release calls to
bdbuf:

 RTEMS_RFS_TRACE_BUFFER_RELEASE
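
As a minimal sketch of setting the mask from code: this assumes
`rtems_rfs_trace_set_mask()` and `rtems_rfs_trace_clear_mask()` from that
header OR the given bits in (or mask them out) and return the previous mask.
The helper names `rfs_buffer_trace_enable()` and `rfs_buffer_trace_disable()`
are made up for illustration, and the `#else` branch is only a host-side
stand-in so the snippet builds off-target. Check the header in your tree before
relying on it.

```c
#include <stdint.h>

#ifdef __rtems__
#include <rtems/rfs/rtems-rfs-trace.h>
#else
/*
 * Host-side stand-ins (assumption): they mimic the set/clear behaviour
 * described above. The bit value here is a placeholder, not the real one.
 */
typedef uint64_t rtems_rfs_trace_mask;
#define RTEMS_RFS_TRACE_BUFFER_RELEASE (1ULL << 0)

static rtems_rfs_trace_mask host_trace_flags;

static rtems_rfs_trace_mask
rtems_rfs_trace_set_mask (rtems_rfs_trace_mask mask)
{
  rtems_rfs_trace_mask previous = host_trace_flags;
  host_trace_flags |= mask;
  return previous;
}

static rtems_rfs_trace_mask
rtems_rfs_trace_clear_mask (rtems_rfs_trace_mask mask)
{
  rtems_rfs_trace_mask previous = host_trace_flags;
  host_trace_flags &= ~mask;
  return previous;
}
#endif

/* Hypothetical helper: turn on tracing of buffer releases to bdbuf.
 * Returns the trace mask that was in effect before the call. */
rtems_rfs_trace_mask
rfs_buffer_trace_enable (void)
{
  return rtems_rfs_trace_set_mask (RTEMS_RFS_TRACE_BUFFER_RELEASE);
}

/* Hypothetical helper: turn the buffer release trace off again.
 * Returns the trace mask that was in effect before the call. */
rtems_rfs_trace_mask
rfs_buffer_trace_disable (void)
{
  return rtems_rfs_trace_clear_mask (RTEMS_RFS_TRACE_BUFFER_RELEASE);
}
```

Calling `rfs_buffer_trace_enable()` before the stress test starts and leaving
it on until the tasks hang should show which buffer was released last.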

There is no trace call to get or read. Maybe add a get/read trace as well.

The RAM disk driver also has trace code, which can be enabled by editing its
source file.

Chris
