[RTEMS Project] #2792: possible RFS deadlock
RTEMS trac
trac at rtems.org
Tue Sep 27 16:29:20 UTC 2016
#2792: possible RFS deadlock
------------------------+--------------------------
Reporter: raduma | Owner:
Type: defect | Status: new
Priority: normal | Milestone: 4.11.1
Component: filesystem | Version: 4.10
Severity: normal | Keywords: rfs deadlock
------------------------+--------------------------
Hello,
I'm hitting a deadlock in our RTEMS 4.10 based system that looks like a
lock inversion in the RFS file system driver. That is my reading after an
initial investigation, so I'm coming here for clarification.
Scenario is as follows:
We have two tasks in our system doing file i/o operations on a single RFS
partition. One task creates, writes and closes a new file once a second or
so. The other continuously scans the entire file system in a loop,
stat'ing every file it encounters. (This is a somewhat contrived scenario,
but it turned a randomly occurring issue into one we can reproduce
consistently.)
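For anyone wanting to try this, the two workloads can be sketched roughly
as below. This is not our actual application code: the function names, the
file naming scheme and the payload are made up for illustration, it only
uses standard POSIX calls, and it scans a single directory rather than
walking the whole tree.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create, write and close one new file (hypothetical naming scheme).
 * Returns 0 on success, -1 on any error. */
static int create_one_file(const char *dir, unsigned seq)
{
    static const char payload[] = "sample payload\n";
    char path[256];

    assert(dir != NULL);
    snprintf(path, sizeof(path), "%s/file%05u.dat", dir, seq);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, payload, sizeof(payload) - 1);
    int rc = close(fd); /* in the ticket, the creator task blocks here */

    return (n == (ssize_t)(sizeof(payload) - 1) && rc == 0) ? 0 : -1;
}

/* One pass over a directory, stat'ing every entry except "." and "..".
 * Returns the number of entries stat'ed successfully, or -1 if the
 * directory cannot be opened. */
static int scan_once(const char *dir)
{
    assert(dir != NULL);

    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        char path[512];
        struct stat st;

        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
        if (stat(path, &st) == 0) /* in the ticket, the scanner blocks here */
            count++;
    }
    closedir(d);
    return count;
}

/* Task bodies: the creator runs about once a second, the scanner loops
 * without pause. Both take the mount point as their argument. */
void *creator_task(void *arg)
{
    for (unsigned seq = 0;; seq++) {
        (void)create_one_file((const char *)arg, seq);
        sleep(1);
    }
}

void *scanner_task(void *arg)
{
    for (;;)
        (void)scan_once((const char *)arg);
}
```

Starting both task bodies against the same RFS mount point (as POSIX
threads or wrapped in Classic API tasks) should approximate the load
described above.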
Symptoms are as follows:
- Within a very short time, the system deadlocks in the file i/o layer.
- The scanning task is blocked in a call path starting at
{{{rtems_filesystem_eval_path_start}}} and ending in
{{{rtems_bdbuf_anonymous_wait}}}, in a semaphore wait on access_waiters.
- The low-frequency file creation task is blocked in a call path starting
at {{{close(fd)}}} and going through {{{rtems_rfs_rtems_file_close}}},
waiting to obtain the {{{rtems_rfs_rtems_lock}}} semaphore.
- The bdbuf swapout task is just at the top, waiting for the event that
wakes it up to do something.
Digging and tracing through the code, what seems to be happening is:
- the file creator task creates files and starts writing to them,
acquiring access locks on the bdbuf buffers.
- the enumeration task scans, starts seeing the newly created files, and
wants to stat them
- as part of stat, the filesystem lock handler is called and acquires the
RFS lock
- then as part of stat, it wants to load the RFS inode, tries to acquire
the block from bdbuf for read
- the block might be currently the one locked by the other task, so it
waits.
- the other task gets to close()'ing the file it has open, which *WOULD*
release its lock on the buffers and wake up the swapout task, which would
then flush and release all the other waiter tasks
- *BUT* the rfs close implementation never gets that far because it's
trying to acquire the global RFS lock which is held by the other task.
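To make the suspected cycle explicit, the steps above can be written down
as a tiny wait-for graph. This is only a model of my reading of the
situation, not RTEMS code: the task names, the array layout and the
{{{has_wait_cycle}}} helper are invented for illustration, and it assumes
each task is blocked on at most one other task.

```c
#include <assert.h>
#include <stdbool.h>

enum { NUM_TASKS = 3, NO_TASK = -1 };
enum { SCANNER = 0, CREATOR = 1, SWAPOUT = 2 };

/* Fill in the wait-for edges as described in the ticket:
 * waits_for[t] is the task holding the resource that t is blocked on,
 * or NO_TASK if t is not blocked on another task's lock. */
static void ticket_scenario(int waits_for[NUM_TASKS])
{
    waits_for[SCANNER] = CREATOR; /* holds RFS lock, waits for the buffer */
    waits_for[CREATOR] = SCANNER; /* holds the buffer, waits for RFS lock */
    waits_for[SWAPOUT] = NO_TASK; /* only waits for its wake-up event */
}

/* Follow the wait-for chain from 'start'. Since every task has at most
 * one outgoing edge, NUM_TASKS steps are enough to either return to
 * 'start' (a cycle, i.e. a deadlock involving 'start') or give up. */
static bool has_wait_cycle(const int waits_for[NUM_TASKS], int start)
{
    assert(start >= 0 && start < NUM_TASKS);

    int t = waits_for[start];
    for (int steps = 0; steps < NUM_TASKS && t != NO_TASK; steps++) {
        if (t == start)
            return true;
        t = waits_for[t];
    }
    return false;
}
```

In this model the scanner and the creator each report a cycle while the
swapout task does not, which matches the symptoms: two tasks stuck on each
other and swapout never being woken. Removing either edge (for example, if
close() did not need the RFS lock held by the scanner) breaks the cycle.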
So, I'm wondering whether my understanding of what might be going on
sounds legitimate. If so, are there any mitigation strategies we could
employ to work around it (aside from not doing file i/o from multiple
tasks)?
--
Ticket URL: <http://devel.rtems.org/ticket/2792>
RTEMS Project <http://www.rtems.org/>