[RTEMS Project] #2622: FAT file corruption when pre-empted while appending to a file
RTEMS trac
trac at rtems.org
Thu Mar 3 23:40:37 UTC 2016
------------------------+--------------------
Reporter: smlaurenzo | Owner:
Type: defect | Status: new
Priority: normal | Milestone: 4.10.3
Component: filesystem | Version: 4.10
Severity: normal | Keywords:
------------------------+--------------------
We've been circling around some odd problems for a while, where some of
our files end up with garbage sequences in them. I'll spare you the
hand-wringing diagnostic steps and jump to the conclusion: when opening
and appending to an existing file, sometimes a cluster gets written that
contains data from another concurrent write operation (to a different
file). An isolated repro is hard to get, but we have wedged our code into
a state where we can reproduce it 100% of the time.
I traced the problem to this sequence (introduced in commit
42a22f0824c4618b864582804ce1440b548a462f in 2012):
In fat_file_write_fat32_or_non_root_dir:
{{{
if (file_cln_initial < file_cln_cnt)
    overwrite_cluster = true;
}}}
This triggers the following branch in fat_block_write:
{{{
if (overwrite_block
    || (bytes_to_write == fs_info->vol.bytes_per_block))
{
    rc = fat_buf_access(fs_info, sec_num, FAT_OP_TYPE_GET,
                        &blk_buf);
}
else {
    rc = fat_buf_access(fs_info, sec_num, FAT_OP_TYPE_READ,
                        &blk_buf);
}
}}}
I have a task that wakes up every 5 s, opens the file for append, and
writes a few hundred bytes. With a little logging, we find that each
operation that does not extend past the first cluster (4KiB) takes the
FAT_OP_TYPE_READ branch. Then, as soon as the first write to the second
file cluster is made (usually the overflow from a user-level write that
spanned the 4KiB boundary), all subsequent writes take the
FAT_OP_TYPE_GET branch.
For a while I was convinced that some nearby code of ours was corrupting
some bit of accounting, but after reading through what this code does, I
cannot see how it matches the intent. The condition
"if (file_cln_initial < file_cln_cnt)" can be unpacked to:
{{{
if (fat_fd->map.file_cln < (seek_disk_cln - start_disk_cln))
}}}
I don't see how this arithmetic is correct. We are comparing a file cln
(cluster number) to the difference between two disk clns, which, unless I
am missing something, is meaningless. Also, we are getting the file cln
from the cache, and its interpretation depends entirely on the operation
that took place when it was queried (in fat_file_write).
I think the only way this makes sense is if the check instead passed only
when we are writing to the last cluster of the file at offset 0 within
that cluster. At any other time, this needs to be a read-modify-write,
because we can't simply overwrite the cluster. I'm not sure how to express
that condition in the code, though.
It turns out that, for many operations in the absence of pre-emption, the
buffer you get back from fat_buf_access(FAT_OP_TYPE_GET) is already
populated with the cluster data. When writing sequentially to a file from
a single task, this seems to hold together. However, being pre-empted by
a higher-priority writer can cause some buffer churn, which results in
writing a cluster whose beginning is corrupted. We see this as periodic
corruption, the start of which is always aligned to a 4KiB file offset
boundary.
If we hard-code overwrite_cluster to always be false, we see no
corruption (presumably at some performance cost in these corner cases).
Can someone either confirm or explain what this code is (supposed to be)
doing? I'm not ruling out that we are causing a problem here, but right
now I am leaning to a defect in the filesystem.
--
Ticket URL: <http://devel.rtems.org/ticket/2622>
RTEMS Project <http://www.rtems.org/>