[RTEMS Project] #2622: FAT file corruption when pre-empted while appending to a file

Thu Mar 3 23:40:37 UTC 2016

#2622: FAT file corruption when pre-empted while appending to a file
------------------------+--------------------
 Reporter:  smlaurenzo  |      Owner:
     Type:  defect      |     Status:  new
 Priority:  normal      |  Milestone:  4.10.3
Component:  filesystem  |    Version:  4.10
 Severity:  normal      |   Keywords:
------------------------+--------------------
 We've been circling around some odd problems for a while where some of our
 files end up with garbage sequences in them. I'll save you the hand-
 wringing diagnostic steps, and jump to the conclusion: when opening and
 appending to an existing file, sometimes a cluster gets written that
 contains data from another concurrent write operation (to a different
 file). An isolated repro is hard to get, but we wedged our code into a
 state where we can repro it 100% of the time.

 I traced the problem down to this sequence (introduced in commit
 42a22f0824c4618b864582804ce1440b548a462f - 2012):

   In fat_file_write_fat32_or_non_root_dir:
 {{{
         if (file_cln_initial < file_cln_cnt)
             overwrite_cluster = true;
 }}}
   Triggers (in fat_block_write):
 {{{
         if (   overwrite_block
             || (bytes_to_write == fs_info->vol.bytes_per_block))
         {
             rc = fat_buf_access(fs_info, sec_num, FAT_OP_TYPE_GET, &blk_buf);
         }
         else {
             rc = fat_buf_access(fs_info, sec_num, FAT_OP_TYPE_READ, &blk_buf);
         }
 }}}

 I have a task that wakes up every 5s, opens the file for append, and
 writes some hundreds of bytes. With a little bit of logging, we find that
 each operation that does not extend past the first cluster (4KiB) takes
 the FAT_OP_TYPE_READ branch. Then, as soon as the first write to the second
 file cluster is made (usually the overflow from a user-level write that
 spans the 4KiB boundary), all subsequent writes take the FAT_OP_TYPE_GET
 branch.
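
 As an illustration of the boundary case described above, here is a minimal
 sketch of how one user-level append breaks into per-cluster pieces. The
 names split_write and struct chunk are hypothetical and do not come from the
 RTEMS dosfs sources; the numbers used in the note below (position 3900,
 300 bytes) are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the part of a write that lands in one cluster. */
struct chunk {
    uint32_t file_cln; /* cluster index within the file (0-based) */
    uint32_t offset;   /* byte offset inside that cluster */
    uint32_t length;   /* number of bytes written to that cluster */
};

/*
 * Split the byte range [pos, pos + count) into per-cluster chunks.
 * bpc is the cluster size in bytes. At most max_chunks entries are
 * stored in out; the number of entries stored is returned.
 */
unsigned split_write(uint32_t pos, uint32_t count, uint32_t bpc,
                     struct chunk *out, unsigned max_chunks)
{
    unsigned n = 0;

    assert(bpc != 0);
    while (count > 0 && n < max_chunks) {
        uint32_t ofs  = pos % bpc;            /* where we are in the cluster */
        uint32_t room = bpc - ofs;            /* bytes left in this cluster */
        uint32_t len  = count < room ? count : room;

        out[n].file_cln = pos / bpc;
        out[n].offset   = ofs;
        out[n].length   = len;
        n++;

        pos   += len;
        count -= len;
    }
    return n;
}
```

 With a 4096-byte cluster, a 300-byte append at position 3900 yields two
 chunks: 196 bytes at offset 3900 of file cluster 0, and 104 bytes at offset
 0 of file cluster 1. Under this sketch, the second chunk is the first write
 to the second cluster.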

 I was convinced for a while that perhaps some proximate code of ours was
 corrupting some bit of accounting, but upon reading through what this is
 doing, I cannot wrap my head around how the intention was correct. The "if
 (file_cln_initial < file_cln_cnt)" condition could be unpacked to:

 {{{
   if (fat_fd->map.file_cln < (seek_disk_cln - start_disk_cln))
 }}}

 I don't see how this arithmetic is correct. We are comparing a file cln to
 the delta between two disk clns, which, unless I am missing something, is
 meaningless. Also, we are getting the file cln from the cache, the
 interpretation of which depends entirely on the operation that took place
 when it was queried (which is in fat_file_write).

 I think the only way this makes sense is if the check instead passed only
 when we are writing to the last cluster of the file, starting at offset 0
 within that cluster. At any other time, this needs to be a
 read-modify-write, because we can't just overwrite the cluster. I'm not
 sure how to express this, though.
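
 One possible way to express that condition, as a rough sketch only: the
 function can_skip_read and its parameters (file_size, write_pos,
 bytes_per_cluster) are hypothetical and are not taken from fat_file.c. It
 assumes the file size and the write position are both known at the point of
 the check, and it reads "last cluster at offset 0" as "the write starts on a
 cluster boundary at or beyond the current end of file".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate, not part of the RTEMS dosfs code.
 *
 * Returns true only when the cluster about to be written holds no valid
 * file data, so it could be overwritten without reading it first:
 *   - the write starts at or beyond the current end of file, and
 *   - the write starts exactly on a cluster boundary.
 * In every other case the existing contents must be preserved, which
 * means read-modify-write.
 */
bool can_skip_read(uint32_t file_size, uint32_t write_pos,
                   uint32_t bytes_per_cluster)
{
    bool at_or_past_eof;
    bool cluster_aligned;

    assert(bytes_per_cluster != 0);

    at_or_past_eof  = write_pos >= file_size;
    cluster_aligned = (write_pos % bytes_per_cluster) == 0;

    return at_or_past_eof && cluster_aligned;
}
```

 Under these assumptions, an append at position 4096 to a 4096-byte file may
 skip the read, while an append at position 4500 to a 4500-byte file may
 not, since the first 404 bytes of that cluster are live file data.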

 It turns out that in many cases, as long as there is no pre-emption, the
 buffer you get back from fat_buf_access(FAT_OP_TYPE_GET) already happens to
 be populated with the cluster data. When writing sequentially to a file
 from a single task, this seems to hold together. However, being pre-empted
 by a higher priority writer can cause some buffer churn, and the result is
 a cluster written with its beginning corrupted. We see this as periodic
 corruption, the beginning of which is always aligned to a 4KiB file offset
 boundary.
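
 The failure mode described above can be modelled with a toy one-slot buffer
 cache. This is a deliberately simplified sketch, not the RTEMS bdbuf
 implementation: every name in it (toy_cache, toy_get, toy_read, and so on)
 is hypothetical, the block size is shrunk to 8 bytes, and writes go
 straight through to the "disk". The only property it tries to capture is
 that a GET-style access hands back a buffer without reading the block,
 while a READ-style access guarantees the on-disk contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8 /* tiny block, for illustration only */
#define NUM_BLOCKS 2

/* Toy model: a two-block "disk" and a cache with a single buffer. */
struct toy_cache {
    unsigned char disk[NUM_BLOCKS][BLOCK_SIZE];
    unsigned char buf[BLOCK_SIZE];
    int cached_block; /* block currently held in buf, or -1 */
};

/* READ-style access: buf is guaranteed to hold the on-disk contents. */
unsigned char *toy_read(struct toy_cache *c, int blk)
{
    if (c->cached_block != blk) {
        memcpy(c->buf, c->disk[blk], BLOCK_SIZE);
        c->cached_block = blk;
    }
    return c->buf;
}

/* GET-style access: no read; on a miss, buf keeps its stale contents. */
unsigned char *toy_get(struct toy_cache *c, int blk)
{
    c->cached_block = blk;
    return c->buf;
}

/* Write len bytes at offset off of block blk, then write the block through. */
void toy_partial_write(struct toy_cache *c, int blk, size_t off,
                       const void *src, size_t len, bool use_get)
{
    unsigned char *b;

    assert(off + len <= BLOCK_SIZE);
    b = use_get ? toy_get(c, blk) : toy_read(c, blk);
    memcpy(b + off, src, len);
    memcpy(c->disk[blk], b, BLOCK_SIZE);
}

/*
 * Append to the second half of block 0 and report whether its first half
 * survived. If churn is true, another writer touches block 1 in between,
 * evicting block 0 from the single buffer.
 */
bool toy_append_preserves_head(bool churn, bool use_get)
{
    struct toy_cache c;

    memset(c.disk[0], 'A', BLOCK_SIZE); /* "our" file */
    memset(c.disk[1], 'B', BLOCK_SIZE); /* the other file */
    memset(c.buf, 0, BLOCK_SIZE);
    c.cached_block = -1;

    toy_read(&c, 0);     /* block 0 is cached from an earlier write */
    if (churn)
        toy_read(&c, 1); /* the other writer evicts it */

    toy_partial_write(&c, 0, BLOCK_SIZE / 2, "xxxx", BLOCK_SIZE / 2, use_get);
    return memcmp(c.disk[0], "AAAA", BLOCK_SIZE / 2) == 0;
}
```

 In this model the GET path is only harmless while block 0 stays cached.
 After churn, the partial write lands in a buffer still holding the other
 file's bytes, so the start of the block is overwritten with foreign data.
 The READ path preserves the head in both cases.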

 If we hard-code overwrite_cluster to always be false, we do not experience
 corruption (presumably at the cost of some performance in these corner
 cases).

 Can someone either confirm this or explain what this code is (supposed to
 be) doing? I'm not ruling out that we are causing a problem here, but right
 now I am leaning toward a defect in the filesystem.

--
Ticket URL: <http://devel.rtems.org/ticket/2622>
RTEMS Project <http://www.rtems.org/>