[RTEMS Project] #2944: FAT data corruption during unmount()
RTEMS trac
trac at rtems.org
Tue Mar 28 08:15:10 UTC 2017
#2944: FAT data corruption during unmount()
-----------------------------+-----------------------
Reporter: Sebastian Huber | Owner: chrisj@…
Type: defect | Status: new
Priority: normal | Milestone: 4.12
Component: filesystem | Version: 4.11
Severity: normal | Resolution:
Keywords: |
-----------------------------+-----------------------
Comment (by slemstick):
Replying to [comment:1 Chris Johns]:
> Replying to [ticket:2944 Sebastian Huber]:
> > Removing the function call in msdos_shut_down ( .. ) to close the root
> > file descriptor solves the problem perfectly (clean fsck).
>
> I assume you mean fat_file_close?
Yes.
>
> > However, we're a bit unsure about the intent behind closing the root
> > directory.
>
> There is memory allocated in fat_file_open which you would leak.
We fixed this issue by creating a special "root file close" function that
omits the call to fat_file_update() made in fat_file_close() (the call
that caused the corruption).
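The idea of a separate close path for the root can be sketched as a toy
model. Everything below is hypothetical (toy_fd, toy_open, toy_file_close,
toy_root_close, toy_dirent_writes); it is not the RTEMS fat_file_* code and
only illustrates "release the memory, skip the write-back":

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified descriptor; NOT the RTEMS fat_file_fd_t. */
typedef struct {
    bool is_root; /* true for the root directory descriptor */
} toy_fd;

/* Counts simulated directory-entry write-backs (stand-in for the
 * effect of fat_file_update()). */
int toy_dirent_writes = 0;

/* Allocate a descriptor; returns NULL if allocation fails. */
toy_fd *toy_open(bool is_root)
{
    toy_fd *fd = calloc(1, sizeof(*fd));
    if (fd != NULL)
        fd->is_root = is_root;
    return fd;
}

/* General close path: write metadata back, then release the memory. */
void toy_file_close(toy_fd *fd)
{
    if (fd == NULL)
        return;
    toy_dirent_writes++; /* simulated metadata write-back */
    free(fd);
}

/* Root-only close path: release the memory, but skip the write-back. */
void toy_root_close(toy_fd *fd)
{
    free(fd); /* free(NULL) is a no-op */
}
```

With this split, the memory allocated at open time is still released on
unmount, so the leak mentioned above is avoided without the write-back.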
>
> I see the fat_file_close calls fat_buf_release and if the fs_info cache
> is not empty it is holding a bdbuf buffer so this would cause a leak of
> buffers.
>
> What about the fat_file_close calls in the msdos init call on error?
> Would they also cause the same problem?
Yes, these will cause the same issues.
To update / summarise this ticket a bit here:
We originally attempted to fix this problem by setting the hard-coded
root directory cluster number to 2, in addition to the change above
(removing the fat_file_update() call in fat_file_close() on unmount, which
caused the corruption).
However, our attempt to fix the broken root cluster numbering breaks a
hashing mechanism in fat_file_open(..). This mechanism indexes open file
descriptors based on 1) parent directory cluster number and 2) offset into
that directory structure. The issue is that the root directory, and the
file pointed to by the first directory entry in the root directory, may
construct their hashes based on the same indexes:
> Root directory: cluster number 2, offset 0
> First file in root directory: cluster number 2, offset 0
Before, this was not a problem of course, as the root directory had the
hard-coded cluster number of 1, and the keys were therefore always unique.
With the root cluster number set to 2, the keys can collide, which can
cause a number of new issues.
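The key collision described above can be shown with a minimal sketch. The
type and function names (open_file_key, keys_collide, root_dir_key,
first_root_entry_key) are made up for illustration and are not the actual
RTEMS structures; the sketch also assumes, as in the example above, that
the first root entry is keyed by cluster 2, offset 0:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the (cluster, offset) pair used to index
 * open file descriptors; NOT the actual RTEMS data structure. */
typedef struct {
    uint32_t cln; /* cluster number of the parent directory */
    uint32_t ofs; /* offset of the entry within that directory */
} open_file_key;

/* Two keys collide when both components are equal. */
bool keys_collide(open_file_key a, open_file_key b)
{
    return a.cln == b.cln && a.ofs == b.ofs;
}

/* Key of the root directory itself, for a given hard-coded
 * root cluster number (1 or 2 in this discussion). */
open_file_key root_dir_key(uint32_t root_cln)
{
    open_file_key k = { root_cln, 0 };
    return k;
}

/* Key of the first directory entry in the root directory
 * (assumed here to be cluster 2, offset 0). */
open_file_key first_root_entry_key(void)
{
    open_file_key k = { 2, 0 };
    return k;
}
```

In this model, keys_collide(root_dir_key(2), first_root_entry_key()) is
true, while keys_collide(root_dir_key(1), first_root_entry_key()) is false.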
The fix to this problem is to set the hard-coded root directory cluster
number back to 1, rather than drastically changing the key hashing method,
its function calls and data structures, and to trust that removing the
calls to fat_file_update() on the root node is sufficient to avoid the
data corruption issue described above.
However, there are two other places in msdos_misc.c where the hardcoded
root directory cluster number - FAT_ROOTDIR_CLUSTER_NUM - is used:
> msdos_get_name_node()
> msdos_get_dotdot_dir_info_cluster_num()
Like this:
    if ( (MSDOS_EXTRACT_CLUSTER_NUM(dotdot_node)) == 0)
    {
        /*
         * we handle root dir for all FAT types in the same way with the
         * ordinary directories ( through fat_file_* calls )
         */
        fat_dir_pos_init(dir_pos);
        dir_pos->sname.cln = FAT_ROOTDIR_CLUSTER_NUM;
    }
To my understanding, this condition will never occur, as you should never
have a cluster number below 2 in a compliant (msdos) FAT file system. Does
anyone know the intent behind this?
--
Ticket URL: <http://devel.rtems.org/ticket/2944#comment:2>
RTEMS Project <http://www.rtems.org/>