[RTEMS Project] #4568: untar: problems with existing directories

RTEMS trac trac at rtems.org
Thu Dec 9 07:16:26 UTC 2021


#4568: untar: problems with existing directories
---------------------------------+--------------------------------
  Reporter:  Christian Mauderer  |      Owner:  Christian Mauderer
      Type:  defect              |     Status:  assigned
  Priority:  normal              |  Milestone:  Indefinite
 Component:  lib                 |    Version:  6
  Severity:  normal              |   Keywords:
Blocked By:                      |   Blocking:
---------------------------------+--------------------------------
 Cloned from #4552:
 ----
 Our current implementation of untar in cpukit/libmisc/untar/untar.c has
 problems if a directory in the archive already exists. Note that this is
 not a problem if the archive contains only a file.

 The problem exists on 5 and master.

 Example: If I have a tar.gz file which contains the directories l1 and
 l1/l2 and the file l1/l2/x.txt, and call Untar_FromGzChunk_Print twice,
 the first attempt will print:

 {{{
 untar: dir: l1
 untar: dir: l1/l2
 untar: file: l1/l2/x.txt (s:12,m:0644)
 }}}

 After that, the directory l1 already exists. So if I try to extract
 the archive again, I get the following:

 {{{
 untar: dir: l1
 untar: mkdir: l1: (17) File exists
 }}}

 My expectation would have been that the files are just integrated into an
 existing directory structure. If a file exists, it should be overwritten.
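
 The expectation above can be sketched as a directory-creation step that
 tolerates an existing directory. This is only an illustration: the helper
 name `ensure_dir` is hypothetical and is not taken from untar.c, and the
 real fix may look different.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from untar.c): create a directory, but treat
 * an already existing directory as success.
 * Returns 0 on success, -1 with errno set otherwise.
 */
static int ensure_dir(const char *path, mode_t mode)
{
  struct stat sb;

  assert(path != NULL);

  if (mkdir(path, mode) == 0) {
    return 0;
  }

  if (errno != EEXIST) {
    return -1;
  }

  /* Something with that name exists: accept it only if it is a directory. */
  if (stat(path, &sb) != 0) {
    return -1;
  }

  if (S_ISDIR(sb.st_mode)) {
    return 0;
  }

  /* A non-directory is in the way; report the original error. */
  errno = EEXIST;
  return -1;
}
```

 With a helper like this, a second extraction of the example archive would
 no longer stop at `l1` with error 17 (File exists).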

 We have multiple references for the expected behavior: GNU or BSD `tar`,
 or POSIX `pax`. In my experience `tar` is the better-known tool, so my
 suggestion would be to use the default behavior of `tar` as a reference.

 === GNU or BSD `tar`

 I tested the default behavior of GNU `tar` and BSD `tar`. It seems to be
 the same for both:

 - If a directory structure exists, the files from the archive will be
 integrated. Existing files are overwritten.

 - If a file exists and the archive contains a directory with the same
 name, the file is removed and a directory is created. In the above
 example: if `l1/l2` is a file it will be overwritten with a new directory.

 - If a directory exists and the archive contains a file with the same
 name, the directory will be replaced if it is empty. If it contains files,
 the result is an error.

 - An archive can also contain only a file without the parent directories.
 If, in that case, one of the parent directories exists as a file,
 extracting the archive results in an error. In the example: if `l1/l2` is
 a file and the archive doesn't contain the directories but only the file
 `l1/l2/x.txt`, that would be an error.

 In case of an error, it is possible that the archive has been partially
 extracted.
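
 The first three conflict cases listed above could be handled roughly as
 follows. This is a minimal sketch under assumptions: both function names
 are hypothetical, nothing here is taken from untar.c, and it only mirrors
 the observed default behavior of `tar` as described in the list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical: prepare the target for a directory entry of the archive.
 * An existing directory is reused; an existing non-directory is removed
 * and replaced by a new directory.
 */
static int replace_with_dir(const char *path, mode_t mode)
{
  struct stat sb;

  assert(path != NULL);

  if (lstat(path, &sb) == 0) {
    if (S_ISDIR(sb.st_mode)) {
      return 0; /* integrate into the existing directory */
    }
    if (unlink(path) != 0) {
      return -1;
    }
  } else if (errno != ENOENT) {
    return -1; /* for example ENOTDIR if a parent is a file */
  }

  return mkdir(path, mode);
}

/*
 * Hypothetical: prepare the target for a file entry of the archive.
 * An existing file is removed so it can be overwritten; an existing
 * directory is removed only if it is empty.
 */
static int make_room_for_file(const char *path)
{
  struct stat sb;

  assert(path != NULL);

  if (lstat(path, &sb) != 0) {
    return (errno == ENOENT) ? 0 : -1;
  }

  if (S_ISDIR(sb.st_mode)) {
    /* rmdir fails with ENOTEMPTY or EEXIST if the directory has entries */
    return rmdir(path);
  }

  return unlink(path);
}
```

 The fourth case needs no extra code in this sketch: if a parent such as
 `l1/l2` is a file, `lstat` on `l1/l2/x.txt` fails with ENOTDIR and the
 error is passed on to the caller.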

 Note: GNU `tar` has options to change the behavior (like
 `--recursive-unlink`). I'm sure there are similar options in BSD `tar`.
 From my point of view we should adapt to the default behavior, so I
 ignored these options.

 === The POSIX `pax` utility

 https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html

 Default behavior is described as follows:

   If an attempt is made to extract a directory when the directory already
 exists, this shall not be considered an error. If an attempt is made to
 extract a FIFO when the FIFO already exists, this shall not be considered
 an error.

 From some quick tests, `pax` behaves similarly to `tar`. The only
 difference I noticed is that empty directories are not overwritten with
 files from the archive.

--
Ticket URL: <http://devel.rtems.org/ticket/4568>
RTEMS Project <http://www.rtems.org/>