Document probable bugs with tar_lwt_unix #152

jonahbeckford · 2024-09-19T16:18:49Z

I added in some test cases because I have not been able to get ocaml-tar working after version 3. (It was working in version 2, although the API was signficantly different so difficult to compare code directly).

The first test (A) is the simplest ... it just calls Tar_lwt_unix.extract ~filter:test_filter ~src:filename "extracted/" but even that fails. See targz.t for the captured errors and a description of the other two more complicated tests. (I ran on a Windows machine, but I've seen failures on Linux + macOS).

NOTE: There will be different failures on ocaml-ci because they run in a sandbox ... I didn't want to check in a 4 MB .tar.gz so it is downloaded during the tests. So, please test on a desktop with dune build -p tar,tar-unix '@runtest' using no sandboxes.

reynir · 2024-09-20T09:08:02Z

Thanks! This is very helpful. I've identified one bug which made a wrong assumption about Tar.Really_read len:

diff --git a/unix/tar_lwt_unix.ml b/unix/tar_lwt_unix.ml
index cb7bcb5e7..5da2a8938 100644
--- a/unix/tar_lwt_unix.ml
+++ b/unix/tar_lwt_unix.ml
@@ -43,7 +43,9 @@ let safe f a =
 let read_complete fd buf len =
   let open Lwt_result.Infix in
   let rec loop offset =
+    assert (Bytes.length buf >= len);
     if offset < len then
+      let () = assert (Bytes.length buf > offset) in
       safe (Lwt_unix.read fd buf offset) (len - offset) >>= fun read ->
       if read = 0 then
         Lwt.return (Error `Unexpected_end_of_file)
@@ -96,7 +98,7 @@ let run t fd =
         Lwt_result.return (Bytes.sub_string b 0 read)
     | Tar.Really_read len ->
       let buf = Bytes.make len '\000' in
-      read_complete fd buf Tar.Header.length >|= fun () ->
+      read_complete fd buf len >|= fun () ->
       Bytes.unsafe_to_string buf
     | Tar.Seek len -> seek fd len
     | Tar.Return value -> Lwt.return value

Next I get different errors:

cram diff

File "test/targz.t", line 1, characters 0-0:
diff --git a/_build/.sandbox/460604fbbf9870a59de2e615f350c9e6/default/test/targz.t b/_build/.sandbox/460604fbbf9870a59de2e615f350c9e6/default/test/targz.t.corrected
index acf0d4236..adbb17f85 100644
--- a/_build/.sandbox/460604fbbf9870a59de2e615f350c9e6/default/test/targz.t
+++ b/_build/.sandbox/460604fbbf9870a59de2e615f350c9e6/default/test/targz.t.corrected
@@ -22,14 +22,8 @@ _  in regress_targz.ml:fold works when `decompress = Fun.id`
 A. "Extract"-ing it in OCaml.
 That is, TestExtract.do_test
   $ OCAMLRUNPARAM=b ./regress_targz.exe v1044.tar extract
-  Fatal error: exception Invalid_argument("Lwt_unix.read")
-  Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45
-  Called from Tar_lwt_unix.safe.(fun) in file "unix/tar_lwt_unix.ml", line 38, characters 15-18
-  Called from Lwt.Sequential_composition.catch in file "src/core/lwt.ml", line 2016, characters 10-14
-  Re-raised at Tar_lwt_unix.safe.(fun) in file "unix/tar_lwt_unix.ml", line 41, characters 13-26
-  Called from Tar_lwt_unix.read_complete.loop in file "unix/tar_lwt_unix.ml", line 47, characters 6-55
-  Called from Tar_lwt_unix.run.run in file "unix/tar_lwt_unix.ml", line 99, characters 6-44
-  Called from Tar_lwt_unix.run.run in file "unix/tar_lwt_unix.ml", line 105, characters 6-11
+  Fatal error: exception Failure("Could not find entry. Error error No such file or directory in function open extracted/openapi-1044/.github/workflows/publish.yml")
+  Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33
   Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1844, characters 16-19
   Re-raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3123, characters 20-29
   Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 27, characters 10-20
@@ -43,14 +37,8 @@ That is, TestUntar.do_test ~gunzip:false where ...
 _     (fun () -> Tar_lwt_unix.run (decompress (Tar.fold f init)) fd)
 in regress_targz.ml:fold works when `decompress = Fun.id`
   $ OCAMLRUNPARAM=b ./regress_targz.exe v1044.tar untar
-  Fatal error: exception Invalid_argument("Lwt_unix.read")
-  Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45
-  Called from Tar_lwt_unix.safe.(fun) in file "unix/tar_lwt_unix.ml", line 38, characters 15-18
-  Called from Lwt.Sequential_composition.catch in file "src/core/lwt.ml", line 2016, characters 10-14
-  Re-raised at Tar_lwt_unix.safe.(fun) in file "unix/tar_lwt_unix.ml", line 41, characters 13-26
-  Called from Tar_lwt_unix.read_complete.loop in file "unix/tar_lwt_unix.ml", line 47, characters 6-55
-  Called from Tar_lwt_unix.run.run in file "unix/tar_lwt_unix.ml", line 99, characters 6-44
-  Called from Tar_lwt_unix.run.run in file "unix/tar_lwt_unix.ml", line 105, characters 6-11
+  Fatal error: exception Failure("Could not find entry. unmarshal Int64.of_string: failed to parse int64 \"0o0erated c\"")
+  Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33
   Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1844, characters 16-19
   Re-raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3123, characters 20-29
   Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 27, characters 10-20

* We need to always skip files - there's currently no way to exit early * Tar_lwt_unix.extract does not attempt to create the destination

reynir · 2024-09-20T10:09:46Z

Ok, with the above fix and the fixes to the tests (you always need to either read or skip files; there's no early exit) there is one bug in extract:

directories in the archive are ignored and not created
even with the above, implicit directories (that is, files with e.g. path implicit-dir/hello.txt where implicit-dir doesn't have an entry in the archive) are not created either
the above means open(2) fails for the file extracted/openapi-1044/.github/workflows/publish.yml as the parent directories do not exist.

hannesm · 2024-09-20T10:27:44Z

did the implicit creation of directories change between tar 2 and tar 3? I guess from a usability point of view ot makes sense to support it, and create these directories.

jonahbeckford · 2024-09-20T10:36:37Z

even with the above, implicit directories (that is, files with e.g. path implicit-dir/hello.txt where implicit-dir doesn't have an entry in the archive) are not created either

did the implicit creation of directories change between tar 2 and tar 3?

By the way: I actually don't care about implicit creation for my use cases. I didn't use extract or its equivalent in tar 2 ... I only put extract into the regression test so I could have a simple-to-understand test case.

Thanks for getting this fixed so quickly!

reynir · 2024-09-20T10:55:19Z

A quick look at tar 2.6 it seems extract never created directories. I think it is unfortunate that we don't, but as such it's not a regression.

Document probable bugs with tar_lwt_unix

d0afde3

Fix bugs in test/regress_targz.ml

db6c507

* We need to always skip files - there's currently no way to exit early * Tar_lwt_unix.extract does not attempt to create the destination

reynir mentioned this pull request Sep 20, 2024

Fix wrong assumption in Tar_lwt_unix.run #153

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document probable bugs with tar_lwt_unix #152

Document probable bugs with tar_lwt_unix #152

jonahbeckford commented Sep 19, 2024 •

edited

Loading

reynir commented Sep 20, 2024

reynir commented Sep 20, 2024

hannesm commented Sep 20, 2024

jonahbeckford commented Sep 20, 2024

reynir commented Sep 20, 2024

Document probable bugs with tar_lwt_unix #152

Are you sure you want to change the base?

Document probable bugs with tar_lwt_unix #152

Conversation

jonahbeckford commented Sep 19, 2024 • edited Loading

reynir commented Sep 20, 2024

reynir commented Sep 20, 2024

hannesm commented Sep 20, 2024

jonahbeckford commented Sep 20, 2024

reynir commented Sep 20, 2024

jonahbeckford commented Sep 19, 2024 •

edited

Loading