Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document probable bugs with tar_lwt_unix #152

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jonahbeckford
Copy link

@jonahbeckford jonahbeckford commented Sep 19, 2024

I added in some test cases because I have not been able to get ocaml-tar working after version 3. (It was working in version 2, although the API was signficantly different so difficult to compare code directly).

The first test (A) is the simplest ... it just calls Tar_lwt_unix.extract ~filter:test_filter ~src:filename "extracted/" but even that fails. See targz.t for the captured errors and a description of the other two more complicated tests. (I ran on a Windows machine, but I've seen failures on Linux + macOS).

NOTE: There will be different failures on ocaml-ci because they run in a sandbox ... I didn't want to check in a 4 MB .tar.gz so it is downloaded during the tests. So, please test on a desktop with dune build -p tar,tar-unix '@runtest' using no sandboxes.

@reynir
Copy link
Member

reynir commented Sep 20, 2024

Thanks! This is very helpful. I've identified one bug which made a wrong assumption about Tar.Really_read len:

diff --git a/unix/tar_lwt_unix.ml b/unix/tar_lwt_unix.ml
index cb7bcb5e7..5da2a8938 100644
--- a/unix/tar_lwt_unix.ml
+++ b/unix/tar_lwt_unix.ml
@@ -43,7 +43,9 @@ let safe f a =
 let read_complete fd buf len =
   let open Lwt_result.Infix in
   let rec loop offset =
+    assert (Bytes.length buf >= len);
     if offset < len then
+      let () = assert (Bytes.length buf > offset) in
       safe (Lwt_unix.read fd buf offset) (len - offset) >>= fun read ->
       if read = 0 then
         Lwt.return (Error `Unexpected_end_of_file)
@@ -96,7 +98,7 @@ let run t fd =
         Lwt_result.return (Bytes.sub_string b 0 read)
     | Tar.Really_read len ->
       let buf = Bytes.make len '\000' in
-      read_complete fd buf Tar.Header.length >|= fun () ->
+      read_complete fd buf len >|= fun () ->
       Bytes.unsafe_to_string buf
     | Tar.Seek len -> seek fd len
     | Tar.Return value -> Lwt.return value

Next I get different errors:

cram diff
File "test/targz.t", line 1, characters 0-0:
diff --git a/_build/.sandbox/460604fbbf9870a59de2e615f350c9e6/default/test/targz.t b/_build/.sandbox/460604fbbf9870a59de2e615f350c9e6/default/test/targz.t.corrected
index acf0d4236..adbb17f85 100644
--- a/_build/.sandbox/460604fbbf9870a59de2e615f350c9e6/default/test/targz.t
+++ b/_build/.sandbox/460604fbbf9870a59de2e615f350c9e6/default/test/targz.t.corrected
@@ -22,14 +22,8 @@ _  in regress_targz.ml:fold works when `decompress = Fun.id`
 A. "Extract"-ing it in OCaml.
 That is, TestExtract.do_test
   $ OCAMLRUNPARAM=b ./regress_targz.exe v1044.tar extract
-  Fatal error: exception Invalid_argument("Lwt_unix.read")
-  Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45
-  Called from Tar_lwt_unix.safe.(fun) in file "unix/tar_lwt_unix.ml", line 38, characters 15-18
-  Called from Lwt.Sequential_composition.catch in file "src/core/lwt.ml", line 2016, characters 10-14
-  Re-raised at Tar_lwt_unix.safe.(fun) in file "unix/tar_lwt_unix.ml", line 41, characters 13-26
-  Called from Tar_lwt_unix.read_complete.loop in file "unix/tar_lwt_unix.ml", line 47, characters 6-55
-  Called from Tar_lwt_unix.run.run in file "unix/tar_lwt_unix.ml", line 99, characters 6-44
-  Called from Tar_lwt_unix.run.run in file "unix/tar_lwt_unix.ml", line 105, characters 6-11
+  Fatal error: exception Failure("Could not find entry. Error error No such file or directory in function open extracted/openapi-1044/.github/workflows/publish.yml")
+  Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33
   Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1844, characters 16-19
   Re-raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3123, characters 20-29
   Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 27, characters 10-20
@@ -43,14 +37,8 @@ That is, TestUntar.do_test ~gunzip:false where ...
 _     (fun () -> Tar_lwt_unix.run (decompress (Tar.fold f init)) fd)
 in regress_targz.ml:fold works when `decompress = Fun.id`
   $ OCAMLRUNPARAM=b ./regress_targz.exe v1044.tar untar
-  Fatal error: exception Invalid_argument("Lwt_unix.read")
-  Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45
-  Called from Tar_lwt_unix.safe.(fun) in file "unix/tar_lwt_unix.ml", line 38, characters 15-18
-  Called from Lwt.Sequential_composition.catch in file "src/core/lwt.ml", line 2016, characters 10-14
-  Re-raised at Tar_lwt_unix.safe.(fun) in file "unix/tar_lwt_unix.ml", line 41, characters 13-26
-  Called from Tar_lwt_unix.read_complete.loop in file "unix/tar_lwt_unix.ml", line 47, characters 6-55
-  Called from Tar_lwt_unix.run.run in file "unix/tar_lwt_unix.ml", line 99, characters 6-44
-  Called from Tar_lwt_unix.run.run in file "unix/tar_lwt_unix.ml", line 105, characters 6-11
+  Fatal error: exception Failure("Could not find entry. unmarshal Int64.of_string: failed to parse int64 \"0o0erated c\"")
+  Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33
   Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "src/core/lwt.ml", line 1844, characters 16-19
   Re-raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3123, characters 20-29
   Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 27, characters 10-20

* We need to always skip files - there's currently no way to exit early
* Tar_lwt_unix.extract does not attempt to create the destination
@reynir
Copy link
Member

reynir commented Sep 20, 2024

Ok, with the above fix and the fixes to the tests (you always need to either read or skip files; there's no early exit) there is one bug in extract:

  • directories in the archive are ignored and not created
  • even with the above, implicit directories (that is, files with e.g. path implicit-dir/hello.txt where implicit-dir doesn't have an entry in the archive) are not created either
  • the above means open(2) fails for the file extracted/openapi-1044/.github/workflows/publish.yml as the parent directories do not exist.

@hannesm
Copy link
Member

hannesm commented Sep 20, 2024

did the implicit creation of directories change between tar 2 and tar 3? I guess from a usability point of view ot makes sense to support it, and create these directories.

@jonahbeckford
Copy link
Author

even with the above, implicit directories (that is, files with e.g. path implicit-dir/hello.txt where implicit-dir doesn't have an entry in the archive) are not created either

did the implicit creation of directories change between tar 2 and tar 3?

By the way: I actually don't care about implicit creation for my use cases. I didn't use extract or its equivalent in tar 2 ... I only put extract into the regression test so I could have a simple-to-understand test case.


Thanks for getting this fixed so quickly!

@reynir
Copy link
Member

reynir commented Sep 20, 2024

A quick look at tar 2.6 it seems extract never created directories. I think it is unfortunate that we don't, but as such it's not a regression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants