-
Notifications
You must be signed in to change notification settings - Fork 699
feat: support compressed tarballs (.tar.gz, .tar.bz2, etc.) in the preset archive previewer
#3518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your patch! This PR is quite large, 212 lines added and 15 lines deleted, while the entire archive plugin is only 173 lines, which makes it pretty hard to review. Could you minimize the changes to only the necessary parts:
|
|
Hi, I have implemented the changes you requested. For the |
|
Thank you for the change!
Either way is fine with me :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR extends the built-in archive plugin to support previewing compressed tar archives (e.g., .tar.gz, .tar.xz, .tar.zst). When the plugin detects an archive containing only a .tar file, it automatically decompresses and displays the tarball's contents instead of just showing the intermediate tar file.
Key changes:
- Added logic to detect and decompress single-file tar archives using piped 7z commands
- Refactored the 7z output parsing into reusable functions
- Added
packed_sizemetadata tracking for size-based filtering (1GB limit)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| local files, bound, err = self.list_archive({ "-p", tostring(job.file.path) }, job.skip, limit) | ||
|
|
||
| local first = (#files == 1 and files[1]) or (#files == 0 and M.list_if_only_one(job.file.path)) | ||
| if first and M.should_decompress_tar(first) then |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic uses M.list_if_only_one(job.file.path) as a fallback when files array is empty or has one element, but list_if_only_one returns false (a boolean) when the condition isn't met. The first variable could be assigned false, and then passed to should_decompress_tar, which expects a table with packed_size and path fields. This will cause an error when trying to access false.packed_size. Add a nil/boolean check before calling should_decompress_tar.
| if first and M.should_decompress_tar(first) then | |
| if type(first) == "table" and M.should_decompress_tar(first) then |
| function M.is_encrypted(s) return s:find(" Wrong password", 1, true) end | ||
|
|
||
| function M.is_tar(path) return M.list_meta { "-p", tostring(path) } == "tar" end | ||
|
|
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function should_decompress_tar lacks documentation explaining the purpose of the 1GB limit on packed_size and why this threshold was chosen. This magic number should be documented to help future maintainers understand the performance/resource trade-off.
| ---Heuristic that decides whether to treat a single-file archive as a compressed tarball. | |
| ---We only attempt to decompress members whose packed size is at most 1 GiB (1024 * 1024 * 1024 bytes) | |
| ---to avoid spending excessive CPU time and memory on very large archives during preview. This value | |
| ---is a practical upper bound chosen to balance responsiveness and resource usage and can be adjusted | |
| ---if performance characteristics or typical archive sizes change. |
| goto continue | ||
| end | ||
|
|
||
| key, value = next:match("^(%u[%a ]+) = (.-)[\r\n]+") |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern has been updated from ^(%u%l+) to ^(%u[%a ]+) to support multi-word keys like "Packed Size". However, the new pattern %u[%a ]+ only ensures the first character is uppercase and allows any subsequent alphabetic characters or spaces. This could match malformed keys. Consider using a more precise pattern or add a comment explaining the expected key format from 7z output.
| key, value = next:match("^(%u[%a ]+) = (.-)[\r\n]+") | |
| -- 7z key lines look like "Path = ...", "Packed Size = ...": | |
| -- one or more capitalized words (e.g. "Packed Size"), each %u%l*, separated by spaces. | |
| local key_head, key_tail, val = next:match("^(%u%l*(%s%u%l*)*) = (.-)[\r\n]+") | |
| if key_head then | |
| key = key_head .. (key_tail or "") | |
| value = val | |
| else | |
| key, value = nil, nil | |
| end |
| while #files < 2 do | ||
| local next, event = child:read_line() | ||
| if event == 0 then | ||
| local attr, size, packed_size, path = next:sub(20):match("([^ ]+) +(%d+) +(%d+) +([^\r\n]+)") |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern next:sub(20):match("([^ ]+) +(%d+) +(%d+) +([^\r\n]+)") assumes a fixed column width format starting at character 20. This is fragile and could break if the output format changes slightly or with different locales. Consider using the -slt flag here as well for structured parsing, or add error handling to verify the parsed data is valid before adding it to the files table.
| local attr, size, packed_size, path = next:sub(20):match("([^ ]+) +(%d+) +(%d+) +([^\r\n]+)") | |
| -- Parse attribute, size, packed size, and path using flexible whitespace, without assuming fixed columns | |
| local attr, size, packed_size, path = next:match("^([^%s]+)%s+(%d+)%s+(%d+)%s+(.+)$") |
| { "x", "-so", table.unpack(args) }, | ||
| { "l", "-ba", "-slt", "-ttar", "-sccUTF-8", "-si" } | ||
| ) | ||
| if not dst then |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When spawn_7z_piped returns an error (when dst is falsy), the code checks if not dst and returns. However, spawn_7z_piped may return src as non-nil even when dst is nil, potentially leaving the src process running if the error path is taken. The function should either return nil for both when there's an error, or the caller should handle cleanup of src explicitly.
| if not dst then | |
| if not dst then | |
| if src then | |
| src:start_kill() | |
| end |
| -- Spawn a 7z instance which pipes a "7z {src_args}" into a "7z {dst_args}" | ||
| -- Used for previewing compressed tarballs, by doing "7z x -so .. | 7z l -si .." |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function comment says "Spawn a 7z instance which pipes a '7z {src_args}' into a '7z {dst_args}'", but the actual signature and usage pattern isn't clearly documented. The return values are not documented in the comment block, and it's unclear from the comment what happens in error cases. Consider adding proper documentation for parameters and return values, especially since this function has complex error handling behavior.
| -- Spawn a 7z instance which pipes a "7z {src_args}" into a "7z {dst_args}" | |
| -- Used for previewing compressed tarballs, by doing "7z x -so .. | 7z l -si .." | |
| -- Spawn two 7z instances and pipe the stdout of "7z {src_args}" into the stdin of "7z {dst_args}". | |
| -- Used for previewing compressed tarballs, e.g. "7z x -so .. | 7z l -si ..". | |
| -- | |
| -- Parameters: | |
| -- src_args: arguments passed to the first 7z invocation (typically a table accepted by Command:arg()). | |
| -- dst_args: arguments passed to the second 7z invocation (typically a table accepted by Command:arg()). | |
| -- | |
| -- Return values: | |
| -- src: the child process handle for the source 7z instance (or nil on failure). | |
| -- dst: the child process handle for the destination 7z instance whose stdout is used for reading | |
| -- the final output (or nil on failure). | |
| -- last_err: the last error message returned by the underlying Command:spawn() call, or nil if both | |
| -- processes were spawned successfully. | |
| -- | |
| -- Error handling: | |
| -- The function tries to use "7zz" first and falls back to "7z". If it fails to spawn the required | |
| -- processes with both binaries, it calls ya.err("Failed to start either `7zz` or `7z`, error: ...") | |
| -- and returns the result of ya.err, with no valid child processes running. |
| files[#files] = nil | ||
| return files, bound, err | ||
| end | ||
|
|
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function list_if_only_one lacks documentation. It's not clear from the name alone what "only one" refers to, what the return value represents, or when it should be used. Add a docstring explaining that this function checks if an archive contains only one file and returns that file's metadata, or false otherwise.
| ---Check whether an archive contains exactly one file. | |
| ---This is a workaround for certain compressed tarballs (e.g. `.tar.xz`) where | |
| ---7-zip omits the inner `.tar` file when `-slt` is used. In such cases, this | |
| ---function lists the archive without `-slt` and returns the single file if | |
| ---and only if there is exactly one entry. | |
| ---@param path string Archive file path. | |
| ---@return table|false file A table containing the single file's metadata | |
| ---if the archive has exactly one file; `false` otherwise. |
| local dst, err = | ||
| Command(name):arg(dst_args):stdin(src:take_stdout()):stdout(Command.PIPED):stderr(Command.PIPED):spawn() | ||
| if not dst then | ||
| last_err = err | ||
| end |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The src child process is not properly cleaned up when dst creation fails. In the error path (lines 86-88), if dst fails to spawn, the src process is left running. Consider adding src:start_kill() before returning the error.
| src, dst = try("7z") | ||
| end | ||
| if not dst then | ||
| return ya.err("Failed to start either `7zz` or `7z`, error: " .. last_err) |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function returns only src when dst is not created (line 97), but the return signature at line 99 indicates it should return src, dst, last_err. This inconsistency could cause issues for callers expecting three return values. Ensure the error path returns consistent values.
| return ya.err("Failed to start either `7zz` or `7z`, error: " .. last_err) | |
| return nil, nil, ya.err("Failed to start either `7zz` or `7z`, error: " .. last_err) |
| function M.is_tar(path) return M.list_meta { "-p", tostring(path) } == "tar" end | ||
|
|
||
| function M.should_decompress_tar(file) | ||
| return file.packed_size <= 1024 * 1024 * 1024 and file.path:find(".+%.tar$") ~= nil |
Copilot
AI
Jan 8, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern .+%.tar$ requires at least one character before .tar, which is correct. However, this only matches files ending with exactly .tar and won't match filenames that are case-insensitive variants like .TAR or .Tar. Consider using a case-insensitive pattern if the filesystem or archive format allows mixed-case extensions.
| return file.packed_size <= 1024 * 1024 * 1024 and file.path:find(".+%.tar$") ~= nil | |
| return file.packed_size <= 1024 * 1024 * 1024 and file.path:lower():find(".+%.tar$") ~= nil |
archive.tar.gz, .tar.bz2, etc.) in the preset archive previewer
sxyazi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Which issue does this PR resolve?
Resolves #1548 (the original fix only resolved the
extractplugin, and not thearchiveplugin)Rationale of this PR
The built-in preview of
.tar.*files just shows a.tarfile inside - this is because 7zip just decompresses the first compression layer, without extracting the tarI acknowledge #1440, but I feel like until a proper way to walk through archives is implemented, it is nice to have a way to preview .tar.* archive contents.
The bulk of my changes is that if the file name resembles a tar file, I use either
tar(if supported, on unix systems, and if the actual program exists) or7z/7zzto preview the archive contents (I do so by extracting the .tar, and directly piping it back through 7-zip to list the contents). The reason I prefer to use "native"tarif possible is thattarsupports extra formats, such as.tar.zstfiles, which 7-zip still doesn't support.I also refactored the
archive.luafile, to re-use code for the new tar support, as well as give more options for external plugins which do archive preview to use some of the functions in this plugin, such as the final display as a list of files in the preview window.PS: I am quite new to Lua, and welcome any comments on how to make my code more idiomatic.