Allow setting tarfile format #83

Open
legoktm opened this issue Aug 8, 2020 · 6 comments
@legoktm
Contributor

legoktm commented Aug 8, 2020

For MediaWiki, we would like to be able to use a different tarfile format instead of the default one, after identifying regressions with the new default format in Python 3.8.

We use this as a library, so if it would be possible to add an option to GitArchiver.create to let us specify tarfile.GNU_FORMAT that would be appreciated.
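For context, a minimal sketch of what selecting the format looks like with the standard library alone. It does not use git-archive-all; `build_tar` and `is_gnu_header` are hypothetical helpers written for this illustration, and the check relies on the magic bytes that Python's tarfile writes at offset 257 of a GNU-format header.

```python
import io
import tarfile

# Magic bytes found at offset 257 of a GNU-format tar header.
GNU_MAGIC = b"ustar  \x00"


def build_tar(fmt):
    """Return the bytes of a one-file tar archive written in the given format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        data = b"hello\n"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def is_gnu_header(raw):
    """Check whether the first header block carries the GNU magic."""
    return raw[257:265] == GNU_MAGIC


if __name__ == "__main__":
    print("GNU:", is_gnu_header(build_tar(tarfile.GNU_FORMAT)))
    print("PAX:", is_gnu_header(build_tar(tarfile.PAX_FORMAT)))
```

The request in this issue amounts to letting a caller of `GitArchiver.create` reach that `format` argument.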

Our downstream ticket is https://phabricator.wikimedia.org/T257102.

@Kentzo
Owner

Kentzo commented Aug 10, 2020

Perhaps you could consider converting the archive after it is created, as explained on StackOverflow?

@legoktm
Contributor Author

legoktm commented Aug 11, 2020

Thanks for the suggestion, I looked into that but bsdtar doesn't support --format=gnu (or at least the Fedora packaged version doesn't), and GNU tar doesn't support the easy conversion method that bsdtar does.

But while we could do it manually, it seems pretty inefficient to create a tarball in a brokenish format, then unpack it and repack it in the correct format, when we could just create it in the correct format to begin with.

@Kentzo
Owner

Kentzo commented Aug 11, 2020

There is a myriad of options if you think about it, not just for tar but for the compressors too. And then there are also their flavors.

What if I extend the archiver to produce an mtree-formatted file, would that suffice? Or perhaps some other intermediate format that you can easily work with using built-in tools and trivial shell pipelines.

@legoktm
Contributor Author

legoktm commented Aug 17, 2020

> There is a myriad of options if you think about it, not just for tar but for the compressors too. And then there are also their flavors.

True. I haven't fully thought this through, but what about allowing an arbitrary options dict to be passed to ZipFile or TarFile as kwargs? That would let us pass format through without you needing to create a parameter for every potential option, and it gives us flexibility in the future too.
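A rough sketch of the pass-through idea, under stated assumptions: `create_tar` and its `tarfile_kwargs` parameter are hypothetical and are not part of git-archive-all. The sketch only shows how a dict of options could be forwarded unchanged to `tarfile.open`.

```python
import gzip
import os
import tarfile
import tempfile


def create_tar(output_path, files, tarfile_kwargs=None):
    """Hypothetical helper: archive (file_path, arcname) pairs, forwarding
    any extra options straight to tarfile.open."""
    options = dict(tarfile_kwargs or {})
    with tarfile.open(output_path, mode="w:gz", **options) as archive:
        for file_path, arcname in files:
            archive.add(file_path, arcname)


def demo():
    """Build a small GNU-format tarball and return its member names and header magic."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.txt")
        with open(src, "w") as f:
            f.write("hello\n")
        out = os.path.join(tmp, "out.tar.gz")
        create_tar(out, [(src, "pkg/hello.txt")], {"format": tarfile.GNU_FORMAT})
        with tarfile.open(out) as archive:
            names = archive.getnames()
        with gzip.open(out, "rb") as raw:
            magic = raw.read(512)[257:265]
    return names, magic


if __name__ == "__main__":
    print(demo())
```

The trade-off is that the caller becomes responsible for passing options that `tarfile.open` actually accepts; an unknown key would surface as a `TypeError` from the standard library.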

> What if I extend the archiver to produce an mtree-formatted file, would that suffice? Or perhaps some other intermediate format that you can easily work with using built-in tools and trivial shell pipelines.

The really nice part about using this library is that it just takes care of everything for us, with very little complexity on our side :) But if that's what you think is best, we'll update our script to make it work.

wmfgerrit pushed a commit to wikimedia/mediawiki-tools-release that referenced this issue Aug 21, 2020
Python 3.8 switched the default tarfile format from GNU to PAX, but it
turns out that our PAX tarballs can't be uncompressed properly on Windows
when using 7zip.

git-archive-all doesn't let us specify a specific tarfile format yet (see
<Kentzo/git-archive-all#83>), so in the meantime
we can simply monkeypatch tarfile's default format to use GNU.

Bug: T257102
Change-Id: I2c12fe230d6d35e18cf8bc795a174663a5139911
@legoktm
Contributor Author

legoktm commented Aug 24, 2020

As an update, we're now monkey patching tarfile.DEFAULT_FORMAT, so this isn't a priority for us, but it would be nice to have a less hacky way if possible.
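A sketch of this kind of workaround, not the actual MediaWiki patch. It assumes CPython's tarfile, where the `TarFile.format` class attribute is what is consulted when no `format` argument is given, so the sketch sets that attribute alongside the module constant. `default_tar_format` and `demo` are hypothetical names, and the context manager restores the old values on exit.

```python
import contextlib
import io
import tarfile


@contextlib.contextmanager
def default_tar_format(fmt):
    """Temporarily change the format tarfile uses when none is specified."""
    saved = (tarfile.DEFAULT_FORMAT, tarfile.TarFile.format)
    tarfile.DEFAULT_FORMAT = fmt
    # Assumption: TarFile.format is the class-level default used by new archives.
    tarfile.TarFile.format = fmt
    try:
        yield
    finally:
        tarfile.DEFAULT_FORMAT, tarfile.TarFile.format = saved


def demo():
    """Write an archive without passing format= and return its raw bytes."""
    buf = io.BytesIO()
    with default_tar_format(tarfile.GNU_FORMAT):
        with tarfile.open(fileobj=buf, mode="w") as tar:
            data = b"hello\n"
            info = tarfile.TarInfo(name="hello.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    print(demo()[257:265])
```

Because the patch is process-wide while it is active, it affects every archive written in that window, which is part of why it feels hacky.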

@Kentzo
Owner

Kentzo commented May 21, 2021

Since you are using it as a library, you should be able to call GitArchiver.archive_all_files directly, passing a custom callable that archives the files in whatever way suits you best.
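A hedged sketch of what such a callable might look like. It assumes, without the thread confirming it, that `archive_all_files` invokes the callable as `archiver(file_path, arcname)`; `make_archiver` and `demo` are hypothetical names. The demo calls the callable by hand so the block runs without git-archive-all installed.

```python
import io
import os
import tarfile
import tempfile


def make_archiver(archive):
    """Return a callable that adds one file to an already-open TarFile."""
    def archiver(file_path, arcname):
        # Assumed call signature: (path on disk, name inside the archive).
        archive.add(os.fspath(file_path), os.fspath(arcname))
    return archiver


def demo():
    """Archive one temporary file into an in-memory GNU-format tar."""
    buf = io.BytesIO()
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "README")
        with open(src, "w") as f:
            f.write("demo\n")
        with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as archive:
            archiver = make_archiver(archive)
            archiver(src, "project/README")
    return buf.getvalue()


# With the library installed, the intended use would be roughly (unverified):
#   from git_archive_all import GitArchiver
#   with tarfile.open("out.tar", "w", format=tarfile.GNU_FORMAT) as archive:
#       GitArchiver().archive_all_files(make_archiver(archive))

if __name__ == "__main__":
    print(len(demo()))
```

With this approach the caller owns the archive object, so the tar format and any compressor settings stay entirely on the caller's side.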
