Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IO::Uncompress::Unzip stream can't be read by Storable (PerlIO) - Demo showing how to fix #33

Open
tlhackque opened this issue Oct 26, 2021 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@tlhackque
Copy link

I have a small reproducer demonstrating that IO::Uncompress::Unzip's file descriptor won't work with Storable. Presumably there are other cases...

If I had to guess, it would be that the FH, which is passed to Storable's XS code, may not be positioned (for XS) correctly after header detection...Clearly, it works for ordinary Perl consumers.

Here is the sequence:

  • Write a small hash to a file with Storable.
  • Using system zip, create a zip file containing the Storable file.
  • Read the Storable file from \*STDIN & verify that it's what we wrote. (passes)
  • Read the Storable file from *STDIN with IUU, passing the file handle to Storable (fails)
  • Read the zip archive of the Storable file from *STDIN with IUU, passing file handle to Storable (fails)
  • Delete the file, extract it from the Zip archive, and read it from *STDIN with Storable (passes

From this, we can see that the file is written correctly by Storable, and can be read after a trip thru the Zip archive. Further, Storable is happy reading from a file handle.

BUT, neither the raw file nor the Zip member can be read correctly by Storable if IUU is in the middle.

IO::Uncompress::Unzip 2.102

In real life, the work-around is to write all 300MB of uncompressed data to a tempfile, and pass that handle to Storable. Which is suboptimal...

Here is the test program:

#!/usr/bin/perl

use warnings;
use strict;
use Storable( qw/store fd_retrieve/ );
use IO::Uncompress::Unzip;

exit unless( @ARGV);
my $cmd = shift;

if( $cmd eq "write" ) {
    my $file = shift or die( "no file\n" );
    my $hash = { key => 'value' };

    store( $hash, $file );
    system( "zip $file.zip $file" );
    exit;
}
if( $cmd eq "storable" ) {
    binmode( \*STDIN );
    my $read = fd_retrieve( \*STDIN );
    if( !$read || !exists $read->{key} ){
        print "Failed\n";
    } else {
        print "OK\n";
    }
    exit;
}
if( $cmd eq 'uncompress' ) {
    my $in = IO::Uncompress::Unzip->new( \*STDIN, { } ) or die( "IUU\n" );
    binmode( $in );
    my $read = fd_retrieve( $in );
    if( !$read || !exists $read->{key} ){
        print "Failed\n";
    } else {
        print "OK\n";
    }
    exit;
}
if( $cmd eq 'extract' ) {
    my $file = shift or die( "no file" );
    system( "rm $file; unzip $file.zip;" );
    system( "$0 storable <$file" );
    exit;
}
die( "Bad command $cmd\n" );
exit;

And here are the commands (Yes, it fails with newer versions of Perl, and yes in real life there's error checking):

# ./IUZ-Storable.pl write foo  # Write hash to file "foo" and zip
  adding: foo (stored 0%)
# ./IUZ-Storable.pl storable <foo  # Read with Storable and verify
OK
# ./IUZ-Storable.pl uncompress <foo # Read foo (transparent mode) via IUU fails
Magic number checking on storable string failed at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Storable.pm line 443, at ./IUZ-Storable.pl line 38.
# ./IUZ-Storable.pl uncompress <foo.zip # Read Zip archive via IUU fails
Magic number checking on storable string failed at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Storable.pm line 443, at ./IUZ-Storable.pl line 38.
# ./IUZ-Storable.pl extract foo   # Extract from zip file and read/verify with Storable
Archive:  foo.zip
 extracting: foo
OK
#

FWIW, strace doesn't show anything odd on the system side of the pipe.

strace -f ./IUZ-Storable.pl uncompress <foo
(...)
read(0, "pst0\4\t\0041234\4\4\4\10\3\1\0\0\0\n\5value\3\0\0\0k"..., 4096) = 34
write(2, "Magic number checking on storabl"..., 148Magic number checking on storable string failed at /usr/lib/perl5/5.8.8/i386-linu (...)
# And here's the entire Storable file (34 bytes);
## od -tao1 foo
0000000   p   s   t   0 eot  ht eot   1   2   3   4 eot eot eot  bs etx
        160 163 164 060 004 011 004 061 062 063 064 004 004 004 010 003
0000020 soh nul nul nul  nl enq   v   a   l   u   e etx nul nul nul   k
        001 000 000 000 012 005 166 141 154 165 145 003 000 000 000 153
0000040   e   y
        145 171
0000042
@tlhackque
Copy link
Author

Also, FWIW: I did some code reading in Storable that may be helpful. Storable.pm calles the C code pretrieve, which calls do_retrieve, where, after some initialization, a call to magic_check is made.
magic_check ends up doing a READ() at line 6966 . READ() is a C macro.

For a file handle, READ() ends up at line 1183 where (PerlIO_read(cxt->fio, x, y) != (SSize_t)y) is done.

PerlIO_read is documented as returning the bytes read.

So, this would seem to come down to whether the call to IO::Compress::Base::Common::createSelfTiedObject produces a PerlIO-compatible file handle.

And, if as it appears, it does not, what can be done about it. I have the vague sense that perhaps implementing the file handle as a PerlIO layer rather than (or on the way to) a tied object might be a fruitful approach. See PerlIO::via But that's a guess. I'm not sure how you would hook up the handle you generate to provide the lowest IO layer.

Not being an XS or PerlIO person, this about exhausts my ability to diagnose what's going on. But hopefully you can take this to a helpful conclusion...

@tlhackque
Copy link
Author

Couldn't leave well enough alone. Layers seems to be on the right track.

This proof of concept works, and could be used as the basis of a patch to get IO::Uncompress::Unzip to produce a real file handle.

The toy example below adds a 'layer' command to the end of the previous reproducer. This provides a custom layer that returns the Storable hash. Storable::fd_retrieve from the file handle succeeds. Obviously, a real patch - which I won't try to do - would hook up FILL to the zip stream in the innards of IO::Uncompress, and would fill in the other via subroutines - SYSOPEN, TELL, SEEK, CLOSE come to mind, but I'm not exactly an expert. perldoc PerlIO::via has the full list - very similar to tie.

The second command inserts an intermediate file handle, with a similar layer, between IO::Uncompress::Unzip and Storable. This also works. So you have a path to a patch.

Here are the details:

To keep this short - replace the final die and exit in the previous code with this:

if( $cmd eq 'layer' ) {
    my $bar;
    open( my $file, '<:via(ME)', \$bar ) or die( "open $!\n" );
    my $read = fd_retrieve( $file );

    if( !$read || !exists $read->{key} ){
        print "Failed\n";
    } else {
        print "OK\n";
    }
    exit;
}
if( $cmd eq 'layereduncompress' ) {
    my $in = IO::Uncompress::Unzip->new( \*STDIN, { } ) or die( "IUU\n" );
    binmode( $in );

    open( my $via, '<:via(TOO)', $in) or die( "via: $!\n" );

    my $read = fd_retrieve( $via );
    if( !$read || !exists $read->{key} ){
        print "Failed\n";
    } else {
        print "OK\n";
    }
    exit;

}
die( "Bad command $cmd\n" );
exit;

# Packages implementing PerlIO layers using PerlIO::via

package ME;

my $eof;

sub PUSHED { my($class, $mode, $fh) = @_; bless \*PUSHED, $class };
sub OPEN { my($self, $path, $mode, $fh ) = @_; return 1; };
sub BINMODE { return 0 };

sub FILL { return if $eof; $eof = 1; return  pack 'C*', map { eval "0$_" } split( / /, '160 163 164 060 004 011 004 061 062 063 064 004 004 004 010 003 001 000 000 000 012 005 166 141 154 165 145 003 000 000 000 153 145 171'); };

1;
package TOO;

sub PUSHED { my($class, $mode, $fh) = @_; bless {}, $class };
sub OPEN { my($self, $path, $mode, $fh ) = @_; $self->{zip} = $path; return 1; };
sub BINMODE { return 0 };
sub FILL {
    my $self = shift;
    my $zip = $self->{zip};
    return <$zip>;
}
1;

The package ME in the first run returns the Storable from a literal.

In the second, the package TOO is opened with the file handle from IO::Uncompress::Unzip. It reads from that file handle, and returns whatever it gets. Storable likes reading from that too.

And in the third, we see that transparent mode thru IO::Uncompress::Unzip (and via(TOO) also works.

#./IUZ-Storable.pl layer
OK
# ./JUZ-Storable.pl layereduncompress <foo.zip
OK
#./JUZ-Storable.pl layereduncompress <foo
OK

@tlhackque tlhackque changed the title IO::Uncompress::Unzip stream can't be read by Storable IO::Uncompress::Unzip stream can't be read by Storable (PerlIO) - Demo showing how to fix Oct 26, 2021
@pmqs pmqs self-assigned this Oct 29, 2021
@pmqs
Copy link
Owner

pmqs commented Oct 29, 2021

Hi @tlhackque

my, you have been busy!

Not sure if the problem is with my code or with Storable, but I see you have already opened a ticket against Storable as well. I don't know the layers interface that well, so I will need to find some time to dig into this one a bit more to get a better understanding of the issue.

Also need to check if writing to a file with IO::Compress::* + Storable works.

@tlhackque
Copy link
Author

Thanks for the responses. Yes, it's been an adventure - a side trip from my real project.

I didn't investigate writes, as my project doesn't require them (yet, anyhow). Agree that they should also provide PerlIO filehandles. The layers API would allow that.

The issue is that Storable wants only a PerlIO handle. The XS code there doesn't understand tie'd FILEHANDLES - it calls directly into the PerlIO API to read.

This is a problem in both modules;

  • There are other consumers of IO::Uncompress::Unzip who also expect "real" file handles.
  • And Storable really should take anything that looks and acts like a FILEHANDLE.

So this really needs to be fixed in both places. Either would solve my immediate problem (I'm using the via() hack for now), but a complete solution for everyone (for all time) requires both.

Layers (I pointed you to the documentation on my little adventure) are just a sneaky way of providing a real PerlIO file handle instead of a tie'd object.

The short form of my little demo is that open() using the PerlIO::via mechanism produces a "real" PerlIO file handle. By intercepting the open, the layer (ususally used for code translation, line ending adjustment, and even for timestamps), can provide data from any source. It's quite similar to tie.

  • PUSHED is inside the open, just before the OPEN callback.
  • $path in the OPEN can be anything - including your object. The demo blesses a small hash (a scalar would do) since it acts as an intermediary.
  • BINMODE returns 0 to prevent the caller from removing this layer.
  • And FILL provides the data to the upper layers to buffer.

I knew layers exist - but this was my first adventure inside one - so it doesn't take too long to learn - despite the poor documentation. Obviously, I'm no expert. But let me know if I can help explain further.

@pmqs pmqs added the enhancement New feature or request label Oct 30, 2021
@tlhackque
Copy link
Author

I noted another victim of Storable's method of reading files in the issue I filed there.

So the good news is: this confirms that you're not alone. The bad news is that the the other victim reported the issue in 2006...and gave up in 2011.

Sigh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants
@tlhackque @pmqs and others