Commit

version 4.000
pmqs committed Feb 18, 2024
1 parent 168342b commit bd5834e
Showing 7 changed files with 72 additions and 52 deletions.
20 changes: 20 additions & 0 deletions Changes
@@ -1,6 +1,26 @@
CHANGES
-------

4.000 18 February 2024

* Major rewrite
* Make code more robust when it encounters a corrupt zip file.
Should mean less bombing-out with an unhelpful error message.
* Added "--walk" option as a less expensive alternative to "--scan".
* Added support for Reference & Ipaq8 compression methods
* Decode more extra fields
* Display end offset in verbose mode.
* Detect & display aspects of pkzip strong encryption
* Added filename verification for portability, safety and compliance with APPNOTE.txt
* Decode & display more DOS file attributes
* Decode & display Unix file attributes
* Detect issues with corrupt extra entries.
* Added support for Zip64 End Central Header version 2
* Add support for Spanned Archive Marker
* Detect inconsistencies between central & local directory entries
* Add a number of options for dealing with filename encoding.


2.108 22 July 2022

* Add binmode when opening the zip file. Keeps Windows happy.
8 changes: 4 additions & 4 deletions MANIFEST
@@ -963,9 +963,9 @@ t/files/0024-lpaq8/readme.md
t/files/0024-lpaq8/stdout
t/files/0024-lpaq8/stdout-v
t/files/0024-lpaq8/test.zip
t/files/0025-0x564B-KV-Pair/bad-signature/stdout-v
t/files/0025-0x564B-KV-Pair/bad-signature/stdout
t/files/0025-0x564B-KV-Pair/bad-signature/byte.tif.zip
t/files/0025-0x564B-KV-Pair/valid/stdout-v
t/files/0025-0x564B-KV-Pair/bad-signature/stdout
t/files/0025-0x564B-KV-Pair/bad-signature/stdout-v
t/files/0025-0x564B-KV-Pair/valid/byte.tif.zip
t/files/0025-0x564B-KV-Pair/valid/stdout
t/files/0025-0x564B-KV-Pair/valid/byte.tif.zip
t/files/0025-0x564B-KV-Pair/valid/stdout-v
2 changes: 1 addition & 1 deletion META.json
@@ -54,6 +54,6 @@
"web" : "https://github.com/pmqs/IO-Compress"
}
},
"version" : "2.200",
"version" : "4.000",
"x_serialization_backend" : "JSON::PP version 2.97001"
}
2 changes: 1 addition & 1 deletion META.yml
@@ -26,5 +26,5 @@ resources:
bugtracker: https://github.com/pmqs/zipdetails/issues
homepage: https://github.com/pmqs/zipdetails
repository: git://github.com/pmqs/zipdetails.git
version: '2.200'
version: '4.000'
x_serialization_backend: 'CPAN::Meta::YAML version 0.018'
44 changes: 22 additions & 22 deletions README.md
@@ -22,15 +22,15 @@ files. For each item of metadata within a zip file the program will output
- an optional hex dump of the item.

The program assumes a prior understanding of the internal structure of Zip
files. You should have a copy of the Zip
[APPNOTE.TXT](https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) file
files. You should have a copy of the zip file definition,
[APPNOTE.TXT](https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT),
at hand to help understand the output from this program.

## Default Behaviour

By default the program expects to be given a well-formed zip file. It will
navigate the Zip file by first parsing the zip central directory at the end
of the file. If the central directory is found, it will then walk
navigate the zip file by first parsing the zip `Central Directory` at the end
of the file. If the `Central Directory` is found, it will then walk
sequentially through the zip records starting at the beginning of the file.
See ["Advanced Analysis"](#advanced-analysis) for other processing options.

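The first step of the default behaviour, locating the `Central Directory` from the end of the file, can be sketched in a few lines of Python. This is an illustration only, not the code `zipdetails` uses: the function name `find_central_directory` is hypothetical, and the sketch ignores Zip64 archives and assumes the signature bytes do not also occur in the archive comment.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"  # End of Central Directory record signature
EOCD_FIXED_SIZE = 22            # record size without the trailing comment


def find_central_directory(data: bytes):
    """Return (offset, size, entry_count) of the Central Directory, or None.

    Hypothetical helper for illustration; no Zip64 handling.
    """
    # The record lives at the end of the file, so search backwards for it.
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + EOCD_FIXED_SIZE > len(data):
        return None
    # Fields after the 4-byte signature, all little-endian:
    # disk number, disk holding the directory, entries on this disk,
    # total entries, directory size, directory offset, comment length.
    (_disk, _cd_disk, _entries_here, total_entries,
     cd_size, cd_offset, _comment_len) = struct.unpack_from(
        "<HHHHIIH", data, pos + 4)
    return cd_offset, cd_size, total_entries
```

Given the bytes of a well-formed zip file, the returned offset should point at the first `Central Directory` header; a `None` result corresponds to the case where `--walk` or `--scan` becomes useful.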
@@ -57,7 +57,7 @@ be determined `cp437` will be used.

The exceptions are

- when the _Language Encoding flag_ is set in the zip file, the
- when the `Language Encoding Flag` is set in the zip file, the
filename/comment fields are assumed to be encoded in UTF-8.
- the definition for the metadata field implies UTF-8 charset encoding

@@ -128,7 +128,7 @@ See ["Filename Encoding Issues"](#filename-encoding-issues)
Modern zip files set a metadata entry in zip files, called the "Language
encoding flag", when they write filenames/comments encoded in UTF-8.

Occasionally some applications set the "Language encoding flag" but write
Occasionally some applications set the `Language Encoding Flag` but write
data that is not UTF-8 in the filename/comment fields of the zip file. This
will usually result in garbled text being output for the
filenames/comments.
@@ -173,12 +173,12 @@ the type of data:
If the field contains an 8-bit, 16-bit, 32-bit or 64-bit numeric value, it
will be displayed in both hex and decimal -- for example "`002A (42)`".

Note that Zip files store most numeric values in _little-endian_ encoding.
Note that Zip files store most numeric values in _little-endian_ encoding
(there are a few rare instances where _big-endian_ is used). The value read
from the zip file will have the _endian_ encoding removed before being
displayed.

Next, is an optional description of what the value means.
Next, is an optional description of what the numeric value means.

- String

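For numeric fields, the "hex and decimal" display described above can be reproduced with a small Python sketch. The helper name `show_number` is hypothetical and is not part of `zipdetails`; it assumes the field is an unsigned little-endian value of 1, 2, 4 or 8 bytes.

```python
import struct

# struct format codes for unsigned little-endian fields, keyed by byte width
_FORMATS = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}


def show_number(raw: bytes) -> str:
    """Format a little-endian field as 'HEX (decimal)'. Illustrative only."""
    (value,) = struct.unpack(_FORMATS[len(raw)], raw)
    width = len(raw) * 2  # two hex digits per byte
    return f"{value:0{width}X} ({value})"
```

For example, the two bytes `2A 00` as stored in the file give `show_number(b"\x2a\x00") == "002A (42)"`, matching the format in the text.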
@@ -332,12 +332,12 @@ Here is the same zip file, `test.zip`, dumped using the `zipdetails`
## Advanced Analysis

If you have a corrupt or non-standard zip file, particularly one where the
central directory metadata at the end of the file is absent/incomplete, you
`Central Directory` metadata at the end of the file is absent/incomplete, you
can use either the `--walk` option or the `--scan` option to search for
any zip metadata that is still present in the file.

When either of these options is enabled, this program will bypass the
initial step of reading the central directory at the end of the file and
initial step of reading the `Central Directory` at the end of the file and
simply scan the zip file sequentially from the start of the file looking
for zip metadata records. Although this can be error prone, for the most
part it will find any zip file metadata that is still present in the file.
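
The idea of searching a file front to back for zip metadata can be illustrated with the Python sketch below. It is a simplified assumption of what such a scan might look like, not the algorithm behind `--scan` or `--walk`: the `scan_for_records` name is hypothetical, only three record types are recognised, and a signature that happens to appear inside compressed payload data would be reported as a false positive, which is one reason this kind of scan is error prone.

```python
# A few well-known zip record signatures (from APPNOTE.TXT)
SIGNATURES = {
    b"PK\x03\x04": "Local File Header",
    b"PK\x01\x02": "Central Directory Header",
    b"PK\x05\x06": "End of Central Directory",
}


def scan_for_records(data: bytes):
    """Return a list of (offset, record_name) for every signature found.

    Hypothetical helper: a naive scan that does not validate the records.
    """
    found = []
    pos = data.find(b"PK")
    while pos >= 0:
        name = SIGNATURES.get(data[pos:pos + 4])
        if name:
            found.append((pos, name))
        pos = data.find(b"PK", pos + 1)
    return found
```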
@@ -394,18 +394,18 @@ was well. If not, you had to post-process the filenames after unzipping the
zip file.

Fast forward to the introduction of Unicode and UTF-8 encoding. The
approach now used by all major zip implementations is to set the "Language
encoding flag" (also known as "EFS") in the zip file metadata to signal
approach now used by all major zip implementations is to set the `Language
encoding flag` (also known as `EFS`) in the zip file metadata to signal
that a filename/comment is encoded in UTF-8.

To ensure maximum interoperability when sharing zip files store 7-bit
filenames as-is in the zip file. For anything else the "EFS" bit needs to
be set and the filename gets stored in UTF-8. Although this rule is kept to
filenames as-is in the zip file. For anything else the `EFS` bit needs to
be set and the filename is encoded in UTF-8. Although this rule is kept to
for the most part, there are exceptions out in the wild.

### Dealing with Encoding Errors

The most common filename encoding issue is where the EFS bit is not set and
The most common filename encoding issue is where the `EFS` bit is not set and
the filename is stored in a character set that doesn't match the system
encoding. This mostly impacts legacy zip files that predate the
introduction of Unicode.
@@ -416,7 +416,7 @@ can display the filenames using the `--encoding` option

zipdetails --encoding ISO-8859-1 myfile.zip

A less common variation of this is where the EFS bit is set, signalling
A less common variation of this is where the `EFS` bit is set, signalling
that the filename will be encoded in UTF-8, but the filename is not encoded
in UTF-8. To deal with this scenario, use the `--no-language-encoding`
option along with the `--encoding` option.
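
The decoding rule behind these two options can be sketched in Python as follows. The `EFS` flag is bit 11 of the general purpose bit flag; everything else here is an assumption for illustration: `decode_filename`, `fallback` and `honour_efs` are hypothetical names, with `fallback` playing the role of `--encoding` and `honour_efs=False` the role of `--no-language-encoding`.

```python
EFS_BIT = 0x0800  # bit 11 of the general purpose bit flag ("Language encoding flag")


def decode_filename(raw: bytes, flags: int, fallback: str = "cp437",
                    honour_efs: bool = True) -> str:
    """Decode a zip filename field. Illustrative sketch, not zipdetails code."""
    if honour_efs and flags & EFS_BIT:
        # The archive claims UTF-8; undecodable bytes become U+FFFD.
        return raw.decode("utf-8", errors="replace")
    # Otherwise use the caller-supplied (or default) legacy encoding.
    return raw.decode(fallback, errors="replace")
```

With this sketch, a legacy name stored as ISO-8859-1 is recovered by passing `fallback="iso-8859-1"`, and a name whose `EFS` bit is set incorrectly is recovered by also passing `honour_efs=False`.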
@@ -436,10 +436,10 @@ The following zip file features are not supported by this program:

- Encrypted Central Directory

When pkzip _strong encryption_ is enabled in a zip file this program can
When pkzip _Strong Encryption_ is enabled in a zip file this program can
still parse most of the metadata in the zip file. The exception is when the
Central Directory of a zip file is also encrypted. This program cannot
parse any metadata from an encrypted Central Directory.
`Central Directory` of a zip file is also encrypted. This program cannot
parse any metadata from an encrypted `Central Directory`.

- Corrupt Zip files

@@ -455,13 +455,13 @@ The following zip file features are not supported by this program:

# TODO

## JSON Output
## JSON/YML Output

Output some of the zip file metadata as a JSON document.
Output some of the zip file metadata as a JSON or YML document.

## Corrupt Zip files

Although the detection and reporting of common corruption use-cases is
Although the detection and reporting of most of the common corruption use-cases is
present in `zipdetails`, there are likely to be other edge cases that need
to be supported.

46 changes: 23 additions & 23 deletions bin/zipdetails
@@ -29,7 +29,7 @@ use Encode;
use Getopt::Long;
use List::Util qw(min max);

my $VERSION = '3.005' ;
my $VERSION = '4.000' ;

sub fatal_tryWalk;
sub fatal_truncated ;
@@ -7390,15 +7390,15 @@ files. For each item of metadata within a zip file the program will output
The program assumes a prior understanding of the internal structure of Zip
files. You should have a copy of the Zip
L<APPNOTE.TXT|https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT> file
files. You should have a copy of the zip file definition,
L<APPNOTE.TXT|https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT>,
at hand to help understand the output from this program.
=head2 Default Behaviour
By default the program expects to be given a well-formed zip file. It will
navigate the Zip file by first parsing the zip central directory at the end
of the file. If the central directory is found, it will then walk
navigate the zip file by first parsing the zip C<Central Directory> at the end
of the file. If the C<Central Directory> is found, it will then walk
sequentially through the zip records starting at the beginning of the file.
See L<Advanced Analysis> for other processing options.
@@ -7429,7 +7429,7 @@ The exceptions are
=item *
when the I<Language Encoding flag> is set in the zip file, the
when the C<Language Encoding Flag> is set in the zip file, the
filename/comment fields are assumed to be encoded in UTF-8.
=item *
@@ -7511,7 +7511,7 @@ default the system encoding will be used.
Modern zip files set a metadata entry in zip files, called the "Language
encoding flag", when they write filenames/comments encoded in UTF-8.
Occasionally some applications set the "Language encoding flag" but write
Occasionally some applications set the C<Language Encoding Flag> but write
data that is not UTF-8 in the filename/comment fields of the zip file. This
will usually result in garbled text being output for the
filenames/comments.
@@ -7576,12 +7576,12 @@ the type of data:
If the field contains an 8-bit, 16-bit, 32-bit or 64-bit numeric value, it
will be displayed in both hex and decimal -- for example "C<002A (42)>".
Note that Zip files store most numeric values in I<little-endian> encoding.
Note that Zip files store most numeric values in I<little-endian> encoding
(there are a few rare instances where I<big-endian> is used). The value read
from the zip file will have the I<endian> encoding removed before being
displayed.
Next, is an optional description of what the value means.
Next, is an optional description of what the numeric value means.
=item * String
@@ -7761,12 +7761,12 @@ C<-v> option:
=head2 Advanced Analysis
If you have a corrupt or non-standard zip file, particularly one where the
central directory metadata at the end of the file is absent/incomplete, you
C<Central Directory> metadata at the end of the file is absent/incomplete, you
can use either the C<--walk> option or the C<--scan> option to search for
any zip metadata that is still present in the file.
When either of these options is enabled, this program will bypass the
initial step of reading the central directory at the end of the file and
initial step of reading the C<Central Directory> at the end of the file and
simply scan the zip file sequentially from the start of the file looking
for zip metadata records. Although this can be error prone, for the most
part it will find any zip file metadata that is still present in the file.
@@ -7823,18 +7823,18 @@ was well. If not, you had to post-process the filenames after unzipping the
zip file.
Fast forward to the introduction of Unicode and UTF-8 encoding. The
approach now used by all major zip implementations is to set the "Language
encoding flag" (also known as "EFS") in the zip file metadata to signal
approach now used by all major zip implementations is to set the C<Language
encoding flag> (also known as C<EFS>) in the zip file metadata to signal
that a filename/comment is encoded in UTF-8.
To ensure maximum interoperability when sharing zip files store 7-bit
filenames as-is in the zip file. For anything else the "EFS" bit needs to
be set and the filename gets stored in UTF-8. Although this rule is kept to
filenames as-is in the zip file. For anything else the C<EFS> bit needs to
be set and the filename is encoded in UTF-8. Although this rule is kept to
for the most part, there are exceptions out in the wild.
=head3 Dealing with Encoding Errors
The most common filename encoding issue is where the EFS bit is not set and
The most common filename encoding issue is where the C<EFS> bit is not set and
the filename is stored in a character set that doesn't match the system
encoding. This mostly impacts legacy zip files that predate the
introduction of Unicode.
@@ -7845,7 +7845,7 @@ can display the filenames using the C<--encoding> option
zipdetails --encoding ISO-8859-1 myfile.zip
A less common variation of this is where the EFS bit is set, signalling
A less common variation of this is where the C<EFS> bit is set, signalling
that the filename will be encoded in UTF-8, but the filename is not encoded
in UTF-8. To deal with this scenario, use the C<--no-language-encoding>
option along with the C<--encoding> option.
@@ -7873,10 +7873,10 @@ detected and some will only contain compressed payload data.
Encrypted Central Directory
When pkzip I<strong encryption> is enabled in a zip file this program can
When pkzip I<Strong Encryption> is enabled in a zip file this program can
still parse most of the metadata in the zip file. The exception is when the
Central Directory of a zip file is also encrypted. This program cannot
parse any metadata from an encrypted Central Directory.
C<Central Directory> of a zip file is also encrypted. This program cannot
parse any metadata from an encrypted C<Central Directory>.
=item *
@@ -7908,13 +7908,13 @@ corruption.
=head1 TODO
=head2 JSON Output
=head2 JSON/YML Output
Output some of the zip file metadata as a JSON document.
Output some of the zip file metadata as a JSON or YML document.
=head2 Corrupt Zip files
Although the detection and reporting of common corruption use-cases is
Although the detection and reporting of most of the common corruption use-cases is
present in C<zipdetails>, there are likely to be other edge cases that need
to be supported.
2 changes: 1 addition & 1 deletion lib/App/zipdetails.pm
@@ -1,6 +1,6 @@
package App::zipdetails;

our $VERSION = '3.005' ;
our $VERSION = '4.000' ;

=head1 NAME
