Releases: Bulat-Ziganshin/DataSmoke
2-pass 32/64-bit coverage smokers and -bBUFSIZE option
Changes:
- Added "2-pass DWord/QWord coverage" smokers that do the same as the 1-pass coverage smoker, but with automatic selection of the most populated sector
- Added the "DWord hash entropy" smoker, which is, of course, absolutely useless
- -bBUFSIZE option selects size of analyzed blocks: -b64k, -b4m/-b4, -b1g
- print the numbers of the first 10 incompressible blocks
- hashing uses the SSE 4.2 crc32c instruction to make the distribution of hash values more uniform
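
The size syntax accepted by -bBUFSIZE can be illustrated with a small parser. This is only a sketch built from the examples listed above, not DataSmoke's actual option handling: the function name `parse_bufsize` is hypothetical, the suffixes are assumed to be binary multiples (1k = 1024 bytes), and a bare number is assumed to mean megabytes because -b4 is listed as equivalent to -b4m.

```python
# Hypothetical helper: not taken from the DataSmoke sources.
_UNITS = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}  # assumed binary multiples


def parse_bufsize(option: str) -> int:
    """Convert an option such as '-b64k', '-b4m', '-b4' or '-b1g' to bytes."""
    if not option.startswith("-b"):
        raise ValueError(f"not a -b option: {option!r}")
    value = option[2:].lower()
    if not value:
        raise ValueError("missing size after -b")
    # A bare number is assumed to be in megabytes, since -b4 equals -b4m.
    unit = "m"
    if value[-1] in _UNITS:
        value, unit = value[:-1], value[-1]
    if not value.isdigit():
        raise ValueError(f"invalid size: {option!r}")
    return int(value) * _UNITS[unit]
```

Under these assumptions, `parse_bufsize("-b4")` and `parse_bufsize("-b4m")` both give 4194304 bytes.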
Tuned DWord coverage algorithm plus printing amount of incompressible blocks
Changes:
- DWord coverage modified to use STEP=1 (required to detect duplicated compressed data) and HASHSIZE = 2*mb (required for more precise coverage computation; previously coverage was 88% on random data)
- print the number of incompressible blocks (entropy or coverage >95%)
- display results as Markdown-friendly tables
- calculate min/max entropy only on complete 4MB blocks, so the average entropy may now be lower than the minimum or higher than the maximum 😆
- print everything to stdout instead of stderr
- substantially updated README
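
Why STEP=1 matters for duplicated data can be shown with a toy coverage function. The sketch below assumes that coverage means the share of distinct 32-bit values among all sampled positions, and it uses an exact Python set in place of the fixed-size hash table that HASHSIZE implies. The names `dword_coverage` and `step`, as well as the test data, are illustrative and not part of the tool.

```python
import random


def dword_coverage(data: bytes, step: int = 1) -> float:
    """Fraction of distinct 4-byte values among windows taken every `step` bytes."""
    windows = [data[i:i + 4] for i in range(0, len(data) - 3, step)]
    if not windows:
        return 0.0
    return len(set(windows)) / len(windows)


# Toy "compressed" data: random bytes, repeated at an offset that is not a multiple of 4.
rng = random.Random(0)
blob = bytes(rng.getrandbits(8) for _ in range(4096))
data = blob + b"\x00" + blob

aligned = dword_coverage(data, step=4)  # the second copy is misaligned, so it looks new
sliding = dword_coverage(data, step=1)  # every window of the second copy was seen before
print(f"step=4: {aligned:.1%}, step=1: {sliding:.1%}")
```

With step=4 the duplicate goes unnoticed and coverage stays close to 100%, while with step=1 it drops to roughly 50%, which marks the block as compressible.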
Initial implementations of byte/word/dword/order1 smokers
The full list of smells (speeds measured on a single core of an i7-4770):
- ByteSmoker: computes entropy of individual bytes (2 GB/s).
- WordSmoker: computes entropy of 16-bit words (0.7-1.5 GB/s).
- DWordSmoker: computes entropy of 32-bit dwords (3 GB/s).
- Order1Smoker: computes order-1 entropy of 8-bit bytes (0.7-1.5 GB/s).
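
The two byte-level measures above can be sketched in a few lines. This is a minimal illustration rather than DataSmoke's implementation: it assumes the reported percentages are Shannon entropy divided by the maximum of 8 bits per byte, and that order-1 entropy is the entropy of each byte conditioned on the preceding byte. The function names are hypothetical.

```python
import math
from collections import Counter


def _entropy_bits(counts) -> float:
    """Shannon entropy, in bits per symbol, of a collection of symbol counts."""
    total = sum(counts)
    return sum(c / total * math.log2(total / c) for c in counts)


def byte_entropy_percent(data: bytes) -> float:
    """Order-0 entropy of the bytes as a percentage of the 8-bit maximum."""
    if not data:
        return 0.0
    return _entropy_bits(Counter(data).values()) / 8 * 100


def order1_entropy_percent(data: bytes) -> float:
    """Entropy of each byte given the previous one, as a percentage of 8 bits."""
    pairs = Counter(zip(data, data[1:]))  # (previous byte, current byte) counts
    if not pairs:
        return 0.0
    by_context = {}
    for (prev, _cur), n in pairs.items():
        by_context.setdefault(prev, []).append(n)
    total = sum(pairs.values())
    # Weight the entropy of each context by how often that context occurs.
    bits = sum(sum(ns) / total * _entropy_bits(ns) for ns in by_context.values())
    return bits / 8 * 100
```

For the input `b"ab" * 100`, the byte entropy is 12.5% (two equally likely symbols, 1 bit out of 8), while the order-1 entropy is 0% because each byte is fully determined by its predecessor. The same effect, in a milder form, would explain why an order-1 figure can sit below the byte-level figure for text.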
And examples of their work:
Text file (enwik9):
- ByteSmoker entropy: minimum 62.68%, average 64.20%, maximum 66.97%
- WordSmoker entropy: minimum 53.14%, average 55.97%, maximum 57.93%
- Order1Smoker entropy: minimum 42.43%, average 47.75%, maximum 48.88%
- DWordSmoker entropy: minimum 4.14%, average 10.37%, maximum 16.01%
Binary file:
- ByteSmoker entropy: minimum 48.49%, average 77.67%, maximum 93.62%
- WordSmoker entropy: minimum 33.09%, average 68.74%, maximum 92.00%
- Order1Smoker entropy: minimum 17.69%, average 59.81%, maximum 90.39%
- DWordSmoker entropy: minimum 1.78%, average 31.92%, maximum 92.00%
Compressed file:
- ByteSmoker entropy: minimum 100.00%, average 100.00%, maximum 100.00%
- WordSmoker entropy: minimum 99.75%, average 99.93%, maximum 99.93%
- Order1Smoker entropy: minimum 99.49%, average 99.86%, maximum 99.86%
- DWordSmoker entropy: minimum 96.20%, average 96.95%, maximum 98.04%