Pearson Hashing

This repository contains a custom implementation of "Fast Hashing of Variable-Length Text Strings" as suggested by Peter K. Pearson in Communications of the ACM, Vol. 33, No. 6 (June 1990), pp. 677–680 – the so-called Pearson Hashing.

The Wikipedia article contains a good description. It also links to a PDF copy of the original paper.

The underlying principle of this algorithm is easy to understand. Simplicity shines!
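
As an illustration of that principle – a minimal sketch, not the code of this repository – an 8-bit Pearson hash needs nothing more than a 256-entry permutation table and one look-up per input byte. The table below is filled by a simple arithmetic bijection merely to keep the example self-contained; a real implementation would use a fixed, randomly generated permutation.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Any permutation of the values 0..255 works as table T. Here it is filled
 * by a simple bijection (multiplication by an odd constant modulo 256) only
 * to keep the sketch short; a real table would be a random permutation. */
static uint8_t T[256];

static void init_table(void) {
    for (int i = 0; i < 256; i++)
        T[i] = (uint8_t)(i * 167 + 13);   /* odd multiplier => bijective mod 256 */
}

/* Original 8-bit Pearson hash: one table look-up per input byte. */
static uint8_t pearson8(const uint8_t *in, size_t len) {
    uint8_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = T[h ^ in[i]];
    return h;
}

int main(void) {
    init_table();
    const char *msg = "Hello, Pearson!";
    printf("hash = 0x%02x\n", pearson8((const uint8_t *)msg, strlen(msg)));
    return 0;
}
```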

Note that this is not a cryptographically secure hash function, so it cannot be used in the clear to prove knowledge of a secret without presenting that secret. It is, however, well suited for keying hash tables, for compressing or expanding the entropy of some byte string to a different number of bytes, or simply as a checksum. Its design also allows smaller parts of the hash value to be verified if required.

When used as a checksum over data that will be encrypted (checksum included), it is noteworthy that the random permutation – applied repeatedly and, above all, as the final step – makes Pearson Hashing an even better choice than CRC, especially with a view to exclusive-ored stream ciphers.

You might also want to take a look at Pearson Block Hashing, an extension of the original Pearson Hashing towards wider blocks that even passes SMHasher.

Thoughts

To reach output bit-widths beyond eight, Pearson describes an extension scheme that uses a different starting value for each output byte. Unfortunately, the output can then never contain the same byte value at two different positions, which diminishes the codomain. This can be avoided by using a different permutation per position (output byte); in that case, different starting values are no longer needed.

To avoid a separate look-up table for each and every output byte, this implementation changes the to-be-looked-up byte by a constant value – different for each position – just before the look-up. If the table is random (and not arithmetically generated), it then effectively acts as a different permutation from the point of view of each position's look-up. Remember, this all happens modulo 256. After the look-up, the applied change could be reversed, but it does not have to be (gaining some speed).

Speaking of speed, exclusive-or seems to be a good choice for applying that change to the to-be-looked-up index. It was Peter K. Pearson – with whom I had a short exchange of e-mails about permission for use – who pointed out that this is exactly like the key whitening found in DES-X, for example. Using exclusive-or also promises a big advantage for a parallel implementation in regular C (if not using SSE's vectorized byte-wise operations).
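
Continuing the sketch above and re-using its table T, the combination of these two ideas – one shared table, a per-position exclusive-or applied just before the look-up – might look roughly like this. Using the position index itself as the constant is an assumption made purely for illustration; it is not necessarily what this repository does.

```c
/* Sketch of a 16-byte (128-bit) Pearson-style hash with a single table T:
 * before each look-up, the index is exclusive-ored with a per-position
 * constant (here simply the position j, an illustrative choice), so the
 * table effectively acts as a different permutation for every output byte. */
static void pearson128_sketch(uint8_t out[16], const uint8_t *in, size_t len) {
    uint8_t h[16] = {0};
    for (size_t i = 0; i < len; i++)
        for (int j = 0; j < 16; j++)
            h[j] = T[(uint8_t)(h[j] ^ in[i] ^ j)];
    for (int j = 0; j < 16; j++)
        out[j] = h[j];
}
```

Since all 16 positions perform the same operations on the same input byte, this inner loop is also the part that lends itself to a parallel or vectorized implementation.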

In a hunt for more speed, a quick test with a linear congruential generator of full period 256 (trying to avoid the look-up in memory) produced output whose distribution looked uneven. I cannot see that my parameters were badly chosen, and I may have stopped these tests too soon, but for now this road will not be taken – the implementation sticks with a look-up table.
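
For reference, the kind of replacement that was tried would look roughly like the following; the parameters are purely illustrative (a full period modulo 256 requires a multiplier congruent to 1 mod 4 and an odd increment) and are not the ones used in that experiment.

```c
/* Hypothetical LCG step of full period 256 that could stand in for the
 * table look-up T[x]; parameters are illustrative only (a mod 4 == 1, c odd).
 * The inner loop would then become: h = lcg_step(h ^ in[i]); */
static inline uint8_t lcg_step(uint8_t x) {
    return (uint8_t)(x * 141u + 79u);
}
```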

However, if system memory permits, the look-up table could be extended to 16-bit width, which requires 2¹⁶ × 16 bits = 128 KB of memory.
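
In code, such a widened table is nothing more than the following declaration (a sketch only; the name and layout are not taken from this repository):

```c
/* 16-bit-wide look-up table: 2^16 entries of 16 bits each,
 * i.e. 65536 * 2 bytes = 131072 bytes = 128 KB of memory. */
static uint16_t t16[65536];
```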

The hashing function could easily be customized or tweaked (here, I would avoid the term keyed) just by changing the permutation according to some provided tweak. A possible solution would be to use the tweak as seed for a pseudo-random number generator (maybe not a linear congruential one…) and repeatedly switch two values in the table as indicated by that pseudo-random number generator. The good thing here: the setup happens once before use, so it does not impact the actual hashing speed-wise. Soon, I will change this implementation to use the AES S-box as permutation to eventually take advantage of hardware-accelerated AES instructions – first trials look much more promising than the linear congruential computations. [done]
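
Such a tweak setup could be sketched as follows, again on the example table T from above; rand()/srand() merely stand in for a better pseudo-random number generator, and the number of swap rounds is an arbitrary choice.

```c
#include <stdlib.h>

/* One-time, tweak-dependent setup: seed a PRNG with the tweak and repeatedly
 * swap two randomly chosen table entries. The table remains a permutation,
 * and since this happens once before use, hashing speed is unaffected. */
static void tweak_table(unsigned int tweak, int rounds) {
    srand(tweak);
    for (int r = 0; r < rounds; r++) {
        int a = rand() & 0xFF;
        int b = rand() & 0xFF;
        uint8_t tmp = T[a];
        T[a] = T[b];
        T[b] = tmp;
    }
}
```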

The hash value equals the inner state of the hashing function, so it can easily be updated whenever further to-be-hashed input data comes along, e.g. in streaming applications.
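
On the minimal 8-bit sketch from above, that streaming property looks like this (not the repository's actual API):

```c
/* Streaming use: because the hash value equals the internal state, the result
 * of one call can simply be fed back as the starting state of the next call. */
static uint8_t pearson8_update(uint8_t state, const uint8_t *in, size_t len) {
    for (size_t i = 0; i < len; i++)
        state = T[state ^ in[i]];
    return state;
}

/* Hashing "foobar" in two pieces yields the same value as hashing it at once:
 *   uint8_t h = pearson8_update(0, (const uint8_t *)"foo", 3);
 *   h         = pearson8_update(h, (const uint8_t *)"bar", 3);             */
```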

This Implementation

The provided code follows the thoughts elaborated above and offers two different flavors:

One flavor outputs a 128-bit or 256-bit hash value to memory for any hashed string. It is endian-aware. An SSE version is also included; just compile with -march=native. The 256-bit hashing might see an AVX implementation in the future, but probably not the 128-bit hashing, as the shuffle does not get any wider (bit-wise) and thus does not speed up the table look-up. The SSE implementation can turn out slightly slower on some more modern machines than the plain C version compiled with -Ofast. The reason is probably some CPU feature speed-up that appeared somewhere between Sandy Bridge and Haswell – I was not able to nail it down more precisely. NEON might follow at a later point in time, too.

The other flavor returns a 32-bit hash of type uint32_t. Those bits are identical to the last 32 bits of the 128-bit and 256-bit hashes. As it is a regular return value of a function call, any further use of it must be endian-aware. Also, the return value can be passed as the 'hash' parameter of another call to update the hash while hashing more of the input.

The tool, compiled with gcc -O3 test.c pearson.c or gcc -O3 -march=native test.c pearson.c, respectively, shows the following speeds when run (./a.out):

| i7 2860QM     | 8-bit hash | 16-bit hash | 32-bit hash | 128-bit hash | 256-bit hash |
|---------------|------------|-------------|-------------|--------------|--------------|
| plain         | 182.7 MB/s | 134.6 MB/s  | 103.7 MB/s  | 24.6 MB/s    | 14.2 MB/s    |
| O3            | 358.5 MB/s | 255.5 MB/s  | 187.4 MB/s  | 106.8 MB/s   | 65.6 MB/s    |
| O3 & SSE      | 349.2 MB/s | 249.1 MB/s  | 184.1 MB/s  | 77.5 MB/s    | 39.2 MB/s    |
| O3 & low mem  | 356.1 MB/s | 294.2 MB/s  | 164.3 MB/s  | 72.9 MB/s    | 40.8 MB/s    |
| SSE & AES-NI  |            |             |             | 304.7 MB/s   |              |

| i7 7500U      | 8-bit hash | 16-bit hash | 32-bit hash | 128-bit hash | 256-bit hash |
|---------------|------------|-------------|-------------|--------------|--------------|
| plain         | 242.6 MB/s | 169.5 MB/s  | 124.7 MB/s  | 33.5 MB/s    | 18.9 MB/s    |
| O3            | 413.4 MB/s | 259.3 MB/s  | 201.2 MB/s  | 134.2 MB/s   | 107.5 MB/s   |
| O3 & SSE      | 413.3 MB/s | 260.1 MB/s  | 206.3 MB/s  | 86.8 MB/s    | 48.2 MB/s    |
| O3 & low mem  | 412.9 MB/s | 302.5 MB/s  | 221.0 MB/s  | 127.0 MB/s   | 69.6 MB/s    |
| SSE & AES-NI  |            |             |             | 379.9 MB/s   |              |

| Cortex A53    | 8-bit hash | 16-bit hash | 32-bit hash | 128-bit hash | 256-bit hash |
|---------------|------------|-------------|-------------|--------------|--------------|
| plain         | 40.4 MB/s  | 24.6 MB/s   | 17.3 MB/s   | 3.8 MB/s     | 1.9 MB/s     |
| O3            | 190.2 MB/s | 66.9 MB/s   | 39.9 MB/s   | 13.6 MB/s    | 7.4 MB/s     |
| O3 & low mem  | 190.2 MB/s | 78.4 MB/s   | 63.5 MB/s   | 33.7 MB/s    | 19.8 MB/s    |

Note that the 16-bit hashes – and, on most of these CPUs, also the 32-bit hashes – are calculated faster without the 16-bit look-up table, i.e. when compiled with -D LOW_MEM_FOOTPRINT.

An obvious recommendation is to always use compiler optimization -O3 and, in some cases (ARM), also -D LOW_MEM_FOOTPRINT.
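
On the Cortex A53 from the tables above, for example, that amounts to compiling with gcc -O3 -D LOW_MEM_FOOTPRINT test.c pearson.c and running ./a.out again.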

SSE does not seem to be well suited for a 256-byte table look-up. Could NEON (not implemented yet) do better?

Contribute

You are welcome to help with a NEON or AVX implementation or any other contribution that would speed things up. Maybe multi-threading? Don't be shy and leave an Issue or a Pull Request!
