SIMD-accelerated bitwise hamming distance Python module for hexidecimal strings
What does it do?
This module performs a fast bitwise hamming distance of two hexadecimal strings.
This looks like:
DEADBEEF = 11011110101011011011111011101111 00000000 = 00000000000000000000000000000000 XOR = 11011110101011011011111011101111 Hamming = number of ones in DEADBEEF ^ 00000000 = 24
This essentially amounts to
>>> import gmpy >>> gmpy.popcount(0xdeadbeef ^ 0x00000000) 24
except with Python strings, so
>>> import gmpy >>> gmpy.popcount(int("deadbeef", 16) ^ int("00000000", 16)) 24
A few assumptions are made and enforced:
- this is a valid hexadecimal string (i.e.,
- the strings are the same length
- the strings do not begin with
Why yet another Hamming distance library?
There are a lot of fantastic (python) libraries that offer methods to calculate various edit distances, including Hamming distances: Distance, textdistance, scipy, jellyfish, etc.
In this case, I needed a hamming distance library that worked on hexadecimal strings (i.e., a Python
str) and performed blazingly fast. Furthermore, I often did not care about hex strings greater than 256 bits. That length constraint is different vs all the other libraries and enabled me to explore vectorization techniques via
Lastly, I wanted to minimize dependencies, meaning you do not need to install
Eventually, after playing around with
numpy, I decided to write what I wanted in essentially raw C. At this point, I'm using raw
int*, so exploring re-writing this in Fortran makes little sense.
To install, ensure you have Python 2.7 or 3.4+. Run
pip install hexhamming
or to install from source
git clone https://github.com/mrecachinas/hexhamming cd hexhamming python setup.py install # or pip install .
If you want to contribute to hexhamming, you should install the dev dependencies
pip install -r requirements-dev.txt
and make sure the tests pass with
python -m pytest -vls .
hexhamming is as simple as
>>> from hexhamming import hamming_distance_string >>> hamming_distance_string("deadbeef", "00000000") 24
New in v2.0.0 :
hexhamming now supports byte`s via ``hamming_distance_bytes`. You use it in the exact same way as before, except you pass in a byte string.
>>> from hexhamming import hamming_distance_bytes >>> hamming_distance_bytes(b"\xde\xad\xbe\xef", b"\x00\x00\x00\x00") 24
Below is a benchmark using
pytest-benchmark with hexhamming==v1.3.2 my 2020 2.0 GHz quad-core Intel Core i5 16 GB 3733 MHz LPDDR4 macOS Catalina (10.15.5) with Python 3.7.3 and Apple clang version 11.0.3 (clang-1188.8.131.52).
|Name||Mean (ns)||Std (ns)||Median (ns)||Rounds||Iterations|