Convert Apple NeuralHash model for CSAM Detection to ONNX.
Apple NeuralHash is a perceptual hashing method for images based on neural networks. It tolerates image resizing and compression. The hashing steps are as follows:
- Convert image to RGB.
- Resize image to
- Normalize RGB values to
- Perform inference on the NeuralHash model.
- Calculate the dot product of a 96x128 matrix with the resulting vector of 128 floats.
- Apply binary step to the resulting 96 float vector.
- Convert the vector of 1.0 and 0.0 to bits, resulting in 96-bit binary data.
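The last three steps can be sketched in plain Python. This is a minimal illustration rather than the project's demo script: the function names are hypothetical, the step threshold at zero is an assumption, and random values stand in for the real seed matrix and the model output.

```python
import random


def project_and_binarize(seed_matrix, embedding):
    """Dot each seed-matrix row with the embedding, then apply a binary step."""
    if any(len(row) != len(embedding) for row in seed_matrix):
        raise ValueError("every seed row must have the embedding's length")
    projected = [sum(w * x for w, x in zip(row, embedding)) for row in seed_matrix]
    # Assumption: the binary step maps values >= 0 to 1 and everything else to 0.
    return [1 if value >= 0 else 0 for value in projected]


def bits_to_hex(bits):
    """Pack a list of 0/1 bits (length divisible by 4) into a hex string."""
    if len(bits) % 4 != 0:
        raise ValueError("bit count must be a multiple of 4")
    as_int = int("".join(str(bit) for bit in bits), 2)
    return format(as_int, "0{}x".format(len(bits) // 4))


# Placeholder data: a random 96x128 matrix and a random 128-float vector.
rng = random.Random(0)
seed_matrix = [[rng.gauss(0, 1) for _ in range(128)] for _ in range(96)]
embedding = [rng.gauss(0, 1) for _ in range(128)]

bits = project_and_binarize(seed_matrix, embedding)
print(len(bits), bits_to_hex(bits))  # 96 bits, shown as 24 hex digits
```

Writing the 96 bits as 24 hex digits is only a display choice made for this sketch.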
In this project, we convert Apple's NeuralHash model to ONNX format. A demo script for testing the model is also included.
Both macOS and Linux will work. In the following sections, Debian is used as the Linux example.
- macOS: Install by running brew install lzfse.
- Linux: Build and install from lzfse source.
Python 3.6 and above should work. Install the following dependencies:
pip install onnx coremltools
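Before moving on, it can help to confirm that the prerequisites are visible from Python. The following is a small sketch with a hypothetical helper name; it only checks that the lzfse command is on PATH and that the two packages can be found, not that their versions are suitable.

```python
import importlib.util
import shutil


def check_prerequisites(commands=("lzfse",), modules=("onnx", "coremltools")):
    """Return {name: bool} telling whether each command or module was found."""
    status = {}
    for command in commands:
        status[command] = shutil.which(command) is not None
    for module in modules:
        status[module] = importlib.util.find_spec(module) is not None
    return status


for name, found in check_prerequisites().items():
    print("{:<12} {}".format(name, "found" if found else "MISSING"))
```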
Step 1: Get NeuralHash model
You will need 4 files from a recent macOS or iOS build:
Option 1: From macOS or jailbroken iOS device (Recommended)
If you have a recent version of macOS (11.4+) or jailbroken iOS (14.7+) installed, simply grab these files from /System/Library/Frameworks/Vision.framework/Resources/ (on macOS) or /System/Library/Frameworks/Vision.framework/ (on iOS).
Option 2: From iOS IPSW
- Download any .ipsw of a recent iOS build (14.7+) from ipsw.me.
- Unpack the file:
cd /path/to/ipsw/file
mkdir unpacked_ipsw
cd unpacked_ipsw
unzip ../*.ipsw
- Locate system image:
What you need is the largest .dmg file, for example
- Mount system image. On macOS simply open the file in Finder. On Linux run the following commands:
# Build and install apfs-fuse
sudo apt install fuse libfuse3-dev bzip2 libbz2-dev cmake g++ git libattr1-dev zlib1g-dev
git clone https://github.com/sgan81/apfs-fuse.git
cd apfs-fuse
git submodule init
git submodule update
mkdir build
cd build
cmake ..
make
sudo make install
sudo ln -s /bin/fusermount /bin/fusermount3
# Mount image
mkdir rootfs
apfs-fuse 018-63036-003.dmg rootfs
The required files are under /System/Library/Frameworks/Vision.framework/ in the mounted path.
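The unpack and locate steps of Option 2 can also be scripted. The sketch below assumes the .ipsw is an ordinary zip archive (it is unpacked with unzip above) and that the system image sits at the top level of the unpacked directory; both function names are hypothetical.

```python
import os
import zipfile


def unpack_ipsw(ipsw_path, dest_dir):
    """Extract the .ipsw (treated as a zip archive) into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(ipsw_path) as archive:
        archive.extractall(dest_dir)


def largest_dmg(directory):
    """Return the path of the largest .dmg directly inside directory, or None."""
    candidates = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(".dmg")
    ]
    return max(candidates, key=os.path.getsize) if candidates else None


# Example usage (paths are placeholders):
#   unpack_ipsw("/path/to/ipsw/file/build.ipsw", "unpacked_ipsw")
#   print(largest_dmg("unpacked_ipsw"))
```

Mounting the image still has to be done with Finder or apfs-fuse as described above.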
Put them under the same directory:
mkdir NeuralHash
cd NeuralHash
cp /System/Library/Frameworks/Vision.framework/Resources/NeuralHashv3b-current.espresso.* .
cp /System/Library/Frameworks/Vision.framework/Resources/neuralhash_128x96_seed1.dat .
Step 2: Decode model structure and shapes
Normally, compiled Core ML models store their structure in model.espresso.net and their shapes in model.espresso.shape, both in JSON. The same is true for the NeuralHash model, except that the files are compressed with LZFSE.
dd if=NeuralHashv3b-current.espresso.net bs=4 skip=7 | lzfse -decode -o model.espresso.net
dd if=NeuralHashv3b-current.espresso.shape bs=4 skip=7 | lzfse -decode -o model.espresso.shape
cp NeuralHashv3b-current.espresso.weights model.espresso.weights
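The dd invocation reads 4-byte blocks and skips the first 7 of them, so it drops 28 bytes before handing the rest to lzfse. A rough Python equivalent is sketched below; the function names are hypothetical, and it assumes the lzfse command-line tool is on PATH and accepts the same flags as in the shell commands.

```python
import json
import subprocess

HEADER_SIZE = 4 * 7  # dd bs=4 skip=7 drops the first 28 bytes


def strip_header(data, header_size=HEADER_SIZE):
    """Return the bytes that follow the leading header."""
    if len(data) <= header_size:
        raise ValueError("file is too short to contain a payload")
    return data[header_size:]


def decode_espresso(src_path, dst_path):
    """Strip the header, decompress with the lzfse CLI, and check the JSON."""
    with open(src_path, "rb") as src:
        payload = strip_header(src.read())
    # Same flags as the shell commands: read from stdin, write to dst_path.
    subprocess.run(["lzfse", "-decode", "-o", dst_path], input=payload, check=True)
    with open(dst_path, "r", encoding="utf-8") as decoded:
        json.load(decoded)  # raises ValueError if the output is not valid JSON


# Example usage:
#   decode_espresso("NeuralHashv3b-current.espresso.net", "model.espresso.net")
#   decode_espresso("NeuralHashv3b-current.espresso.shape", "model.espresso.shape")
```

The final JSON parse is only a sanity check that the decompressed file has the expected format.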
Step 3: Convert model to ONNX
cd ..
git clone https://github.com/AsuharietYgvar/TNN.git
cd TNN
python3 tools/onnx2tnn/onnx-coreml/coreml2onnx.py ../NeuralHash
The resulting model is
Netron is a perfect tool for inspecting the model.
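As a programmatic complement to a visual viewer, the converted file can be loaded and validated with the onnx package installed earlier. This is a sketch with hypothetical helper names; it does not assume anything about what the model's inputs and outputs are called.

```python
def load_and_check(model_path):
    """Load an ONNX file and run the structural checker on it."""
    import onnx  # imported lazily so io_names() can be used without it

    model = onnx.load(model_path)
    onnx.checker.check_model(model)  # raises if the model is malformed
    return model


def io_names(model):
    """Return (input names, output names) of the model's graph."""
    inputs = [value.name for value in model.graph.input]
    outputs = [value.name for value in model.graph.output]
    return inputs, outputs


# Example usage:
#   model = load_and_check("/path/to/model.onnx")
#   print(io_names(model))
```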
Calculate neural hash with onnxruntime
- Install required libraries:
pip install onnxruntime pillow
- Run nnhash.py on an image:
python3 nnhash.py /path/to/model.onnx /path/to/neuralhash_128x96_seed1.dat image.jpg
Note: The neural hash generated here might be a few bits off from the one generated on an iOS device. This is expected, since different iOS devices generate slightly different hashes anyway. The reason is that neural networks rely on floating-point calculations, whose accuracy depends heavily on the hardware. For smaller networks this makes no difference, but NeuralHash has 200+ layers, which results in significant cumulative errors.
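To quantify "a few bits off", two hashes can be compared by counting the bit positions in which they differ (the Hamming distance). The sketch below assumes the hashes are written as hex strings of equal length; the function name and the two sample values are made up for illustration.

```python
def hamming_distance(hash_a, hash_b):
    """Count the differing bits between two equal-length hex strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


# Made-up 96-bit hashes (24 hex digits) that differ only in the last digit.
first = "0123456789abcdef01234567"
second = "0123456789abcdef01234564"
print(hamming_distance(first, second))  # prints 2 (0x7 vs 0x4 differ in two bits)
```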
|iPad Pro 10.5-inch||