scs4onnx

A very simple tool that compresses the overall size of an ONNX model by aggregating duplicate constant values as much as possible. Simple Constant value Shrink for ONNX.


Key concept

  1. If identical constant tensors are found while scanning the entire graph for Constant values, they are aggregated into a single constant tensor (see the sketch after this list).
  2. Ignore scalar values.
  3. Ignore variables.
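
The following is a minimal sketch of this key concept using onnx-graphsurgeon, not the tool's actual implementation: identical non-scalar constants are identified by their dtype, shape, and raw bytes, and every consumer is pointed at a single shared copy. The file names input.onnx / output.onnx are placeholders.

import hashlib
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load('input.onnx'))

seen = {}  # (dtype, shape, digest of raw bytes) -> canonical gs.Constant
for node in graph.nodes:
    for i, tensor in enumerate(node.inputs):
        # Ignore variables and scalar values (key concepts 2 and 3)
        if not isinstance(tensor, gs.Constant) or tensor.values.ndim == 0:
            continue
        digest = hashlib.sha256(tensor.values.tobytes()).hexdigest()
        key = (str(tensor.values.dtype), tensor.values.shape, digest)
        canonical = seen.setdefault(key, tensor)
        if canonical is not tensor:
            # Aggregate duplicates into a single constant tensor (key concept 1)
            node.inputs[i] = canonical

graph.cleanup()
onnx.save(gs.export_onnx(graph), 'output.onnx')

In the actual tool, shrink mode keeps these shared constants inside the model, while npy mode writes them out to external .npy files, as described in the Usage section below.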

1. Setup

### option
$ echo export PATH="~/.local/bin:$PATH" >> ~/.bashrc \
&& source ~/.bashrc

### run
$ pip install -U onnx \
&& python3 -m pip install -U onnx_graphsurgeon --index-url https://pypi.ngc.nvidia.com \
&& pip install -U scs4onnx

2. Usage

$ scs4onnx -h

usage: scs4onnx [-h] [--mode {shrink,npy}] [--non_verbose] input_onnx_file_path output_onnx_file_path

positional arguments:
  input_onnx_file_path
                        Input onnx file path.
  output_onnx_file_path
                        Output onnx file path.

optional arguments:
  -h, --help            show this help message and exit
  --mode {shrink,npy}   Constant Value Compression Mode.
                        shrink: Share constant values inside the model as much as possible.
                                The model size is slightly larger because
                                some shared constant values remain inside the model,
                                but performance is maximized.
                        npy:    Outputs constant values used repeatedly in the model to an
                                external file .npy. Instead of the smallest model body size,
                                the file loading overhead is greater.
                        Default: shrink
  --non_verbose         Do not show all information logs. Only error logs are displayed.

3. CLI Execution

$ scs4onnx input.onnx output.onnx --mode shrink

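The npy mode and quieter logging can be selected with the flags documented in the help output above, for example:

$ scs4onnx input.onnx output.onnx --mode npy --non_verbose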

4. In-script Execution

from scs4onnx import shrinking

shrunk_graph, npy_file_paths = shrinking('input.onnx', 'output.onnx', mode='npy')

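A short follow-up sketch, assuming the second return value npy_file_paths is a list of paths to the .npy files written out in npy mode (the exact return types are not documented here):

import numpy as np
from scs4onnx import shrinking

shrunk_graph, npy_file_paths = shrinking('input.onnx', 'output.onnx', mode='npy')

# Inspect the constants that were externalized to .npy files
# (assumes npy_file_paths is a list of file paths - an assumption, not confirmed above)
for path in npy_file_paths:
    param = np.load(path)
    print(path, param.dtype, param.shape)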

5. Sample

5-1. shrink mode sample

  • 297.8MB -> 67.4MB


5-2. npy mode sample

  • 297.8MB -> 21.3MB


5-3. .npy file view

$ python
>>> import numpy as np
>>> param = np.load('gmflow_sintel_480x640_shrunken_exported_1646.npy')
>>> param.shape
(8, 1200, 1200)
>>> param
array([[[   0.,    0.,    0., ...,    0.,    0.,    0.],
        [   0.,    0.,    0., ...,    0.,    0.,    0.],
        [   0.,    0.,    0., ...,    0.,    0.,    0.],
        ...,
        [-100., -100., -100., ...,    0.,    0.,    0.],
        [-100., -100., -100., ...,    0.,    0.,    0.],
        [-100., -100., -100., ...,    0.,    0.,    0.]]], dtype=float32)

6. Reference

  1. https://docs.nvidia.com/deeplearning/tensorrt/onnx-graphsurgeon/docs/index.html
  2. https://github.com/NVIDIA/TensorRT/tree/main/tools/onnx-graphsurgeon
