[TOC]

Overview

This article introduces the various tools and features of the QNN SDK.

qnn-onnx-converter

The qnn-onnx-converter tool converts a model from the ONNX framework into a .cpp file that represents the model as a series of QNN API calls. In addition, it generates a binary file containing the model's static weights.

usage: qnn-onnx-converter [--out_node OUT_NAMES]
[--input_type INPUT_NAME INPUT_TYPE]
[--input_dtype INPUT_NAME INPUT_DTYPE]
[--input_encoding INPUT_NAME INPUT_ENCODING]
[--input_layout INPUT_NAME INPUT_LAYOUT]
[--debug [DEBUG]] [--dry_run [DRY_RUN]]
[-d INPUT_NAME INPUT_DIM]
[--batch BATCH_DIM]
[--define_symbol SYMBOL_NAME VALUE]
[--disable_batchnorm_folding]
[--quantization_overrides QUANTIZATION_OVERRIDES]
[--keep_quant_nodes]
[--keep_disconnected_nodes]
[--input_list INPUT_LIST]
[--param_quantizer PARAM_QUANTIZER]
[--act_quantizer ACT_QUANTIZER]
[--algorithms ALGORITHMS [ALGORITHMS ...]]
[--bias_bw BIAS_BW] [--act_bw ACT_BW]
[--weight_bw WEIGHT_BW] [--ignore_encodings]
[--use_per_channel_quantization [USE_PER_CHANNEL_QUANTIZATION [USE_PER_CHANNEL_QUANTIZATION ...]]]
--input_network INPUT_NETWORK [-h]
[-o OUTPUT_PATH]
[--copyright_file COPYRIGHT_FILE]
[--overwrite_model_prefix]
[--exclude_named_tensors]
[--op_package_lib OP_PACKAGE_LIB]
[-p PACKAGE_NAME | --op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]]

Script for converting an ONNX model to QNN. The arguments are described below:

required arguments:
--input_network INPUT_NETWORK, -i INPUT_NETWORK Path to the source ONNX model.

optional arguments:
--out_node OUT_NODE Name of the graph's output nodes. Multiple output
nodes should be provided separately like: --out_node
out_1 --out_node out_2
--input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
Type of data expected by each input op/layer. Type for
each input is |default| if not specified. For example:
"data" image.Note that the quotes should always be
included in order to handle special characters,
spaces,etc. For multiple inputs specify multiple
--input_type on the command line. Eg: --input_type
"data1" image --input_type "data2" opaque These
options get used by DSP runtime and following
descriptions state how input will be handled for each
option. Image: Input is float between 0-255 and the
input's mean is 0.0f and the input's max is 255.0f. We
will cast the float to uint8ts and pass the uint8ts to
the DSP. Default: Pass the input as floats to the dsp
directly and the DSP will quantize it. Opaque: Assumes
input is float because the consumer layer(i.e next
layer) requires it as float, therefore it won't be
quantized. Choices supported: image default opaque
--input_dtype INPUT_NAME INPUT_DTYPE
The names and datatypes of the network input layers, specified in the format [input_name datatype], for example: 'data' 'float32'. Defaults to float32 if not specified. Note that the quotes should always be included in order to handle special characters, spaces, etc. For multiple inputs specify multiple --input_dtype on the command line, like: --input_dtype 'data1' 'float32' --input_dtype 'data2' 'float32'

--input_encoding INPUT_NAME INPUT_ENCODING, -e INPUT_NAME INPUT_ENCODING
Image encoding of the source images. Default is bgr.
Eg usage: "data" rgba Note the quotes should always be
included in order to handle special characters,
spaces, etc. For multiple inputs specify
--input_encoding for each on the command line. Eg:
--input_encoding "data1" rgba --input_encoding "data2"
other Use options: color encodings(bgr,rgb, nv21...)
if input is image; time_series: for inputs of rnn
models; other: if input doesn't follow above
categories or is unknown. Choices supported: bgr rgb
rgba argb32 nv21 time_series other
--debug [DEBUG] Run the converter in debug mode.
--input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else. For multiple inputs specify multiple
--input_layout on the command line.
Eg:
--input_layout "data1" NCHW --input_layout "data2" NCHW
--dry_run [DRY_RUN] Evaluates the model without actually converting any
ops, and returns unsupported ops/attributes as well as
unused inputs and/or outputs if any. Leave empty or
specify "info" to see dry run as a table, or specify
"debug" to show more detailed messages only"
-d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
The name and dimension of all the input buffers to the
network specified in the format [input_name comma-
separated-dimensions], for example: 'data'
1,224,224,3. Note that the quotes should always be
included in order to handle special characters,
spaces, etc. NOTE: This feature works only with Onnx
1.6.0 and above
-n, --no_simplification
Do not attempt to simplify the model automatically.
This may prevent some models from properly converting
when sequences of unsupported static operations are
present.
-b BATCH, --batch BATCH
The batch dimension override. This will take the first dimension of all
inputs and treat it as a batch dim, overriding it with the value provided
here. For example:
--batch 6
will result in a shape change from [1,3,224,224] to [6,3,224,224].
If there are inputs without batch dim this should not be used and each input
should be overridden independently using -d option for input dimension
overrides.
-s SYMBOL_NAME VALUE, --define_symbol SYMBOL_NAME VALUE
This option allows overriding specific input dimension symbols. For instance
you might see input shapes specified with variables such as :
data: [1,3,height,width]
To override these simply pass the option as:
--define_symbol height 224 --define_symbol width 448
which results in dimensions that look like:
data: [1,3,224,448]
--disable_batchnorm_folding
If not specified, the converter will attempt to fold batchnorm layers into the preceding layer.
--keep_disconnected_nodes
Disable Optimization that removes Ops not connected to the main graph.
This optimization uses output names provided over commandline OR
inputs/outputs extracted from the Source model to determine the main graph
-h, --help
Show this help message and exit.
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the converted output model should be saved. If not specified, the model is written to a file with the same name as the input model in the current directory.

--copyright_file COPYRIGHT_FILE
Path to a copyright file. If provided, the content of this file will be added to the output model.
--overwrite_model_prefix
If option passed, model generator will use the output
path name to use as model prefix to name functions in
<qnn_model_name>.cpp. (Useful for running multiple
models at once) eg: ModelName_composeGraphs. Default
is to use generic "QnnModel_".
--exclude_named_tensors
Remove using source framework tensorNames; instead use
a counter for naming tensors. Note: This can
potentially help to reduce the final model library
that will be generated(Recommended for deploying
model). Default is False.

Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters
to use for quantization. These will override any
quantization data carried from conversion (eg TF fake
quantization) or calculated during the normal
quantization process. Format defined as per AIMET
specification.
--keep_quant_nodes
Use this option to keep the activation quantization nodes in the network graph rather than stripping them.
--input_list INPUT_LIST
Path to a file specifying the input data. This file should be a plain text file containing one or more absolute file paths per line. Each path should point to a file containing data in raw format, ready to be consumed by the quantizer without any further preprocessing. Multiple space-separated files on one line indicate multiple inputs to the network. See the documentation for more details. Must be specified for quantization. If an input_list file is not provided, all subsequent quantization options are ignored.

--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias
quantizer to use. Must be followed by one of the
following options: "tf": Uses the real min/max of the
data and specified bitwidth (default) "enhanced": Uses
an algorithm useful for quantizing models with long
tails present in the weight distribution "adjusted":
Uses an adjusted min/max for computing the range,
particularly good for denoise models "symmetric":
Ensures min and max have the same absolute values
about zero. Data will be stored as int#_t data such
that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation quantizer to use. Must be followed by one of the following options: "tf": Uses the real min/max of the data and specified bitwidth (default) "enhanced": Uses an algorithm useful for quantizing models with long tails present in the weight distribution "adjusted": Uses an adjusted min/max for computing the range, particularly good for denoise models "symmetric": Ensures min and max have the same absolute values about zero. Data will be stored as int#_t data such that the offset is always 0.

--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms.
Usage is: --algorithms <algo_name1> ... The available
optimization algorithms are: "cle" - Cross layer
equalization includes a number of methods for
equalizing weights and biases across layers in order
to rectify imbalances that cause quantization errors.
"bc" - Bias correction adjusts biases to offset
activation quantization errors. Typically used in
conjunction with "cle" to improve quantization
accuracy.
--bias_bw BIAS_BW Use the --bias_bw option to select the bitwidth to use
when quantizing the biases, either 8 (default) or 32.
--act_bw ACT_BW Use the --act_bw option to select the bitwidth to use
when quantizing the activations, either 8 (default) or
16.
--weight_bw WEIGHT_BW
Use the --weight_bw option to select the bitwidth to
use when quantizing the weights, currently only 8 bit
(default) supported.
--ignore_encodings Use only quantizer generated encodings, ignoring any
user or model provided encodings. Note: Cannot use
--ignore_encodings with --quantization_overrides
--use_per_channel_quantization [USE_PER_CHANNEL_QUANTIZATION [USE_PER_CHANNEL_QUANTIZATION ...]]
Use per-channel quantization for convolution-based op
weights. Note: This will replace built-in model QAT
encodings when used for a given weight.Usage "--
use_per_channel_quantization" to enable or "--
use_per_channel_quantization false" (default) to
disable

Custom Op Package Options:
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for
quantization. Must be in the form
<op_package_lib_path:interfaceProviderName> and be
separated by a comma for multiple package libs
-p PACKAGE_NAME, --package_name PACKAGE_NAME
A global package name to be used for each node in the
Model.cpp file. Defaults to Qnn header defined package
name
--op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...], -opc OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
Path to a Qnn Op Package XML configuration file that
contains user defined custom operations.

Note: Only one of: {'package_name', 'op_package_config'} can be specified
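
For reference, a minimal conversion command might look like the following. The file names (model.onnx, input_list.txt) and the output path are placeholders; the flags are the ones documented above:

qnn-onnx-converter --input_network model.onnx \
                   --input_list input_list.txt \
                   --act_bw 8 --weight_bw 8 \
                   -o model/model.cpp

Because --input_list is supplied, the quantization options take effect; omit it to produce a float model, in which case all subsequent quantization options are ignored.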

qnn-model-lib-generator

The qnn-model-lib-generator tool compiles the QNN model source code into target-specific artifacts.

usage: qnn-model-lib-generator 
[-h]
[-c <QNN_MODEL>.cpp]
[-b <QNN_MODEL>.bin]
[-t LIB_TARGETS ]
[-l LIB_NAME]
[-o OUTPUT_DIR]

The script compiles the Qnn model artifacts for the specified targets. The arguments are as follows:

Required argument(s):
-c <QNN_MODEL>.cpp
Path to the qnn model .cpp file generated by the converter.

optional argument(s):
-b <QNN_MODEL>.bin
Path to the qnn model .bin file generated by the converter. If the .cpp file needs data from the .bin file and it cannot be found, the runtime will fail.
-t LIB_TARGETS
Specify the target platforms to build the model libraries for. Default: aarch64-android x86_64-linux-clang arm-android
-l LIB_NAME
Specify the name to use for the generated library. Defaults to the name of the corresponding .bin file if one is given, otherwise qnn_model.so is generated.
-o OUTPUT_DIR
Path where the output model artifacts are saved.
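
As a sketch, the .cpp and .bin emitted by the converter might then be compiled like this (paths are placeholders; -l is omitted, so the default library name described above is used):

qnn-model-lib-generator -c model/model.cpp \
                        -b model/model.bin \
                        -t aarch64-android \
                        -o model_libs

The compiled library for the aarch64-android target is written under model_libs.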

qnn-context-binary-generator

The qnn-context-binary-generator tool is used to create a context binary from a model library compiled from the output of the QNN converter, together with a specific backend.

usage: qnn-context-binary-generator --model QNN_MODEL.so --backend QNN_BACKEND.so
--binary_file BINARY_FILE_NAME
[--model_prefix MODEL_PREFIX]
[--output_dir OUTPUT_DIRECTORY]
[--op_packages ONE_OR_MORE_OP_PACKAGES]
[--config_file CONFIG_FILE.json]
[--verbose] [--version] [--help]

The arguments are as follows:

REQUIRED ARGUMENTS:
-------------------
--model <FILE>
Path to the .so file generated in the previous step that contains the QNN network.

--backend <FILE>
Path to the backend .so library used to create the context binary.

--binary_file <VAL>
Name of the binary file to save the context binary to. It is saved in the path given by the --output_dir option, with .bin as the binary file extension.

OPTIONAL ARGUMENTS:
-------------------
--model_prefix
Function prefix to use when loading the file containing the QNN network. Default: QnnModel.

--output_dir <DIR>
The directory to save output to. Defaults to ./output.

--op_packages <VAL> Provide a comma separated list of op packages
and interface providers to register. The syntax is:
op_package_path:interface_provider[,op_package_path:interface_provider...]

--config_file <FILE>
Path to a JSON config file. The config file currently supports options related to backend extensions and context priority. Please refer to the SDK documentation for more details.

--enable_intermediate_outputs Enable all intermediate nodes to be output along with
default outputs in the saved context.

--log_level
Specifies the max logging level to be set. Valid settings: "error", "warn", "info" and "verbose".

--version
Print the QNN SDK version.

--help
Show this help message.
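
For illustration, a model library and a backend library might be combined into a context binary as follows (libqnn_model.so and libQnnHtp.so are assumed names from a typical SDK layout):

qnn-context-binary-generator --model libqnn_model.so \
                             --backend libQnnHtp.so \
                             --binary_file my_model \
                             --output_dir ./output

Per the --binary_file and --output_dir descriptions above, the resulting context is saved as ./output/my_model.bin.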

qnn-net-run

The qnn-net-run tool is used to run a model library compiled from the output of the QNN converter on a specific backend.

DESCRIPTION:
------------
Sample application demonstrating how to load and execute a neural network using the QNN APIs.


REQUIRED ARGUMENTS:
-------------------
--model <FILE> Path to the model library containing the QNN network.
--backend <FILE> Path to a QNN backend to execute the model.
--input_list <FILE> Path to a file listing the inputs for the network. If there are multiple graphs in model.so, this must be a comma-separated list of input list files.
--retrieve_context <VAL> Path to cached binary from which to load a saved
context from and execute graphs. --retrieve_context and
--model are mutually exclusive. Only one of the options
can be specified at a time.


OPTIONAL ARGUMENTS:
-------------------
--model_prefix Function prefix to use when loading <qnn_model_name.so>.
Default: QnnModel

--debug Specifies that output from all layers of the network
will be saved. This option can not be used when loading
a saved context through --retrieve_context option.

--output_dir <DIR> The directory to save output to. Defaults to ./output.

--output_data_type <VAL> Data type of the output. Values can be:

1. float_only: dump outputs in float only.
2. native_only: dump outputs in data type native
to the model. For ex., uint8_t.
3. float_and_native: dump outputs in both float and
native.

(This is N/A for a float model. In other cases,
if not specified, defaults to float_only.)

--input_data_type <VAL> Data type of the input. Values can be:

1. float: reads inputs as floats and quantizes
if necessary based on quantization
parameters in the model.
2. native: reads inputs assuming the data type to be
native to the model. For ex., uint8_t.

(This is N/A for a float model. In other cases,
if not specified, defaults to float.)

--op_packages <VAL> Provide a comma-separated list of op packages, interface
providers, and, optionally, targets to register. Valid values
for target are CPU and HTP. The syntax is:
op_package_path:interface_provider:target[,op_package_path:interface_provider:target...]

--profiling_level <VAL> Enable profiling. Valid Values:
1. basic: captures execution and init time.
2. detailed: in addition to basic, captures per Op timing
for execution, if a backend supports it.

--perf_profile <VAL> Specifies perf profile to set. Valid settings are
low_balanced, balanced, default, high_performance,
sustained_high_performance, burst, low_power_saver,
power_saver, high_power_saver, and system_settings.

--config_file <FILE> Path to a JSON config file. The config file currently
supports options related to backend extensions,
context priority and graph configs. Please refer to SDK
documentation for more details.

--log_level <VAL> Specifies max logging level to be set. Valid settings:
error, warn, info, debug, and verbose.

--shared_buffer Specifies creation of shared buffers for graph I/O between the application
and the device/coprocessor associated with a backend directly.
This option is currently supported on Android only.

--version Print the QNN SDK version.

--help Show this help message.
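
Putting the pieces together, two hypothetical invocations (file and backend library names are illustrative): the first executes the model library directly, the second executes graphs from a cached context binary via --retrieve_context, which is mutually exclusive with --model:

qnn-net-run --model libqnn_model.so \
            --backend libQnnCpu.so \
            --input_list input_list.txt \
            --output_dir ./output

qnn-net-run --retrieve_context ./output/my_model.bin \
            --backend libQnnHtp.so \
            --input_list input_list.txt

Each line of input_list.txt points to a raw input file, analogous to the input list used by the converter for quantization.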