[TOC]

Overview

The QNN SDK ships several backend libraries, which can be found under the /target//lib folder. QNN backend library names are prefixed with libQnn.
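For illustration, the backend libraries can be listed as below (a hedged sketch: ${QNN_SDK_ROOT} is an assumed variable pointing at the SDK root, and the exact set of libraries varies by SDK version):

# Hypothetical SDK root; adjust to your installation.
export QNN_SDK_ROOT=/opt/qnn-sdk
ls ${QNN_SDK_ROOT}/target/x86_64-linux-clang/lib/libQnn*
# Typical entries include libQnnCpu.so, libQnnGpu.so, libQnnDsp.so and
# libQnnHtp.so, but verify against your own installation.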


This section covers the DSP backend API specializations. All QNN DSP backend specializations are available under the /include/DSP/ directory.


QNN Quantization

qnn-onnx-converter

This step converts the ONNX model into a QNN model and quantizes it.
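This step and the ones that follow assume a handful of shell variables defined elsewhere; a minimal sketch of plausible definitions (all paths and file names here are hypothetical):

# Hypothetical variable definitions for the steps in this section.
export QNN_SDK_BIN=${QNN_SDK_ROOT}/target/x86_64-linux-clang/bin   # SDK tool directory
export MODEL_DIR=./models                  # directory holding the source ONNX model
export MODEL_ONNX=model.onnx               # source model file name
export MODEL_NAME=model                    # base name for generated artifacts
export MODEL_QNN_DIR=./qnn_models          # output root for converted artifacts
export INPUT_LIST_QUAN=./input_list.txt    # calibration input list for quantization
mkdir -p ${MODEL_QNN_DIR}/int8/cpp ${MODEL_QNN_DIR}/int8/so

The conversion itself: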

# --disable_batchnorm_folding : disables folding batchnorm layers into the preceding layer
echo -e "\n>>>> Step 2 : run qnn-onnx-converter"
${QNN_SDK_BIN}/qnn-onnx-converter \
--input_network ${MODEL_DIR}/${MODEL_ONNX} \
--input_list ${INPUT_LIST_QUAN} \
-o ${MODEL_QNN_DIR}/int8/cpp/${MODEL_NAME}.cpp \
--disable_batchnorm_folding

Script notes:

usage: qnn-onnx-converter [--out_node OUT_NAMES]
[--input_type INPUT_NAME INPUT_TYPE]
[--input_dtype INPUT_NAME INPUT_DTYPE]
[--input_encoding INPUT_NAME INPUT_ENCODING]
[--input_layout INPUT_NAME INPUT_LAYOUT]
[--debug [DEBUG]] [--dry_run [DRY_RUN]]
[-d INPUT_NAME INPUT_DIM]
[--batch BATCH_DIM]
[--define_symbol SYMBOL_NAME VALUE]
[--disable_batchnorm_folding]
[--quantization_overrides QUANTIZATION_OVERRIDES]
[--keep_quant_nodes]
[--keep_disconnected_nodes]
[--input_list INPUT_LIST]
[--param_quantizer PARAM_QUANTIZER]
[--act_quantizer ACT_QUANTIZER]
[--algorithms ALGORITHMS [ALGORITHMS ...]]
[--bias_bw BIAS_BW] [--act_bw ACT_BW]
[--weight_bw WEIGHT_BW] [--ignore_encodings]
[--use_per_channel_quantization [USE_PER_CHANNEL_QUANTIZATION [USE_PER_CHANNEL_QUANTIZATION ...]]]
--input_network INPUT_NETWORK
[-h]
[-o OUTPUT_PATH   path where the output model is saved]
[--copyright_file COPYRIGHT_FILE]
[--overwrite_model_prefix] [--exclude_named_tensors]
[--op_package_lib OP_PACKAGE_LIB]
[-p PACKAGE_NAME | --op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]]

Script to convert ONNX model into QNN

required arguments:
--input_network INPUT_NETWORK, -i INPUT_NETWORK
Path to the source framework model.

optional arguments:
--out_node OUT_NODE Name of the graph's output nodes. Multiple output
nodes should be provided separately like: --out_node
out_1 --out_node out_2
--input_type INPUT_NAME INPUT_TYPE, -t INPUT_NAME INPUT_TYPE
Type of data expected by each input op/layer. Type for
each input is |default| if not specified. For example:
"data" image.Note that the quotes should always be
included in order to handle special characters,
spaces,etc. For multiple inputs specify multiple
--input_type on the command line. Eg: --input_type
"data1" image --input_type "data2" opaque These
options get used by DSP runtime and following
descriptions state how input will be handled for each
option. Image: Input is float between 0-255 and the
input's mean is 0.0f and the input's max is 255.0f. We
will cast the float to uint8ts and pass the uint8ts to
the DSP. Default: Pass the input as floats to the dsp
directly and the DSP will quantize it. Opaque: Assumes
input is float because the consumer layer(i.e next
layer) requires it as float, therefore it won't be
quantized. Choices supported: image default opaque
--input_dtype INPUT_NAME INPUT_DTYPE
The names and datatype of the network input layers
specified in the format [input_name datatype], for
example: 'data' 'float32'. Default is float32 if not
specified. Note that the quotes should always be
included in order to handle special characters, spaces,
etc. For multiple inputs specify multiple
--input_dtype on the command line like: --input_dtype
'data1' 'float32' --input_dtype 'data2' 'float32'
--input_encoding INPUT_NAME INPUT_ENCODING, -e INPUT_NAME INPUT_ENCODING
Image encoding of the source images. Default is bgr.
Eg usage: "data" rgba Note the quotes should always be
included in order to handle special characters,
spaces, etc. For multiple inputs specify
--input_encoding for each on the command line. Eg:
--input_encoding "data1" rgba --input_encoding "data2"
other Use options: color encodings(bgr,rgb, nv21...)
if input is image; time_series: for inputs of rnn
models; other: if input doesn't follow above
categories or is unknown. Choices supported: bgr rgb
rgba argb32 nv21 time_series other
--debug [DEBUG] Run the converter in debug mode.
--input_layout INPUT_NAME INPUT_LAYOUT, -l INPUT_NAME INPUT_LAYOUT
Layout of each input tensor. If not specified, it will use the default
based on the Source Framework, shape of input and input encoding.
Accepted values are-
NCDHW, NDHWC, NCHW, NHWC, NFC, NCF, NTF, TNF, NF, NC, F, NONTRIVIAL
N = Batch, C = Channels, D = Depth, H = Height, W = Width, F = Feature, T = Time
NDHWC/NCDHW used for 5d inputs
NHWC/NCHW used for 4d image-like inputs
NFC/NCF used for inputs to Conv1D or other 1D ops
NTF/TNF used for inputs with time steps like the ones used for LSTM op
NF used for 2D inputs, like the inputs to Dense/FullyConnected layers
NC used for 2D inputs with 1 for batch and other for Channels (rarely used)
F used for 1D inputs, e.g. Bias tensor
NONTRIVIAL for everything else. For multiple inputs specify multiple
--input_layout on the command line.
Eg:
--input_layout "data1" NCHW --input_layout "data2" NCHW
--dry_run [DRY_RUN] Evaluates the model without actually converting any
ops, and returns unsupported ops/attributes as well as
unused inputs and/or outputs if any. Leave empty or
specify "info" to see dry run as a table, or specify
"debug" to show more detailed messages only"
-d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
The name and dimension of all the input buffers to the
network specified in the format [input_name comma-
separated-dimensions], for example: 'data'
1,224,224,3. Note that the quotes should always be
included in order to handle special characters,
spaces, etc. NOTE: This feature works only with Onnx
1.6.0 and above
-n, --no_simplification
Do not attempt to simplify the model automatically.
This may prevent some models from properly converting
when sequences of unsupported static operations are
present.
-b BATCH, --batch BATCH
The batch dimension override. This will take the first dimension of all
inputs and treat it as a batch dim, overriding it with the value provided
here. For example:
--batch 6
will result in a shape change from [1,3,224,224] to [6,3,224,224].
If there are inputs without batch dim this should not be used and each input
should be overridden independently using -d option for input dimension
overrides.
-s SYMBOL_NAME VALUE, --define_symbol SYMBOL_NAME VALUE
This option allows overriding specific input dimension symbols. For instance
you might see input shapes specified with variables such as :
data: [1,3,height,width]
To override these simply pass the option as:
--define_symbol height 224 --define_symbol width 448
which results in dimensions that look like:
data: [1,3,224,448]
--disable_batchnorm_folding
If not specified, converter will try to fold batchnorm into previous layer.
--keep_disconnected_nodes
Disable Optimization that removes Ops not connected to the main graph.
This optimization uses output names provided over commandline OR
inputs/outputs extracted from the Source model to determine the main graph
-h, --help show this help message and exit
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Path where the converted Output model should be
saved. If not specified, the converted model will be
written to a file with same name as the input model
--copyright_file COPYRIGHT_FILE
Path to copyright file. If provided, the content of
the file will be added to the output model.
--overwrite_model_prefix
If option passed, model generator will use the output
path name to use as model prefix to name functions in
<qnn_model_name>.cpp. (Useful for running multiple
models at once) eg: ModelName_composeGraphs. Default
is to use generic "QnnModel_".
--exclude_named_tensors
Remove using source framework tensorNames; instead use
a counter for naming tensors. Note: This can
potentially help to reduce the final model library
that will be generated(Recommended for deploying
model). Default is False.

Quantizer Options:
--quantization_overrides QUANTIZATION_OVERRIDES
Use this option to specify a json file with parameters
to use for quantization. These will override any
quantization data carried from conversion (eg TF fake
quantization) or calculated during the normal
quantization process. Format defined as per AIMET
specification.
--keep_quant_nodes Use this option to keep activation quantization nodes
in the graph rather than stripping them.
--input_list INPUT_LIST
Path to a file specifying the input data. This file
should be a plain text file, containing one or more
absolute file paths per line. Each path is expected to
point to a binary file containing one input in the
"raw" format, ready to be consumed by the quantizer
without any further preprocessing. Multiple files per
line separated by spaces indicate multiple inputs to
the network. See documentation for more details. Must
be specified for quantization. All subsequent
quantization options are ignored when this is not
provided.
--param_quantizer PARAM_QUANTIZER
Optional parameter to indicate the weight/bias
quantizer to use. Must be followed by one of the
following options: "tf": Uses the real min/max of the
data and specified bitwidth (default) "enhanced": Uses
an algorithm useful for quantizing models with long
tails present in the weight distribution "adjusted":
Uses an adjusted min/max for computing the range,
particularly good for denoise models "symmetric":
Ensures min and max have the same absolute values
about zero. Data will be stored as int#_t data such
that the offset is always 0.
--act_quantizer ACT_QUANTIZER
Optional parameter to indicate the activation
quantizer to use. Must be followed by one of the
following options: "tf": Uses the real min/max of the
data and specified bitwidth (default) "enhanced": Uses
an algorithm useful for quantizing models with long
tails present in the weight distribution "adjusted":
Uses an adjusted min/max for computing the range,
particularly good for denoise models "symmetric":
Ensures min and max have the same absolute values
about zero. Data will be stored as int#_t data such
that the offset is always 0.
--algorithms ALGORITHMS [ALGORITHMS ...]
Use this option to enable new optimization algorithms.
Usage is: --algorithms <algo_name1> ... The available
optimization algorithms are: "cle" - Cross layer
equalization includes a number of methods for
equalizing weights and biases across layers in order
to rectify imbalances that cause quantization errors.
"bc" - Bias correction adjusts biases to offset
activation quantization errors. Typically used in
conjunction with "cle" to improve quantization
accuracy.
--bias_bw BIAS_BW Use the --bias_bw option to select the bitwidth to use
when quantizing the biases, either 8 (default) or 32.
--act_bw ACT_BW Use the --act_bw option to select the bitwidth to use
when quantizing the activations, either 8 (default) or
16.
--weight_bw WEIGHT_BW
Use the --weight_bw option to select the bitwidth to
use when quantizing the weights, currently only 8 bit
(default) supported.
--ignore_encodings Use only quantizer generated encodings, ignoring any
user or model provided encodings. Note: Cannot use
--ignore_encodings with --quantization_overrides
--use_per_channel_quantization [USE_PER_CHANNEL_QUANTIZATION [USE_PER_CHANNEL_QUANTIZATION ...]]
Use per-channel quantization for convolution-based op
weights. Note: This will replace built-in model QAT
encodings when used for a given weight. Usage "--
use_per_channel_quantization" to enable or "--
use_per_channel_quantization false" (default) to
disable

Custom Op Package Options:
--op_package_lib OP_PACKAGE_LIB, -opl OP_PACKAGE_LIB
Use this argument to pass an op package library for
quantization. Must be in the form
<op_package_lib_path:interfaceProviderName> and be
separated by a comma for multiple package libs
-p PACKAGE_NAME, --package_name PACKAGE_NAME
A global package name to be used for each node in the
Model.cpp file. Defaults to Qnn header defined package
name
--op_package_config OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...], -opc OP_PACKAGE_CONFIG [OP_PACKAGE_CONFIG ...]
Path to a Qnn Op Package XML configuration file that
contains user defined custom operations.

Note: Only one of: {'package_name', 'op_package_config'} can be specified
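As the --input_list description above notes, quantization requires a plain-text file listing paths to raw calibration inputs, one sample per line. A hedged sketch of producing such a file (the ./calib directory and .raw files are hypothetical, and must already be pre-processed to the network's input shape and layout):

# Build an input list from pre-processed calibration samples.
ls $(pwd)/calib/*.raw > ${INPUT_LIST_QUAN}
# For a multi-input network, each line would instead contain several
# space-separated paths, one per input tensor.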

qnn-model-lib-generator

The qnn-model-lib-generator tool compiles QNN model source code into artifacts for specific targets.

echo -e "\n>>>> Step 3 : 执行 qnn-model-lib-generator"
${QNN_SDK_BIN}/qnn-model-lib-generator \
-c ${MODEL_QNN_DIR}/int8/cpp/${MODEL_NAME}.cpp \
-b ${MODEL_QNN_DIR}/int8/cpp/${MODEL_NAME}.bin \
-o ${MODEL_QNN_DIR}/int8/so/ \
-t x86_64-linux-clang

Command-line notes:

usage: qnn-model-lib-generator
[-h]
[-c <QNN_MODEL>.cpp   the model .cpp file generated by the previous step]
[-b <QNN_MODEL>.bin   the model .bin file generated by the previous step]
[-t LIB_TARGETS       target platform(s) to build the model libraries for: x86_64-linux-clang, aarch64-android, arm-android]
[-l LIB_NAME          name to use for the generated library]
[-o OUTPUT_DIR        directory for the output .so files]

Required argument(s):
-c <QNN_MODEL>.cpp Filepath for the qnn model .cpp file

optional argument(s):
-b <QNN_MODEL>.bin Filepath for the qnn model .bin file
(Note: if not passed, runtime will fail if .cpp needs any items from a .bin file.)

-t LIB_TARGETS Specifies the targets to build the models for. Default: aarch64-android x86_64-linux-clang arm-android
-l LIB_NAME Specifies the name to use for libraries. Default: uses name in <model.bin> if provided,
else generic qnn_model.so
-o OUTPUT_DIR Location for saving output libraries.
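Note that -t accepts several targets at once. Building for the Android targets generally requires the Android NDK to be set up beforehand (commonly via ANDROID_NDK_ROOT; check the setup notes of your SDK version). A hedged sketch:

# Hypothetical NDK path; only needed for the Android targets.
export ANDROID_NDK_ROOT=/opt/android-ndk
${QNN_SDK_BIN}/qnn-model-lib-generator \
    -c ${MODEL_QNN_DIR}/int8/cpp/${MODEL_NAME}.cpp \
    -b ${MODEL_QNN_DIR}/int8/cpp/${MODEL_NAME}.bin \
    -o ${MODEL_QNN_DIR}/int8/so/ \
    -t x86_64-linux-clang aarch64-android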

qnn-context-binary-generator

The qnn-context-binary-generator tool creates a context binary by loading a backend-specific model library, built from the QNN converter output, with the chosen backend.

echo -e "\n>>>> Step 4 : 执行 qnn-context-binary-generator"
${QNN_SDK_BIN}/qnn-context-binary-generator \
--backend ${QNN_SDK_BIN}/../lib/libQnnHtp.so \
--model ${MODEL_QNN_DIR}/int8/so/x86_64-linux-clang/lib${MODEL_NAME}.so \
--binary_file ${MODEL_NAME}_int8 \
--output_dir ${MODEL_QNN_DIR}/int8/so/x86_64-linux-clang

Command-line notes:

usage: qnn-context-binary-generator --model QNN_MODEL.so --backend QNN_BACKEND.so
--binary_file BINARY_FILE_NAME
[--model_prefix MODEL_PREFIX]
[--output_dir OUTPUT_DIRECTORY]
[--op_packages ONE_OR_MORE_OP_PACKAGES]
[--config_file CONFIG_FILE.json]
[--verbose] [--version] [--help]

REQUIRED ARGUMENTS:
-------------------
--model <FILE> Path to the .so file containing the QNN network.

--backend <FILE> Path to a QNN backend .so library to create the context binary.

--binary_file <VAL> Name of the binary file in which the context binary is saved. It is saved under the path given by the --output_dir option, with .bin as the file extension.


OPTIONAL ARGUMENTS:
-------------------
--model_prefix Function prefix to use when loading <qnn_model_name.so>file
containing a QNN network. Default: QnnModel.

--output_dir <DIR> The directory to save output to. Defaults to ./output.

--op_packages <VAL> Provide a comma separated list of op packages
and interface providers to register. The syntax is:
op_package_path:interface_provider[,op_package_path:interface_provider...]

--config_file <FILE> Path to a JSON config file. The config file currently
supports options related to backend extensions and
context priority. Please refer to SDK documentation
for more details.

--enable_intermediate_outputs Enable all intermediate nodes to be output along with
default outputs in the saved context.

--log_level Specifies max logging level to be set. Valid settings:
"error", "warn", "info" and "verbose".

--version Print the QNN SDK version.

--help Show this help message.
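Once generated, the context binary can be loaded at execution time instead of rebuilding the graph from the model .so, for example with the SDK's qnn-net-run tool (a hedged sketch: verify the --retrieve_context flag against your SDK version's help, and run on a target where the chosen backend is supported):

# Execute the graph directly from the cached context binary.
${QNN_SDK_BIN}/qnn-net-run \
    --backend ${QNN_SDK_BIN}/../lib/libQnnHtp.so \
    --retrieve_context ${MODEL_QNN_DIR}/int8/so/x86_64-linux-clang/${MODEL_NAME}_int8.bin \
    --input_list ${INPUT_LIST_QUAN} \
    --output_dir ./output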