Deploying RKNN Models: Evaluation and Inference Testing

Differences Between Loading Native and RKNN-Converted Models

Models developed in frameworks such as PyTorch or TensorFlow, or exported to ONNX, must be converted into Rockchip's proprietary RKNN format to use the NPU for acceleration. The RKNN format is optimized for Rockchip's neural processing units, enabling efficient execution on embedded platforms such as the RK3588, RK3566, or RV1126.

Model Format

Native models (e.g., .pt, .onnx) are incompatible with the RKNN runtime. They can run directly on Rockchip hardware only after conversion to the .rknn binary format. Supported source formats include Caffe, TensorFlow, ONNX, PyTorch, MXNet, and MindSpore.

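Precision Modes

During conversion, two precision modes are available:

Floating point (no quantization): preserves higher numerical precision and accuracy, but consumes more memory and compute. The source model may be Float32, but NPUs such as the RK3588's support INT4/INT8/INT16/FP16 rather than FP32, so non-quantized models generally execute in FP16 on the device.
Quantized (INT8): reduces model size and power consumption by converting weights and activations to 8-bit integers, at the cost of some potential accuracy loss. A calibration dataset (the dataset argument of build()) is used to choose the quantization ranges.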
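To show concretely what the INT8 mode trades away, here is a minimal sketch of asymmetric per-tensor INT8 quantization. It assumes the scale and zero point are derived from the tensor's min/max. RKNN's own calibration, driven by the images listed in dataset.txt, may choose ranges differently, so treat this as an illustration of the technique rather than the toolkit's algorithm.

```python
# Minimal sketch of asymmetric (affine) per-tensor INT8 quantization.
# Assumption: scale/zero-point come from the tensor's min/max; RKNN's own
# calibration may pick ranges differently.

def int8_params(values):
    """Return (scale, zero_point) mapping [min, max] onto [-128, 127]."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must include 0
    scale = (hi - lo) / 255.0 or 1.0  # avoid a zero scale for all-zero tensors
    zero_point = round(-128 - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point):
    """float -> int8, clamped to the representable range."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    """int8 -> float approximation of the original values."""
    return [(q - zero_point) * scale for q in qvalues]

if __name__ == "__main__":
    weights = [-0.51, -0.02, 0.0, 0.33, 1.27]
    s, zp = int8_params(weights)
    q = quantize(weights, s, zp)
    restored = dequantize(q, s, zp)
    for w, qi, r in zip(weights, q, restored):
        print(f"{w:+.4f} -> {qi:4d} -> {r:+.4f} (error {abs(w - r):.4f})")
```

Each value is stored in one byte instead of four. The round-trip error is bounded by roughly half the scale, which is the accuracy cost the INT8 mode accepts.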

Performance Optimizations

The RKNN toolkit automatically applies optimizations during conversion:

Operator fusion to reduce kernel launches
Memory reuse and allocation optimization
Parallel execution scheduling across NPU cores
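Operator fusion is easiest to see with a classic case: folding a BatchNorm layer into the convolution before it, so that two operators become one. The sketch below demonstrates the general technique on a single output position, with the receptive field flattened into a list. It is not RKNN's internal code, and the function names are hypothetical.

```python
# Sketch of one classic fusion: folding BatchNorm into the preceding
# convolution. Illustrates the general technique, not RKNN's internals.
import math

def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Return (new_weights, new_bias) so that conv' == BN(conv) per channel.

    weights: list of per-output-channel weight lists (flattened kernels)
    bias, gamma, beta, mean, var: one float per output channel
    """
    new_w, new_b = [], []
    for c, w_c in enumerate(weights):
        k = gamma[c] / math.sqrt(var[c] + eps)          # BN scale for channel c
        new_w.append([w * k for w in w_c])              # scale every weight
        new_b.append((bias[c] - mean[c]) * k + beta[c]) # shift absorbed into bias
    return new_w, new_b

def conv(x, weights, bias):
    """One output position of a conv: dot product per output channel."""
    return [sum(w * xi for w, xi in zip(w_c, x)) + b for w_c, b in zip(weights, bias)]

def conv_bn(x, weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Unfused reference: conv followed by BatchNorm."""
    y = conv(x, weights, bias)
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]
```

After folding, `conv(x, *fold_batchnorm(...))` yields the same result as `conv_bn(x, ...)` while launching one operator instead of two.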

Deployment Scope

Converted RKNN models can run on:

Rockchip NPU (primary target)
GPU or CPU (fallback, with reduced performance)
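One way to picture fallback is as splitting a model's operator sequence into device segments. The sketch below is conceptual only. NPU_SUPPORTED is a made-up operator set, only CPU fallback is modeled, and the real RKNN runtime's placement logic is not reproduced here.

```python
# Conceptual sketch: split a linear chain of ops into NPU and CPU segments.
# Assumption: NPU_SUPPORTED is hypothetical; real operator support depends
# on the toolkit version and the chip.
NPU_SUPPORTED = {"Conv", "Relu", "Add", "MaxPool", "Gemm"}  # hypothetical

def partition(ops, supported=NPU_SUPPORTED):
    """Group consecutive ops by device; returns [(device, [ops...]), ...]."""
    segments = []
    for op in ops:
        device = "NPU" if op in supported else "CPU"
        if segments and segments[-1][0] == device:
            segments[-1][1].append(op)      # extend the current segment
        else:
            segments.append((device, [op])) # device changes: start a new segment
    return segments

if __name__ == "__main__":
    graph = ["Conv", "Relu", "CustomOp", "Conv", "Add", "Gemm"]
    for device, seg in partition(graph):
        print(device, seg)
```

Every NPU/CPU boundary implies moving tensors between devices. This is one reason a model that depends on fallback runs slower than one that stays entirely on the NPU.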

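Crucially, only converted RKNN models can be deployed on actual hardware. A model loaded with load_rknn() cannot be executed in the RKNN simulator. It requires a physical device, so init_runtime() must be given a target such as 'rk3588' with the board connected. A model converted in the same session with load_pytorch() and build() can instead run in the simulator by calling init_runtime() without a target.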
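The simulator rule can be encoded in a small helper. Calling init_runtime() without a target (target=None) uses the host simulator, which works for a model built in the current session but not for one loaded with load_rknn(). choose_target is a hypothetical helper, not part of the RKNN API; its return value is simply meant to be passed as the target argument.

```python
# Hypothetical helper (not part of the RKNN API) that picks the `target`
# argument for rknn.init_runtime() according to the simulator rule.
from typing import Optional

def choose_target(loaded_from_rknn: bool, device: Optional[str]) -> Optional[str]:
    """Return the `target` value to pass to rknn.init_runtime()."""
    if device is not None:
        return device                  # run on the connected board, e.g. 'rk3588'
    if loaded_from_rknn:
        raise RuntimeError(
            "Models loaded with load_rknn() cannot run in the simulator; "
            "connect a device and pass e.g. device='rk3588'."
        )
    return None                        # None selects the host simulator

# Usage, assuming an RKNN object named `rknn`:
#   rknn.init_runtime(target=choose_target(loaded_from_rknn=True, device='rk3588'))
```

Failing early with a clear message is easier to diagnose than an init_runtime() error on a machine with no board attached.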

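Preparing the Environment

Ensure that RKNN-Toolkit is installed on your host system. Use RKNN-Toolkit2 for chips such as the RK3566 and RK3588; the original RKNN-Toolkit targets older chips such as the RV1126. Running with target='rk3588' from the host also requires the board to be connected (for example over USB/ADB) with rknn_server running on it. The accompanying code package can be obtained by scanning the QR code in the original article and replying with "RKNN评估与推理" ("RKNN evaluation and inference") on the associated WeChat public account.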
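The quantized build in the PyTorch example passes dataset='dataset.txt', a plain-text file listing calibration images one path per line. A small script can generate it. This is a minimal sketch that assumes the calibration images sit in one local folder (a name like "calib_images" would be hypothetical) and use .jpg/.jpeg/.png extensions. Paths are written relative to that folder argument, so run the build from the same working directory.

```python
# Sketch: write dataset.txt for rknn.build(..., dataset='dataset.txt'),
# one calibration image path per line.
# Assumption: images live in a single folder with .jpg/.jpeg/.png names.
from pathlib import Path

def write_dataset_list(image_dir, out_file="dataset.txt",
                       exts=(".jpg", ".jpeg", ".png")):
    """List calibration images in image_dir and write their paths to out_file."""
    paths = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in exts
    )
    Path(out_file).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)  # number of calibration images listed

# Usage (hypothetical folder name):
#   write_dataset_list("calib_images")
```

A few dozen to a few hundred images representative of real inputs usually make a better calibration set than a single sample image.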

Running Inference from a Native Model (PyTorch → RKNN)

The full pipeline covers configuration, conversion with INT8 quantization, export, runtime initialization, and inference:

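from rknn.api import RKNN
import cv2
import numpy as np
import sys

def show_top5(output):
    # Class indices sorted by descending probability
    sorted_indices = np.argsort(output)[::-1]
    top5_str = '\n----------top5-----------\n'
    for i in range(5):
        idx = sorted_indices[i]
        prob = output[idx]
        top5_str += f"{idx}:{prob:.6f}\n"
    print(top5_str)

def softmax(x):
    exp_x = np.exp(x - np.max(x))  # Subtract the max for numerical stability
    return exp_x / np.sum(exp_x)

if __name__ == '__main__':
    rknn = RKNN(verbose=True)

    # Preprocessing (per-channel mean/std, applied inside the model) and target platform
    rknn.config(
        mean_values=[[123.675, 116.28, 103.53]],
        std_values=[[58.395, 58.395, 58.395]],
        target_platform='rk3588'
    )

    # Load the PyTorch model (a TorchScript file, e.g. saved after torch.jit.trace)
    ret = rknn.load_pytorch(
        model='./resnet18.pt',
        input_size_list=[[1, 3, 224, 224]]
    )
    if ret != 0:
        print('Load PyTorch model failed!')
        sys.exit(ret)

    # Build and quantize the model; dataset.txt lists calibration image paths
    ret = rknn.build(
        do_quantization=True,
        dataset='dataset.txt',
        rknn_batch_size=-1
    )
    if ret != 0:
        print('Build model failed!')
        sys.exit(ret)

    # Export the RKNN model so later runs can skip conversion
    ret = rknn.export_rknn('resnet18.rknn')
    if ret != 0:
        print('Export RKNN model failed!')
        sys.exit(ret)

    # Initialize the runtime on a connected RK3588 board
    # (omitting target would run this freshly built model in the simulator)
    ret = rknn.init_runtime(
        target='rk3588',
        perf_debug=False,
        eval_mem=False,
        core_mask=RKNN.NPU_CORE_AUTO
    )
    if ret != 0:
        print('Init runtime environment failed!')
        sys.exit(ret)

    # Load the input image; OpenCV reads BGR, the model expects RGB
    img_bgr = cv2.imread('space_shuttle_224.jpg')
    if img_bgr is None:
        sys.exit('Could not read space_shuttle_224.jpg')
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    # Add a batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
    img_rgb = np.expand_dims(img_rgb, 0)

    # Run inference; data_format tells the runtime the input is NHWC
    outputs = rknn.inference(inputs=[img_rgb], data_format='nhwc')

    # Post-process: softmax over the 1000 logits, then display top-5 predictions
    prob = softmax(np.array(outputs[0][0]))
    show_top5(prob)

    rknn.release()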
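The mean_values and std_values passed to rknn.config() define a per-channel normalization, (pixel - mean) / std. It is applied inside the converted model, which is why the script feeds raw RGB pixels. The sketch below reproduces that arithmetic for a single pixel. The means equal 255 * [0.485, 0.456, 0.406], the usual ImageNet statistics. The usual per-channel std would be 255 * [0.229, 0.224, 0.225] = [58.395, 57.12, 57.375], so reusing 58.395 for all three channels, as the config does, is a small approximation for the G and B channels.

```python
# Sketch of the per-channel normalization configured via mean_values /
# std_values: each RGB channel becomes (pixel - mean) / std.
MEAN = [123.675, 116.28, 103.53]   # = 255 * [0.485, 0.456, 0.406]
STD = [58.395, 58.395, 58.395]     # values used in the rknn.config() call

def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Normalize one (R, G, B) pixel channel by channel."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

if __name__ == "__main__":
    print(normalize_pixel([255, 128, 0]))
```

Because this step is part of the model, applying it again in Python before calling inference() would normalize the input twice.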

Running Inference Directly from an RKNN Model

Once the model is exported as resnet18.rknn, you can skip conversion and load it directly:

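from rknn.api import RKNN
import cv2
import numpy as np
import sys

def show_top5(output):
    # Class indices sorted by descending probability
    sorted_indices = np.argsort(output)[::-1]
    top5_str = '\n----------top5-----------\n'
    for i in range(5):
        idx = sorted_indices[i]
        prob = output[idx]
        top5_str += f"{idx}:{prob:.6f}\n"
    print(top5_str)

def softmax(x):
    exp_x = np.exp(x - np.max(x))  # Subtract the max for numerical stability
    return exp_x / np.sum(exp_x)

if __name__ == '__main__':
    rknn = RKNN(verbose=True)

    # Directly load the pre-converted RKNN model (no config/build needed)
    ret = rknn.load_rknn('resnet18.rknn')
    if ret != 0:
        print('Load RKNN model failed!')
        sys.exit(ret)

    # Initialize the runtime; a model from load_rknn() needs a real device target
    ret = rknn.init_runtime(
        target='rk3588',
        perf_debug=False,
        eval_mem=False,
        core_mask=RKNN.NPU_CORE_AUTO
    )
    if ret != 0:
        print('Init runtime environment failed!')
        sys.exit(ret)

    # Load the input image; OpenCV reads BGR, the model expects RGB
    img_bgr = cv2.imread('space_shuttle_224.jpg')
    if img_bgr is None:
        sys.exit('Could not read space_shuttle_224.jpg')
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    # Add a batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
    img_rgb = np.expand_dims(img_rgb, 0)

    # Execute inference on the NPU
    outputs = rknn.inference(inputs=[img_rgb], data_format='nhwc')

    # Output top-5 predictions
    prob = softmax(np.array(outputs[0][0]))
    show_top5(prob)

    rknn.release()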
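The scripts pass data_format='nhwc' because OpenCV images are laid out as height x width x channels. The PyTorch example, by contrast, declared its input as [1, 3, 224, 224], i.e. NCHW. The sketch below uses flat Python lists to show what that layout difference means in memory. It is illustrative only: data_format lets the runtime deal with the layout, so the scripts never transpose the image themselves.

```python
# Sketch of the NHWC vs NCHW memory layouts behind data_format='nhwc'.
# Flat lists stand in for tensor buffers; illustrative only.

def nhwc_offset(n, h, w, c, H, W, C):
    """Flat index of element (n, h, w, c) in an NHWC buffer."""
    return ((n * H + h) * W + w) * C + c

def nchw_offset(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in an NCHW buffer."""
    return ((n * C + c) * H + h) * W + w

def nhwc_to_nchw(buf, N, H, W, C):
    """Reorder a flat NHWC buffer into NCHW order."""
    out = [None] * (N * C * H * W)
    for n in range(N):
        for h in range(H):
            for w in range(W):
                for c in range(C):
                    src = nhwc_offset(n, h, w, c, H, W, C)
                    out[nchw_offset(n, c, h, w, C, H, W)] = buf[src]
    return out

if __name__ == "__main__":
    # A 1x1x2 RGB "image": pixel 0 = (r0, g0, b0), pixel 1 = (r1, g1, b1)
    print(nhwc_to_nchw(["r0", "g0", "b0", "r1", "g1", "b1"], 1, 1, 2, 3))
```

In NHWC the channels of each pixel are adjacent (interleaved RGB, as OpenCV stores it). In NCHW each channel forms its own contiguous plane, as PyTorch expects.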

Key Observations

Both workflows use identical runtime initialization and inference calls. The critical difference is whether you call load_pytorch() followed by build() (conversion and quantization on the fly) or load_rknn() (direct deployment of a pre-built model). Loading a pre-built .rknn file skips conversion and quantization, so start-up is much faster. Once the runtime is initialized, per-inference latency is the same, because both paths execute the same compiled model. Always verify the predicted class index against the ImageNet label file; in this case, index 812 corresponds to "space shuttle".
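Checking the predicted index against the label file is easy to automate. This sketch assumes a label file with one class name per line in index order; a name like "imagenet_labels.txt" would be hypothetical. With the standard ImageNet-1k ordering, index 812 should map to "space shuttle".

```python
# Sketch: map top-k class indices to human-readable labels.
# Assumption: the label file holds one class name per line, in index order.

def load_labels(path):
    """Read class names, one per line, index = line number (0-based)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

def top_k(probs, k=5):
    """Return the k (index, prob) pairs with the highest probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]

def describe(probs, labels, k=5):
    """Format the top-k predictions as 'index: label (probability)'."""
    return [f"{i}: {labels[i]} ({p:.6f})" for i, p in top_k(probs, k)]

# Usage with the scripts' softmax output (hypothetical file name):
#   for line in describe(list(prob), load_labels("imagenet_labels.txt")):
#       print(line)
```

Printing names next to indices makes an off-by-one label file or a wrong channel order (BGR vs RGB) much easier to spot than raw indices alone.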

Tags: RKNN Rockchip NPU pytorch model quantization

Posted on Mon, 15 Jun 2026 18:27:05 +0000 by dwest