Deploying a Custom YOLOv8s Handwritten Digit Detector on AX650N NPU with Pulsar2 Toolkit

Deploying a custom object detector on edge NPU hardware involves training, quantization, and compilation. This guide walks through the entire pipeline, from training to on-board inference, using a YOLOv8s model trained to detect handwritten digits.

Training the Custom YOLOv8s Model

Preparing the Custom Dataset

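The dataset layout is flexible, and the structure below worked well for this example. What Ultralytics does require is that each images/ directory has a sibling labels/ directory holding one YOLO-format .txt label file per image. A VOC-style dataset also works once its XML annotations are converted to YOLO-format labels.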

digit_detection_data/
├── test/
│   ├── images/   # image files
│   └── labels/   # label files
├── train/
│   ├── images/
│   └── labels/
├── val/
│   ├── images/
│   └── labels/
└── dataset_config.yaml
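Ultralytics finds the label for an image by path: it replaces the last images directory component with labels and swaps the file extension for .txt. The sketch below mirrors that rule so images without labels can be spotted before training; the helper names are hypothetical, not an Ultralytics API.

```python
import os
from pathlib import Path
from typing import List

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def label_path_for(image_path: str) -> str:
    """Map .../images/name.ext to .../labels/name.txt (mirrors the Ultralytics pairing rule)."""
    sa, sb = f"{os.sep}images{os.sep}", f"{os.sep}labels{os.sep}"
    # Swap only the last "images" directory component, then replace the extension with .txt
    swapped = sb.join(image_path.rsplit(sa, 1))
    return swapped.rsplit(".", 1)[0] + ".txt"


def images_without_labels(split_dir: str) -> List[str]:
    """Return names of images in <split_dir>/images that have no matching label file."""
    missing = []
    for img in sorted(Path(split_dir, "images").iterdir()):
        if img.suffix.lower() in IMAGE_EXTS and not os.path.isfile(label_path_for(str(img))):
            missing.append(img.name)
    return missing
```

For example, images_without_labels("digit_detection_data/train") lists unlabeled training images. Ultralytics treats such images as containing no objects, which in a digit dataset usually points to a labeling mistake.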

The dataset_config.yaml file used in this project:

train: ../train/images
val: ../val/images
test: ../test/images
nc: 10
names: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
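Each label file holds one line per object: the class index (its position in names, so the digit '7' is class 7), then the box center x, center y, width, and height, all normalized by the image size. A minimal sketch of converting a pixel-space corner box into such a line; the function name is illustrative.

```python
def to_yolo_line(cls_id: int, x1: float, y1: float, x2: float, y2: float,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel box (x1, y1, x2, y2) into a normalized YOLO label line 'cls cx cy w h'."""
    cx = (x1 + x2) / 2 / img_w   # box center, as a fraction of image width
    cy = (y1 + y2) / 2 / img_h   # box center, as a fraction of image height
    w = (x2 - x1) / img_w        # box width, as a fraction of image width
    h = (y2 - y1) / img_h        # box height, as a fraction of image height
    return f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A '7' occupying (100, 50)-(200, 250) in a 640x640 image
print(to_yolo_line(7, 100, 50, 200, 250, 640, 640))  # 7 0.234375 0.234375 0.156250 0.312500
```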

Setting Up the YOLOv8 Training Environment

Follow the official YOLOv8 environment setup:

# Clone the ultralytics repository
git clone https://github.com/ultralytics/ultralytics

# Navigate to the cloned directory
cd ultralytics

# Install the package in editable mode for development
pip install -e .

Download the pre-trained weights:

wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s.pt

Verify the environment with a quick test:

yolo predict model=yolov8s.pt source='https://ultralytics.com/images/bus.jpg'

Running the Training

YOLOv8 supports training via CLI or Python scripts. We'll use a Python script for training and CLI for later testing.

cd ultralytics

Create a training script, for example train_digits.py:

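from ultralytics import YOLO

# Start from the COCO-pretrained YOLOv8s checkpoint downloaded earlier
model = YOLO('/root/ultralytics/yolov8s.pt')

# Fine-tune on the digit dataset; each run writes to a new runs/detect/trainN/ directory
model.train(data='/root/ultralytics/digit_detection_data/dataset_config.yaml',
            epochs=100,   # passes over the training set
            amp=False,    # disable automatic mixed precision (train in FP32)
            batch=8,      # images per batch
            val=True,     # validate after each epoch
            device=0)     # first CUDA GPU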

Run the script:

python3 train_digits.py

Test the trained model on a sample image:

yolo predict model=/root/ultralytics/runs/detect/train17/weights/best.pt source='/root/ultralytics/ultralytics/assets/digit_sample.png' imgsz=640
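The run directory name (train17 here) depends on how many runs came before it: Ultralytics creates train, train2, train3 and so on under runs/detect. A small sketch, using a hypothetical helper and assuming that default naming, that picks the highest-numbered run which actually produced weights/best.pt:

```python
import re
from pathlib import Path
from typing import Optional


def latest_best_pt(detect_dir: str = "runs/detect") -> Optional[Path]:
    """Return weights/best.pt from the highest-numbered trainN run, or None if there is none."""
    runs = []
    for d in Path(detect_dir).glob("train*"):
        m = re.fullmatch(r"train(\d*)", d.name)
        best = d / "weights" / "best.pt"
        if m and best.is_file():
            # "train" without a suffix is the first run, so treat it as number 1
            runs.append((int(m.group(1) or 1), best))
    return max(runs)[1] if runs else None
```

Runs that were interrupted before saving weights are skipped, so the result always points at a usable checkpoint.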

Model Deployment and On-Board Testing

Preparing the Tools and Files

Exporting an ONNX Model for Pulsar2

Export the best checkpoint to ONNX with opset 11:

yolo task=detect mode=export model=/root/ultralytics/runs/detect/train17/weights/best.pt format=onnx opset=11

Simplify the ONNX graph with onnx-simplifier, running it in the weights directory where the export writes best.onnx:

python3 -m onnxsim best.onnx digit_yolov8s_sim.onnx

Simplification report:

┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃            ┃ Original Model ┃ Simplified Model ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ Add        │ 9              │ 8                │
│ Concat     │ 24             │ 19               │
│ Constant   │ 153            │ 139              │
│ Conv       │ 64             │ 64               │
│ Div        │ 2              │ 1                │
│ Gather     │ 4              │ 0                │
│ MaxPool    │ 3              │ 3                │
│ Mul        │ 60             │ 58               │
│ Reshape    │ 5              │ 5                │
│ Resize     │ 2              │ 2                │
│ Shape      │ 4              │ 0                │
│ Sigmoid    │ 58             │ 58               │
│ Slice      │ 2              │ 2                │
│ Softmax    │ 1              │ 1                │
│ Split      │ 9              │ 9                │
│ Sub        │ 2              │ 2                │
│ Transpose  │ 2              │ 2                │
│ Unsqueeze  │ 7              │ 0                │
│ Model Size │ 42.6MiB        │ 42.6MiB          │
└────────────┴────────────────┴──────────────────┘
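Tallying the report shows where the savings come from: the Shape, Gather and Unsqueeze ops, which compute tensor shapes at run time, are folded away entirely, while the file size stays at 42.6 MiB because the convolution weights dominate it. A quick check of the totals, with the counts copied from the table:

```python
# Node counts per op type from the onnxsim report: (original, simplified)
REPORT = {
    "Add": (9, 8), "Concat": (24, 19), "Constant": (153, 139), "Conv": (64, 64),
    "Div": (2, 1), "Gather": (4, 0), "MaxPool": (3, 3), "Mul": (60, 58),
    "Reshape": (5, 5), "Resize": (2, 2), "Shape": (4, 0), "Sigmoid": (58, 58),
    "Slice": (2, 2), "Softmax": (1, 1), "Split": (9, 9), "Sub": (2, 2),
    "Transpose": (2, 2), "Unsqueeze": (7, 0),
}

original = sum(o for o, _ in REPORT.values())
simplified = sum(s for _, s in REPORT.values())
# Op types present before simplification but absent afterwards
removed_entirely = sorted(op for op, (o, s) in REPORT.items() if o > 0 and s == 0)

print(f"nodes: {original} -> {simplified} ({original - simplified} removed)")  # nodes: 411 -> 373 (38 removed)
print("op types eliminated:", ", ".join(removed_entirely))  # op types eliminated: Gather, Shape, Unsqueeze
```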

Extract the subgraph to be compiled for the NPU, cutting the graph at the two detection-head tensors so that only they remain as outputs:

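import onnx

# Simplified model produced by onnxsim in the previous step
input_path = "/root/ultralytics/runs/detect/train17/weights/digit_yolov8s_sim.onnx"
output_path = "digit_yolov8s_sub.onnx"
# Keep the graph from the input tensor up to the two head outputs.
# "400" and "433" are tensor names from this particular export; check yours in a viewer such as Netron.
input_names = ["images"]
output_names = ["400", "433"]

# Write a new model containing only the nodes between the given inputs and outputs
onnx.utils.extract_model(input_path, output_path, input_names, output_names)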
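When picking cut-point tensors in a graph viewer, it helps to know which sizes to expect. A YOLOv8 detection head predicts on three feature maps with strides 8, 16 and 32, so a 640x640 input gives 80x80 + 40x40 + 20x20 = 8400 positions, and the unmodified Ultralytics export emits one (1, 4 + nc, 8400) tensor. The cut-point tensors are intermediate and their exact shapes depend on the graph, so treat this arithmetic as a sanity check rather than a specification:

```python
from typing import List, Sequence


def grid_positions(img_size: int = 640, strides: Sequence[int] = (8, 16, 32)) -> List[int]:
    """Prediction positions per head level (one per grid cell) for a square input."""
    return [(img_size // s) ** 2 for s in strides]


NUM_CLASSES = 10  # nc from dataset_config.yaml

levels = grid_positions(640)
print(levels, sum(levels))  # [6400, 1600, 400] 8400
# Unmodified Ultralytics export: 4 box values + one score per class, for every position
print((1, 4 + NUM_CLASSES, sum(levels)))  # (1, 14, 8400)
```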

Preparing Quantization Resources

Create a directory structure for Pulsar2:

deploy_data/
├── config/
│   └── digit_config.json
├── dataset/
│   └── calibration_images.tar
├── model/
│   └── digit_yolov8s_sub.onnx
└── pulsar2-run-helper
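The calibration_dataset entry in the config points to a tar archive of sample images that Pulsar2 reads during calibration. A minimal sketch of building one with Python's tarfile module, assuming a flat archive of image files is acceptable and that a random sample of the validation split is representative; the helper name, sample size and seed are illustrative choices.

```python
import random
import tarfile
from pathlib import Path
from typing import List


def build_calibration_tar(image_dir: str, tar_path: str, count: int = 32, seed: int = 0) -> List[str]:
    """Pack a reproducible random sample of images into a flat, uncompressed tar; return their names."""
    exts = {".jpg", ".jpeg", ".png", ".bmp"}
    images = sorted(p for p in Path(image_dir).iterdir() if p.suffix.lower() in exts)
    chosen = random.Random(seed).sample(images, min(count, len(images)))
    with tarfile.open(tar_path, "w") as tar:
        for p in chosen:
            tar.add(str(p), arcname=p.name)  # store at the archive root, without directory prefixes
    return sorted(p.name for p in chosen)
```

For example: build_calibration_tar("digit_detection_data/val/images", "deploy_data/dataset/calibration_images.tar"). As we understand it, calibration_size in the config (4 here) limits how many samples are actually used, so the archive may hold more images than that.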

The configuration file digit_config.json:

{
  "model_type": "ONNX",
  "npu_mode": "NPU1",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "images",
        "calibration_dataset": "./dataset/calibration_images.tar",
        "calibration_size": 4,
        "calibration_mean": [0, 0, 0],
        "calibration_std": [255.0, 255.0, 255.0]
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": true,
    "precision_analysis_method":"EndToEnd"
  },
  "input_processors": [
    {
      "tensor_name": "images",
      "tensor_format": "BGR",
      "src_format": "BGR",
      "src_dtype": "U8",
      "src_layout": "NHWC"
    }
  ],
  "output_processors": [
    {
      "tensor_name": "400",
      "dst_perm": [0, 1, 3, 2]
    },
    {
      "tensor_name": "433",
      "dst_perm": [0, 2, 1]
    }
  ],
  "compiler": {
    "check": 0
  }
}
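With calibration_mean [0, 0, 0] and calibration_std [255.0, 255.0, 255.0], each U8 channel value x is normalized as (x - mean) / std, which scales pixels into [0, 1] and matches the divide-by-255 preprocessing Ultralytics applies during training. A sketch of that per-channel mapping for a single pixel:

```python
from typing import Sequence, Tuple

MEAN = (0.0, 0.0, 0.0)        # calibration_mean from digit_config.json
STD = (255.0, 255.0, 255.0)   # calibration_std from digit_config.json


def normalize_pixel(pixel: Sequence[int], mean: Sequence[float] = MEAN,
                    std: Sequence[float] = STD) -> Tuple[float, ...]:
    """Apply (x - mean) / std per channel to one U8 pixel."""
    return tuple((x - m) / s for x, m, s in zip(pixel, mean, std))


print(normalize_pixel((0, 255, 255)))  # (0.0, 1.0, 1.0)
```

Since input_processors declares src_dtype U8 and src_layout NHWC, the compiled model is meant to take raw 8-bit NHWC images on the board, so this scaling should not need to be repeated in application code.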

Generating the AXModel

Copy the deploy_data folder into the Pulsar2 Docker environment and execute quantization and compilation:

cd deploy_data/
pulsar2 build --input model/digit_yolov8s_sub.onnx --output_dir output --config config/digit_config.json

Extracts from the build log (trimmed for brevity):

...
Quant Config Table
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Input  ┃ Shape            ┃ Dataset Directory ┃ Data Format ┃ Tensor Format ┃ Mean            ┃ Std                   ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ images │ [1, 3, 640, 640] │ images            │ Image       │ BGR           │ [0.0, 0.0, 0.0] │ [255.0, 255.0, 255.0] │
└────────┴──────────────────┴───────────────────┴─────────────┴───────────────┴─────────────────┴───────────────────────┘
...
Network Quantization Finished.
quant.axmodel export success: output/quant/quant_axmodel.onnx
...
max_cycle = 8,507,216
...

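The board-ready model is output/compiled.axmodel; output/quant/quant_axmodel.onnx is the intermediate quantized ONNX that Pulsar2 keeps for inspection. Rename the compiled model for clarity:

mv output/compiled.axmodel digit_yolov8s.axmodel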

Deploying to the AX650N Board

This walkthrough uses the 1.27 board image and compiles ax-samples natively on the board. Clone the ax-samples repository:

git clone https://github.com/AXERA-TECH/ax-samples.git

Inside ax_yolov8s_steps.cc, set the class labels and count for digit recognition:

const char* CLASS_NAMES[] = {
    "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"};

int NUM_CLASS = 10;
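CLASS_NAMES and NUM_CLASS must agree with names and nc in dataset_config.yaml, in the same order, because the class index the model predicts is a position in that list. A small sketch with a hypothetical helper that renders the C++ declarations from the same list, so the two cannot drift apart:

```python
from typing import Sequence

# names from dataset_config.yaml, in training order
NAMES = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


def cpp_class_decls(names: Sequence[str]) -> str:
    """Render CLASS_NAMES / NUM_CLASS declarations (assumes names contain no quotes or backslashes)."""
    quoted = ", ".join(f'"{n}"' for n in names)
    return (f"const char* CLASS_NAMES[] = {{\n    {quoted}}};\n\n"
            f"int NUM_CLASS = {len(names)};")


print(cpp_class_decls(NAMES))
```

For the digit list this reproduces exactly the two declarations shown above.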

Build the project:

cd ax-samples
mkdir build && cd build
cmake -DBSP_MSP_DIR=/soc/ -DAXERA_TARGET_CHIP=ax650 ..
make -j6
make install

The executables are installed under ax-samples/build/install/ax650/. Copy the digit_yolov8s.axmodel and a test image into that directory. Then run inference:

./ax_yolov8 -m digit_yolov8s.axmodel -i sample_digit.jpg

Example output:

--------------------------------------
model file : digit_yolov8s.axmodel
image file : sample_digit.jpg
img_h, img_w : 640 640
--------------------------------------
...
post process cost time:0.49 ms
Repeat 1 times, avg time 10.92 ms, max_time 10.92 ms, min_time 10.92 ms
--------------------------------------
detection num: 4
 2:  94%, [ 275,   38,  362,  168], 2
 3:  94%, [  58,   47,  145,  175], 3
 1:  92%, [  75,  250,  140,  378], 1
 1:  90%, [ 288,  249,  336,  378], 1
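Each detection line appears to list the class index, the confidence, the box corners [x0, y0, x1, y1] in image pixels, and the class name from CLASS_NAMES. To consume these results from a script, a parser along these lines could be used; it assumes exactly the printed format above, and the regex, Detection type and function name are illustrative rather than part of ax-samples.

```python
import re
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    class_id: int
    score: float                      # confidence in [0, 1]
    box: Tuple[int, int, int, int]    # (x0, y0, x1, y1) in pixels, as printed
    name: str


LINE_RE = re.compile(
    r"^\s*(\d+):\s*(\d+)%,\s*\[\s*(-?\d+),\s*(-?\d+),\s*(-?\d+),\s*(-?\d+)\],\s*(\S+)\s*$")


def parse_detections(text: str) -> List[Detection]:
    """Extract detection lines from ax_yolov8 console output; all other lines are ignored."""
    dets = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            cid, pct, x0, y0, x1, y1, name = m.groups()
            dets.append(Detection(int(cid), int(pct) / 100,
                                  (int(x0), int(y0), int(x1), int(y1)), name))
    return dets
```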

Posted on Wed, 13 May 2026 11:58:01 +0000 by RaythMistwalker