SSD Model Inference Performance Comparison on Rockchip NPU Platforms (RK3568, RK3588, RK1808)

This benchmark evaluates neural processing unit (NPU) performance across Rockchip chipsets using the SSD Inception V2 object detection model with 300×300 RGB input tensors.

Performance Results

Platform    Inference Time     Throughput (FPS)
RK1808      25 ms/frame        40
RK3588      35–50 ms/frame     20
RK3568      150 ms/frame       6
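
The two columns in the table above are tied together by throughput = 1000 / latency in ms. As a minimal sketch of that conversion in Python, using the table's own values (the function name `fps_from_latency_ms` is made up for illustration and is not part of any Rockchip tool):

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# (fastest, slowest) latency per platform in ms, taken from the table above.
table_latencies_ms = {
    "RK1808": (25.0, 25.0),
    "RK3588": (35.0, 50.0),
    "RK3568": (150.0, 150.0),
}

# The slowest latency gives the minimum FPS, the fastest gives the maximum.
fps_ranges = {
    name: (fps_from_latency_ms(slowest), fps_from_latency_ms(fastest))
    for name, (fastest, slowest) in table_latencies_ms.items()
}

for name, (fps_min, fps_max) in fps_ranges.items():
    print(f"{name}: {fps_min:.1f} to {fps_max:.1f} FPS")
```

Read this way, the 20 FPS listed for the RK3588 corresponds to the slow end of its latency range (50 ms), and 150 ms works out to roughly 6.7 FPS, which the table shows as 6.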

RK3588 Benchmark Output

$ ./rknn_benchmark ssd_inception_v2.rknn 10 7

rknn_api/rknnrt version: 1.4.0, driver version: 0.7.2
total weight size: 35947840, total internal size: 7473600

model input num: 1, output num: 2
input tensors:
  index=0, name=Preprocessor/sub:0, n_dims=4, dims=[1, 300, 300, 3]
  n_elems=270000, size=270000, fmt=NHWC, type=INT8, qnt_type=AFFINE

output tensors:
  index=0, name=concat:0, dims=[1, 1917, 1, 4], n_elems=7668
  index=1, name=concat_1:0, dims=[1, 1917, 91, 1], n_elems=174447

Warmup phase:
   0: Elapse Time = 32.03ms, FPS = 31.23
   1: Elapse Time = 31.38ms, FPS = 31.86
   2: Elapse Time = 36.38ms, FPS = 27.48
   3: Elapse Time = 43.76ms, FPS = 22.85
   4: Elapse Time = 42.17ms, FPS = 23.71
   5: Elapse Time = 42.99ms, FPS = 23.26
   6: Elapse Time = 44.48ms, FPS = 22.48
   7: Elapse Time = 44.33ms, FPS = 22.56
   8: Elapse Time = 44.96ms, FPS = 22.24
   9: Elapse Time = 47.09ms, FPS = 21.24

Performance measurement:
   0: Elapse Time = 46.54ms, FPS = 21.49
   1: Elapse Time = 47.15ms, FPS = 21.21
   2: Elapse Time = 50.46ms, FPS = 19.82
   3: Elapse Time = 49.54ms, FPS = 20.18
   4: Elapse Time = 49.52ms, FPS = 20.19
   5: Elapse Time = 49.88ms, FPS = 20.05
   6: Elapse Time = 49.91ms, FPS = 20.03
   7: Elapse Time = 47.64ms, FPS = 20.99
   8: Elapse Time = 48.64ms, FPS = 20.56
   9: Elapse Time = 50.26ms, FPS = 19.90

Avg FPS = 20.426
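
The reported average can be cross-checked from the per-run timings. The Python sketch below parses the measurement-phase lines of the RK3588 log above and computes throughput as frames divided by total time. The line format in the regular expression is assumed from this log only, and how `rknn_benchmark` computes its average internally is not confirmed here.

```python
import re

# Measurement-phase lines copied from the RK3588 log above.
LOG = """\
   0: Elapse Time = 46.54ms, FPS = 21.49
   1: Elapse Time = 47.15ms, FPS = 21.21
   2: Elapse Time = 50.46ms, FPS = 19.82
   3: Elapse Time = 49.54ms, FPS = 20.18
   4: Elapse Time = 49.52ms, FPS = 20.19
   5: Elapse Time = 49.88ms, FPS = 20.05
   6: Elapse Time = 49.91ms, FPS = 20.03
   7: Elapse Time = 47.64ms, FPS = 20.99
   8: Elapse Time = 48.64ms, FPS = 20.56
   9: Elapse Time = 50.26ms, FPS = 19.90
"""

# Assumed field format: "Elapse Time = <number>ms".
ELAPSE_RE = re.compile(r"Elapse Time = ([0-9.]+)ms")


def parse_latencies_ms(log_text: str) -> list[float]:
    """Extract every per-run latency (in ms) from benchmark log text."""
    return [float(m.group(1)) for m in ELAPSE_RE.finditer(log_text)]


latencies = parse_latencies_ms(LOG)
mean_latency_ms = sum(latencies) / len(latencies)

# Throughput as frames per total elapsed time (equivalently 1000 / mean latency).
avg_fps = 1000.0 / mean_latency_ms

# Alternative: arithmetic mean of the per-run FPS values, for comparison.
mean_of_fps = sum(1000.0 / t for t in latencies) / len(latencies)

print(f"runs: {len(latencies)}")
print(f"mean latency: {mean_latency_ms:.3f} ms")
print(f"avg FPS (frames / total time): {avg_fps:.3f}")
print(f"mean of per-run FPS: {mean_of_fps:.3f}")
```

Frames divided by total time agrees with the logged 20.426 to within the rounding of the printed timings, while averaging the per-run FPS values gives a slightly higher figure (about 20.44). That suggests, without proving it, that the tool reports the former.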

RK3568 Benchmark Output

$ ./rknn_benchmark ssd_inception_v2.rknn 10

rknn_api/rknnrt version: 1.4.0, driver version: 0.7.2
total weight size: 35009792, total internal size: 3873600

model input num: 1, output num: 2
input tensors:
  index=0, name=Preprocessor/sub:0, n_dims=4, dims=[1, 300, 300, 3]
  n_elems=270000, size=270000, fmt=NHWC, type=INT8, qnt_type=AFFINE

output tensors:
  index=0, name=concat:0, dims=[1, 1917, 1, 4], n_elems=7668
  index=1, name=concat_1:0, dims=[1, 1917, 91, 1], n_elems=174447

Warmup phase:
   0: Elapse Time = 157.74ms, FPS = 6.34
   1: Elapse Time = 176.21ms, FPS = 5.67
   2: Elapse Time = 156.27ms, FPS = 6.40
   3: Elapse Time = 143.38ms, FPS = 6.97
   4: Elapse Time = 143.27ms, FPS = 6.98
   5: Elapse Time = 143.13ms, FPS = 6.99
   6: Elapse Time = 145.36ms, FPS = 6.88
   7: Elapse Time = 146.58ms, FPS = 6.82
   8: Elapse Time = 142.40ms, FPS = 7.02
   9: Elapse Time = 145.87ms, FPS = 6.86

Performance measurement:
   0: Elapse Time = 145.39ms, FPS = 6.88
   1: Elapse Time = 146.34ms, FPS = 6.83
   2: Elapse Time = 147.25ms, FPS = 6.79
   3: Elapse Time = 143.72ms, FPS = 6.96
   4: Elapse Time = 144.48ms, FPS = 6.92
   5: Elapse Time = 145.42ms, FPS = 6.88
   6: Elapse Time = 139.06ms, FPS = 7.19
   7: Elapse Time = 173.21ms, FPS = 5.77
   8: Elapse Time = 148.35ms, FPS = 6.74
   9: Elapse Time = 143.97ms, FPS = 6.95

Avg FPS = 6.770
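
One slow run stands out in the RK3568 measurement phase above. The following Python sketch summarizes the spread and flags runs that exceed the median by more than 10%. The 10% threshold is an arbitrary choice made for illustration, not something the benchmark tool defines.

```python
import statistics

# Measurement-phase latencies (ms) copied from the RK3568 log above.
rk3568_ms = [145.39, 146.34, 147.25, 143.72, 144.48,
             145.42, 139.06, 173.21, 148.35, 143.97]

mean_ms = statistics.mean(rk3568_ms)
median_ms = statistics.median(rk3568_ms)
best_ms = min(rk3568_ms)
worst_ms = max(rk3568_ms)

# Hypothetical outlier rule: more than 10% slower than the median run.
OUTLIER_FACTOR = 1.10
outliers = [
    (run, ms)
    for run, ms in enumerate(rk3568_ms)
    if ms > median_ms * OUTLIER_FACTOR
]

print(f"mean   : {mean_ms:.2f} ms")
print(f"median : {median_ms:.2f} ms")
print(f"range  : {best_ms:.2f} to {worst_ms:.2f} ms")
print(f"avg FPS: {1000.0 / mean_ms:.3f}")
for run, ms in outliers:
    print(f"run {run} is an outlier at {ms:.2f} ms")
```

With these numbers only run 7 (173.21 ms) is flagged. It pulls the mean (about 147.7 ms) above the median (about 145.4 ms), so the median is the better indicator of typical per-frame latency on this platform.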

Analysis

The RK1808 delivers the highest throughput at 40 FPS, followed by the RK3588 at approximately 20 FPS (20.426 average). The RK3568 is significantly slower at under 7 FPS (6.770 average). On the RK3588, latency rises from about 31 ms to 47 ms over the warmup phase and then settles at around 46–50 ms during the measurement phase. The RK3568 holds a relatively consistent 140–150 ms per frame throughout the benchmark, apart from the first few warmup runs and occasional spikes above 170 ms.

Tags: NPU Rockchip RK3588 RK3568 RK1808

Posted on Sun, 17 May 2026 00:35:14 +0000 by stressedsue