YOLOv9: A Comprehensive Guide to Setup, Training, and Inference

YOLOv9 is a real-time object detector built on a purely convolutional architecture. Where several contemporary detectors integrate Transformer layers, YOLOv9 relies on convolutions alone and is reported to outperform models such as RT-DETR and YOLOv8 on the MS COCO benchmark. This guide walks through setting up, training, evaluating, and running inference with YOLOv9.

For in-depth technical details, refer to the official paper: https://arxiv.org/abs/2402.13616

The source code is available on GitHub: https://github.com/WongKinYiu/yolov9

YOLOv9 Model Performance on MS COCO Dataset

The table below summarizes the performance metrics for various YOLOv9 models on the MS COCO dataset, evaluated at an image test size of 640 pixels:

Model      Test Size   AP (val)   AP50 (val)   AP75 (val)   Params   FLOPs
YOLOv9-S   640         46.8%      63.4%        50.7%        7.2M     26.7G
YOLOv9-M   640         51.4%      68.1%        56.1%        20.1M    76.8G
YOLOv9-C   640         53.0%      70.2%        57.8%        25.5M    102.8G
YOLOv9-E   640         55.6%      72.8%        60.6%        58.1M    192.5G

Currently, the official repository primarily offers pre-trained weights for the YOLOv9-C and YOLOv9-E models.

1. Environment Setup

Begin by cloning the YOLOv9 repository and installing the necessary Python dependencies:

git clone https://github.com/WongKinYiu/yolov9.git
cd yolov9
pip install -r requirements.txt
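
Before moving on, it can help to confirm that the key dependencies are importable. The sketch below is not part of the repository: the helper name missing_packages and the module list are assumptions chosen for illustration, and requirements.txt remains the authoritative list.

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Illustrative list only; requirements.txt is the authoritative source.
    wanted = ["torch", "torchvision", "cv2", "yaml", "numpy"]
    missing = missing_packages(wanted)
    print("Missing modules:", missing or "none")
```

An empty result suggests the installation succeeded; any name it reports can be installed with pip before continuing.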

2. Dataset Preparation

The standard COCO 2017 dataset is commonly used for training and evaluation. The repository includes a convenience script to download the dataset components.

bash scripts/get_coco.sh

The get_coco.sh script automates the download of COCO 2017 labels and images. For context, its contents are:

#!/bin/bash
# COCO 2017 dataset http://cocodataset.org
# Download command: bash ./scripts/get_coco.sh

# Download/unzip labels
d='./' # unzip directory
url=https://github.com/ultralytics/yolov5/releases/download/v1.0/
f='coco2017labels-segments.zip' # or 'coco2017labels.zip', 68 MB
echo 'Downloading' $url$f ' ...'
curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background

# Download/unzip images
d='./coco/images' # unzip directory
url=http://images.cocodataset.org/zips/
f1='train2017.zip' # 19G, 118k images
f2='val2017.zip'   # 1G, 5k images
f3='test2017.zip'  # 7G, 41k images (optional)
for f in $f1 $f2 $f3; do
  echo 'Downloading' $url$f '...'
  curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background
done
wait # finish background tasks

Due to the substantial size of the COCO image archives (train2017.zip, val2017.zip, test2017.zip), manual download via web browser might be more reliable than using the script directly. Ensure the coco2017labels-segments.zip is also downloaded, as it contains necessary label files and paths.
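
Large downloads are sometimes truncated, whichever method is used. As a minimal sketch (the helper name archive_is_intact is hypothetical and not part of the repository), Python's standard zipfile module can check an archive before you unzip it:

```python
import zipfile


def archive_is_intact(path):
    """Return True if `path` is a readable ZIP whose members all pass their CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, FileNotFoundError):
        return False


if __name__ == "__main__":
    for name in ["train2017.zip", "val2017.zip", "coco2017labels-segments.zip"]:
        print(name, "OK" if archive_is_intact(name) else "missing or corrupt")
```

The check reads every member, so expect it to take a while on the 19G training archive.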

After downloading, organize your dataset with a structure similar to this example:

coco
├── annotations/ (from coco2017labels-segments)
│   ├── captions_train2017.json
│   ├── captions_val2017.json
│   ├── instances_train2017.json
│   ├── instances_val2017.json
│   ├── person_keypoints_train2017.json
│   └── person_keypoints_val2017.json
├── labels/
│   ├── train2017/ (contains TXT label files)
│   └── val2017/ (contains TXT label files)
├── train2017/ (contains JPEG images)
│   └── xxx.jpg
├── val2017/ (contains JPEG images)
│   └── xxx.jpg
├── test2017/ (contains JPEG images)
│   └── xxx.jpg
├── train2017.txt
├── val2017.txt
└── test-dev2017.txt
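
A quick way to confirm the layout is to check for the entries in the example tree above. This is a sketch under that assumption: the helper name missing_entries is hypothetical, and the optional test2017 items are left out of the list.

```python
from pathlib import Path

# Entries taken from the example tree above; adjust if your layout differs.
EXPECTED_ENTRIES = [
    "annotations",
    "labels/train2017",
    "labels/val2017",
    "train2017",
    "val2017",
    "train2017.txt",
    "val2017.txt",
]


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected files or directories that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in expected if not (root / entry).exists()]


if __name__ == "__main__":
    print("Missing:", missing_entries("coco") or "nothing")
```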

Crucially, update the dataset root path in the yolov9/data/coco.yaml file to point to your COCO dataset directory. For instance, if your dataset is located one level above the project directory in a folder named datasets/coco, modify the first line:

path: ../datasets/coco
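
If you switch between several dataset locations, the edit can be scripted. The sketch below assumes the file contains a line beginning with path: as described above; the helper name set_dataset_root is hypothetical. It rewrites the file as plain text, so any trailing comment on that line is dropped.

```python
from pathlib import Path


def set_dataset_root(yaml_path, new_root):
    """Rewrite the first `path:` entry of a dataset YAML file; return True if one was found."""
    yaml_file = Path(yaml_path)
    lines = yaml_file.read_text().splitlines()
    for index, line in enumerate(lines):
        if line.startswith("path:"):
            lines[index] = f"path: {new_root}"
            yaml_file.write_text("\n".join(lines) + "\n")
            return True
    return False


if __name__ == "__main__":
    config = Path("data/coco.yaml")
    if config.exists():
        print("Updated:", set_dataset_root(config, "../datasets/coco"))
```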

3. Model Training

Once the environment and dataset are prepared, you can proceed with training. The official repository provides pre-trained weights for yolov9-c.pt, yolov9-e.pt, gelan-c.pt, and gelan-e.pt. The examples below train from scratch by passing an empty --weights value; to start from pre-trained weights such as yolov9-c.pt instead, pass the checkpoint path to --weights and select the matching configuration file with --cfg.

Single-GPU Training

To train a YOLOv9 model on a single GPU, use the following command:

python train_dual.py --workers 8 --device 0 --batch 16 --data data/coco.yaml --img 640 --cfg models/detect/yolov9.yaml --weights '' --name yolov9 --hyp hyp.scratch-high.yaml --min-items 0 --epochs 500 --close-mosaic 15

The --weights argument specifies the path to pre-trained weights. If left empty (''), training will start from scratch. Replace yolov9.yaml with yolov9-c.yaml or yolov9-e.yaml if using those specific model configurations.
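
When you run several experiments, assembling the command in Python avoids retyping the long flag list. This is only a convenience sketch: the function build_train_command is hypothetical, and it reuses the flags from the command above without adding any.

```python
import shlex


def build_train_command(cfg="models/detect/yolov9.yaml", weights="", name="yolov9",
                        device="0", batch=16, epochs=500):
    """Assemble the single-GPU training command as an argument list."""
    return [
        "python", "train_dual.py",
        "--workers", "8",
        "--device", str(device),
        "--batch", str(batch),
        "--data", "data/coco.yaml",
        "--img", "640",
        "--cfg", cfg,
        "--weights", weights,  # empty string means training from scratch
        "--name", name,
        "--hyp", "hyp.scratch-high.yaml",
        "--min-items", "0",
        "--epochs", str(epochs),
        "--close-mosaic", "15",
    ]


if __name__ == "__main__":
    command = build_train_command(cfg="models/detect/yolov9-c.yaml", name="yolov9-c")
    print(shlex.join(command))
    # To launch from the repository root:
    # import subprocess; subprocess.run(command, check=True)
```

Passing the list to subprocess.run keeps the empty --weights value intact, which is easy to lose when building a shell string by hand.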

Multi-GPU Training (Distributed)

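For distributed training across multiple GPUs, launch the script with torch.distributed.launch (recent PyTorch releases deprecate this launcher in favor of torchrun, but the command below keeps the form used throughout this guide):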

python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train_dual.py --workers 8 --device 0,1,2,3,4,5,6,7 --sync-bn --batch 128 --data data/coco.yaml --img 640 --cfg models/detect/yolov9-c.yaml --weights '' --name yolov9-c --hyp hyp.scratch-high.yaml --min-items 0 --epochs 500 --close-mosaic 15
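
The batch size grows from 16 to 128 in the multi-GPU command. Assuming train_dual.py follows the YOLOv5 convention in which --batch is the total across all GPUs (an assumption worth verifying against the script's help text), each of the eight GPUs still processes 16 images per step. The arithmetic, with a hypothetical helper:

```python
def per_gpu_batch(total_batch, device_ids):
    """Split a total batch size evenly across the GPUs named in a --device string."""
    gpu_count = len([d for d in device_ids.split(",") if d.strip()])
    if gpu_count == 0:
        raise ValueError("no GPU ids given")
    if total_batch % gpu_count != 0:
        raise ValueError("total batch size must be divisible by the number of GPUs")
    return total_batch // gpu_count


print(per_gpu_batch(128, "0,1,2,3,4,5,6,7"))  # 16
```

When using fewer GPUs, scale --batch, --nproc_per_node, and the --device list together so that the per-GPU load stays within memory.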

Common Training Issue

Issue: AttributeError: 'FreeTypeFont' object has no attribute 'getsize'
Solution: Pillow 10.0.0 removed the FreeTypeFont.getsize method that the repository code still calls. Downgrade Pillow to a version that still provides it:
pip install Pillow==9.5.0
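
If downgrading is not an option, the failing call can instead be replaced with getbbox, which newer Pillow versions provide. The helper below is a sketch, not the repository's code: text_size is a hypothetical name, and where to apply it depends on the traceback you see. Sizes derived from getbbox may differ by a few pixels from what getsize returned.

```python
def text_size(font, text):
    """Return (width, height) of `text`, preferring getbbox() when it is available."""
    if hasattr(font, "getbbox"):
        left, top, right, bottom = font.getbbox(text)
        return right - left, bottom - top
    # Older Pillow releases still provide getsize().
    return font.getsize(text)
```

A call such as font.getsize(label) would then become text_size(font, label).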

4. Model Evaluation (Validation)

After training, you can evaluate the model's performance on the validation set using the val_dual.py script:

python val_dual.py --data data/coco.yaml --img 640 --batch 32 --conf 0.001 --iou 0.7 --device 0 --weights './yolov9-c.pt' --save-json --name yolov9_c_640_val

This script requires the pycocotools library. If it is not already installed, install it manually:

pip install pycocotools==2.0.7
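
The --iou 0.7 value in the validation command is an intersection-over-union threshold, assuming the flag abbreviates an NMS IoU option as in YOLOv5-style scripts. For readers unfamiliar with the metric, here is a minimal, dependency-free sketch for two boxes in (x1, y1, x2, y2) form; the function name box_iou is illustrative and this is not the repository's implementation.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))  # roughly 0.333 (50 / 150)
```

Two boxes that overlap by half their width score about 0.33, so a 0.7 threshold only treats heavily overlapping boxes as duplicates.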

Common Evaluation Issue

Issue: AttributeError: 'list' object has no attribute 'device'
Solution: This error typically occurs in utils/general.py, where the code reads prediction.device but the model has returned a list or tuple rather than a single tensor. A common fix is to unwrap the prediction just before that line (around line 900 in utils/general.py). Apply the following patch:

    if isinstance(prediction, (list, tuple)):
        processed_predictions = []  # List to store processed tensors
        for pred_tensor in prediction:
            # Process each tensor in the list
            processed_tensor = pred_tensor[0]  # Assuming the first result is relevant
            processed_predictions.append(processed_tensor)  # Add processed tensor to list

        # Use the first processed tensor as the primary prediction result
        prediction = processed_predictions[0]

    device = prediction.device

5. Inference on Images and Videos

To perform inference on new images or video files, use the detect.py script. Adjust the --source parameter to specify the input path.

Video Inference

For detecting objects in a video file:

python detect.py --weights ./ckpt/yolov9-c.pt --conf 0.25 --img-size 640 --source inference/video/demo.mp4 --device 0 --data data/coco.yaml

Image Inference

For detecting objects in a single image or a directory of images:

python detect.py --weights ./ckpt/yolov9-c.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg --device 0
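
Because the same script handles images, directories, and videos, a wrapper that batches many inputs may want to know which kind each --source value is. The repository's loader decides this internally; the sketch below only approximates that decision, with hypothetical names and deliberately short suffix lists.

```python
from pathlib import Path

# Illustrative subsets; the repository's data loader defines its own lists.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}
VIDEO_SUFFIXES = {".mp4", ".avi", ".mov", ".mkv"}


def source_kind(source):
    """Classify a --source value as 'image', 'video', 'directory', or 'unknown'."""
    path = Path(source)
    if path.is_dir():
        return "directory"
    suffix = path.suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "image"
    if suffix in VIDEO_SUFFIXES:
        return "video"
    return "unknown"
```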

During inference, the system might automatically install additional third-party packages if missing. Common ones include:

pip install gitpython
pip install albumentations

6. Model Deployment

For production deployment, converting YOLOv9 models to formats like ONNX is a common practice. This allows for optimized inference across various platforms and hardware. Users typically need to adapt the export.py script within the repository to convert their trained .pt weights to an ONNX format suitable for their specific deployment environment.
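
An exported model usually expects a fixed input size, so the deployment code has to resize and pad each image itself before inference. The sketch below computes generic letterbox parameters for the 640-pixel size used throughout this guide. It is not taken from export.py, and the function name and return layout are assumptions for illustration.

```python
def letterbox_params(height, width, target=640):
    """Compute resize scale, resized size, and padding for a square letterbox."""
    scale = min(target / height, target / width)
    new_width, new_height = round(width * scale), round(height * scale)
    pad_x, pad_y = target - new_width, target - new_height
    # Split padding between both sides; the extra pixel (if odd) goes right/bottom.
    left, top = pad_x // 2, pad_y // 2
    return {
        "scale": scale,
        "resized": (new_width, new_height),
        "pad": (left, top, pad_x - left, pad_y - top),  # left, top, right, bottom
    }


print(letterbox_params(480, 640))
```

For a 480x640 image no scaling is needed, and 80 pixels of padding go above and below. The same scale and padding values are needed afterwards to map predicted boxes back to the original image coordinates.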

Posted on Sun, 05 Jul 2026 17:20:05 +0000 by mustng66