YOLOX Object Detection Guide¶
Complete guide to using YOLOX models in AutoTimm, covering both the official YOLOX implementation and YOLOX-style detection with timm backbones.
Table of Contents¶
- Overview
- Two Approaches
- Official YOLOX Models
- YOLOX-Style with timm Backbones
- Model Selection
- Training Settings
- Advanced Usage
- Performance Comparison
Overview¶
YOLOX is a high-performance anchor-free object detector that improves on the YOLO series with:
- Decoupled Head: Separate branches for classification and regression
- Anchor-Free: Grid-based predictions without anchor boxes
- Strong Augmentations: Mosaic and MixUp for better generalization
- SimOTA: Advanced label assignment strategy
AutoTimm provides two ways to use YOLOX:
- Official YOLOX (YOLOXDetector): Complete official implementation with CSPDarknet backbone
- YOLOX-Style (ObjectDetector): YOLOX head with any timm backbone
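To make the anchor-free, grid-based prediction idea concrete, the sketch below decodes a single raw prediction from one grid cell into an absolute box. It follows the decoding commonly described for YOLOX (center offsets added to the cell index and scaled by the stride, width and height exponentiated and scaled by the stride). The helper name `decode_cell` and the example values are hypothetical, and this is not AutoTimm's internal code.

```python
import math


def decode_cell(raw, grid_x, grid_y, stride):
    """Decode one anchor-free prediction (dx, dy, dw, dh) into an xyxy box.

    Hypothetical helper for illustration; not part of AutoTimm.
    """
    dx, dy, dw, dh = raw
    # Center: offset within the cell plus the cell index, scaled to pixels.
    cx = (grid_x + dx) * stride
    cy = (grid_y + dy) * stride
    # Size: predicted in log space, relative to the stride.
    w = math.exp(dw) * stride
    h = math.exp(dh) * stride
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A 640x640 input at stride 32 gives a 20x20 grid; decode the cell at (10, 5).
box = decode_cell((0.5, 0.5, 0.0, 0.0), grid_x=10, grid_y=5, stride=32)
print(box)
```

No anchor shapes appear anywhere in the computation: each cell predicts one box directly, which is what removes the anchor-tuning step of earlier YOLO versions.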
Two Approaches¶
Comparison¶
| Feature | YOLOXDetector | ObjectDetector (yolox) |
|---|---|---|
| Backbone | CSPDarknet (official) | Any timm model (1000+) |
| Neck | YOLOXPAFPN (official) | FPN (standard) |
| Head | YOLOXHead | YOLOXHead |
| Optimizer | SGD (official settings) | Configurable |
| Scheduler | YOLOX (warmup + cosine) | Configurable |
| Use Case | Reproduce official results | Experimentation |
| Performance | Matches YOLOX paper | Flexible trade-offs |
When to Use Each¶
Use YOLOXDetector when:
- You want to reproduce official YOLOX results
- You need production-ready performance
- You want YOLOX paper benchmarks
- You prefer the optimized YOLOX architecture
Use ObjectDetector (yolox) when:
- You want to experiment with different backbones
- You need transfer learning from pretrained models
- You want to compare different architectures
- You prefer flexibility over official settings
Official YOLOX Models¶
Quick Start¶
from autotimm import YOLOXDetector, DetectionDataModule, AutoTrainer
# Create official YOLOX model
model = YOLOXDetector(
model_name="yolox-s", # nano, tiny, s, m, l, x
num_classes=80,
)
# Data
data = DetectionDataModule(
data_dir="./coco",
image_size=640,
batch_size=64,
)
# Train
trainer = AutoTrainer(max_epochs=300, precision="16-mixed")
trainer.fit(model, datamodule=data)
Available Models¶
Use list_yolox_models() to see all available models:
from autotimm import list_yolox_models
# Simple list
models = list_yolox_models()
# ['yolox-nano', 'yolox-tiny', 'yolox-s', 'yolox-m', 'yolox-l', 'yolox-x']
# Detailed information
list_yolox_models(verbose=True)
| Model | Params | FLOPs | mAP (COCO) | Use Case |
|---|---|---|---|---|
| yolox-nano | 0.9M | 1.1G | 25.8 | Edge devices, mobile |
| yolox-tiny | 5.1M | 6.5G | 32.8 | Resource-constrained |
| yolox-s | 9.0M | 26.8G | 40.5 | Balanced speed/accuracy |
| yolox-m | 25.3M | 73.8G | 47.2 | Medium performance |
| yolox-l | 54.2M | 155.6G | 50.1 | High accuracy |
| yolox-x | 99.1M | 281.9G | 51.5 | Maximum accuracy |
Official Training Settings¶
YOLOXDetector uses the official training configuration by default:
model = YOLOXDetector(
model_name="yolox-s",
num_classes=80,
lr=0.01, # Base LR for batch size 64
weight_decay=5e-4, # Official weight decay
optimizer="sgd", # SGD with momentum=0.9, nesterov=True
scheduler="yolox", # Warmup + cosine decay
total_epochs=300,
warmup_epochs=5, # Linear warmup
no_aug_epochs=15, # No augmentation at end
reg_loss_weight=5.0, # YOLOX uses higher reg weight
)
Learning Rate Scheduler¶
The official YOLOX scheduler has three phases:
- Warmup (5 epochs): Linear warmup from 0 to base_lr
- Main Training (280 epochs): Cosine annealing or linear decay
- No Augmentation (15 epochs): Fixed minimum LR for stability
# Customize scheduler
model = YOLOXDetector(
model_name="yolox-s",
scheduler="yolox",
scheduler_kwargs={
"total_epochs": 300,
"warmup_epochs": 10, # Longer warmup
"no_aug_epochs": 20, # More no-aug epochs
"min_lr_ratio": 0.01, # Lower minimum LR
"scheduler_type": "linear", # Linear instead of cosine
},
)
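The three phases can be sketched as a plain function of the epoch index. This is a minimal illustration under stated assumptions, not AutoTimm's scheduler: the function name `yolox_style_lr` is hypothetical, the schedule is evaluated per epoch rather than per iteration, and the default `min_lr_ratio=0.05` is an assumed value.

```python
import math


def yolox_style_lr(epoch, base_lr=0.01, total_epochs=300, warmup_epochs=5,
                   no_aug_epochs=15, min_lr_ratio=0.05, scheduler_type="cosine"):
    """Learning rate for a given epoch (hypothetical sketch, per-epoch granularity)."""
    min_lr = base_lr * min_lr_ratio  # min_lr_ratio default is an assumption
    # Phase 1: linear warmup from 0 to base_lr.
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    # Phase 3: fixed minimum LR during the final no-augmentation epochs.
    if epoch >= total_epochs - no_aug_epochs:
        return min_lr
    # Phase 2: decay from base_lr to min_lr over the main training epochs.
    main_epochs = total_epochs - warmup_epochs - no_aug_epochs
    progress = (epoch - warmup_epochs) / main_epochs
    if scheduler_type == "linear":
        return base_lr - (base_lr - min_lr) * progress
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


for epoch in (0, 5, 145, 285, 299):
    print(epoch, round(yolox_style_lr(epoch), 6))
```

With the default values the main phase spans 300 - 5 - 15 = 280 epochs, matching the phase lengths listed above.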
Model Architecture¶
Official YOLOX components:
from autotimm import list_yolox_backbones, list_yolox_necks, list_yolox_heads
# List components
backbones = list_yolox_backbones() # CSPDarknet variants
necks = list_yolox_necks() # YOLOXPAFPN variants
heads = list_yolox_heads() # YOLOXHead
# Get detailed architecture
from autotimm import get_yolox_model_info
info = get_yolox_model_info("yolox-s")
print(f"Backbone: {info['backbone']}") # csp_darknet_s
print(f"Neck: {info['neck']}") # yolox_pafpn_s
print(f"Head: {info['head']}") # yolox_head
print(f"Channels: {info['backbone_channels']}") # (128, 256, 512)
YOLOX-Style with timm Backbones¶
Quick Start¶
from autotimm import ObjectDetector, DetectionDataModule, AutoTrainer
# YOLOX-style head with any timm backbone
model = ObjectDetector(
backbone="resnet50", # Any timm model
num_classes=80,
detection_arch="yolox", # Use YOLOX head
fpn_channels=256,
head_num_convs=2,
cls_loss_weight=1.0,
reg_loss_weight=5.0,
)
data = DetectionDataModule(data_dir="./coco", image_size=640, batch_size=16)
trainer = AutoTrainer(max_epochs=300)
trainer.fit(model, datamodule=data)
Flexible Backbone Selection¶
Use any of the 1000+ timm models:
# ResNet family
model = ObjectDetector(backbone="resnet50", detection_arch="yolox", ...)
# EfficientNet family
model = ObjectDetector(backbone="efficientnet_b0", detection_arch="yolox", ...)
# ConvNeXt family
model = ObjectDetector(backbone="convnext_tiny", detection_arch="yolox", ...)
# Vision Transformers
model = ObjectDetector(backbone="vit_base_patch16_224", detection_arch="yolox", ...)
# Search available backbones
import autotimm as at # recommended alias
at.list_backbones("*efficientnet*", pretrained_only=True)
Custom Training Settings¶
Full control over optimizer and scheduler:
model = ObjectDetector(
backbone="resnet50",
num_classes=80,
detection_arch="yolox",
lr=1e-3,
weight_decay=1e-4,
optimizer="adamw", # AdamW instead of SGD
scheduler="cosine", # Simple cosine decay
fpn_channels=256,
head_num_convs=2,
focal_alpha=0.25,
focal_gamma=2.0,
cls_loss_weight=1.0,
reg_loss_weight=5.0,
)
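The effect of `focal_alpha` and `focal_gamma` is easier to see on single values. The sketch below assumes these parameters follow the standard binary focal loss definition, FL = -alpha_t * (1 - p_t)^gamma * log(p_t); whether AutoTimm applies exactly this form is an assumption, and `binary_focal_loss` is a hypothetical helper.

```python
import math


def binary_focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Standard focal loss for one predicted probability p and a 0/1 target.

    Hypothetical helper for illustration; not part of AutoTimm.
    """
    # p_t is the probability assigned to the true class.
    p_t = p if target == 1 else 1.0 - p
    # alpha weights positives, (1 - alpha) weights negatives.
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, target=1)  # confident and correct
hard = binary_focal_loss(0.1, target=1)  # confident and wrong
print(f"easy={easy:.6f} hard={hard:.6f}")
```

Raising `focal_gamma` pushes more of the total loss onto hard examples, while `focal_gamma=0` reduces the expression to alpha-weighted cross-entropy.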
Model Selection¶
By Use Case¶
Edge Devices / Mobile:
# YOLOX-Nano: 0.9M params, 1.1G FLOPs
model = YOLOXDetector(model_name="yolox-nano", num_classes=80)
Balanced Performance:
# YOLOX-S: 9.0M params, 26.8G FLOPs, 40.5 mAP
model = YOLOXDetector(model_name="yolox-s", num_classes=80)
High Accuracy:
# YOLOX-L: 54.2M params, 155.6G FLOPs, 50.1 mAP
model = YOLOXDetector(model_name="yolox-l", num_classes=80)
Maximum Accuracy:
# YOLOX-X: 99.1M params, 281.9G FLOPs, 51.5 mAP
model = YOLOXDetector(model_name="yolox-x", num_classes=80)
By Performance Requirements¶
from autotimm import get_yolox_model_info
# Find models with specific requirements
for model_name in ['yolox-nano', 'yolox-tiny', 'yolox-s', 'yolox-m', 'yolox-l', 'yolox-x']:
    info = get_yolox_model_info(model_name)
    if info['mAP'] > 40 and float(info['params'][:-1]) < 30: # mAP > 40, < 30M params
        print(f"Recommended: {model_name} (mAP: {info['mAP']}, Params: {info['params']})")
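The same kind of selection can be tried without AutoTimm installed. The sketch below hard-codes the figures from the model table in this guide and returns the highest-mAP model that fits a parameter or FLOPs budget; `YOLOX_SPECS` and `pick_model` are hypothetical names introduced for illustration.

```python
# (params in millions, GFLOPs, COCO mAP), copied from the model table in this guide.
YOLOX_SPECS = {
    "yolox-nano": (0.9, 1.1, 25.8),
    "yolox-tiny": (5.1, 6.5, 32.8),
    "yolox-s": (9.0, 26.8, 40.5),
    "yolox-m": (25.3, 73.8, 47.2),
    "yolox-l": (54.2, 155.6, 50.1),
    "yolox-x": (99.1, 281.9, 51.5),
}


def pick_model(max_params_m=None, max_gflops=None):
    """Return the highest-mAP model within the given budgets, or None."""
    candidates = [
        (specs[2], name)
        for name, specs in YOLOX_SPECS.items()
        if (max_params_m is None or specs[0] <= max_params_m)
        and (max_gflops is None or specs[1] <= max_gflops)
    ]
    # Tuples compare by mAP first, so max() picks the most accurate candidate.
    return max(candidates)[1] if candidates else None


print(pick_model(max_params_m=30))
print(pick_model(max_gflops=10))
```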
Training Settings¶
Official YOLOX Settings¶
For reproducing official results:
model = YOLOXDetector(
model_name="yolox-s",
num_classes=80,
lr=0.01, # 0.01 for batch size 64 (scale linearly)
weight_decay=5e-4,
optimizer="sgd",
scheduler="yolox",
total_epochs=300,
warmup_epochs=5,
no_aug_epochs=15,
reg_loss_weight=5.0,
)
data = DetectionDataModule(
data_dir="./coco",
image_size=640,
batch_size=64, # Official uses 64 (8 GPUs × 8 per GPU)
)
trainer = AutoTrainer(
max_epochs=300,
precision="16-mixed",
accumulate_grad_batches=1,
)
Learning Rate Scaling¶
Scale learning rate based on batch size:
base_lr = 0.01 # For batch size 64
batch_size = 32
lr = base_lr * (batch_size / 64) # 0.005 for batch size 32
model = YOLOXDetector(model_name="yolox-s", lr=lr, ...)
Multi-GPU Training¶
trainer = AutoTrainer(
max_epochs=300,
devices=8, # Use 8 GPUs
strategy="ddp", # Distributed Data Parallel
precision="16-mixed",
)
# Adjust batch size per GPU
data = DetectionDataModule(
data_dir="./coco",
batch_size=8, # Per GPU: 8 GPUs × 8 = 64 total
)
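When the number of GPUs or the per-GPU batch size changes, the linear scaling rule applies to the effective batch size, not the per-GPU value. The sketch below assumes that `batch_size` is per process under DDP (as in PyTorch Lightning) and that gradient accumulation multiplies the effective batch; `scaled_lr` is a hypothetical helper, not an AutoTimm function.

```python
def scaled_lr(per_gpu_batch, devices=1, accumulate_grad_batches=1,
              base_lr=0.01, base_batch=64):
    """Return (effective_batch, lr) under the linear scaling rule.

    Hypothetical helper; base_lr=0.01 at batch size 64 is taken from this guide.
    """
    effective_batch = per_gpu_batch * devices * accumulate_grad_batches
    return effective_batch, base_lr * effective_batch / base_batch


# Official-style setup: 8 GPUs x 8 images per GPU.
print(scaled_lr(8, devices=8))
# Smaller machine: 2 GPUs x 8 images, accumulating 2 batches per step.
print(scaled_lr(8, devices=2, accumulate_grad_batches=2))
```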
Advanced Usage¶
Custom Metrics¶
from autotimm import YOLOXDetector, MetricConfig
metrics = [
MetricConfig(
name="mAP",
backend="torchmetrics",
metric_class="MeanAveragePrecision",
params={"box_format": "xyxy", "iou_type": "bbox"},
stages=["val", "test"],
prog_bar=True,
),
]
model = YOLOXDetector(model_name="yolox-s", metrics=metrics, ...)
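Mean average precision with `iou_type="bbox"` is built on the intersection-over-union between predicted and ground-truth boxes. As a reminder of what that quantity is for boxes in `xyxy` format, here is a minimal pure-Python sketch; `box_iou_xyxy` is a hypothetical helper and not how torchmetrics computes it internally.

```python
def box_iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


print(box_iou_xyxy((0, 0, 2, 2), (1, 1, 3, 3)))
```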
torch.compile (PyTorch 2.0+)¶
Enabled by default for faster training and inference:
# Default: torch.compile enabled
model = YOLOXDetector(
model_name="yolox-s",
num_classes=80,
)
# Disable if needed
model = YOLOXDetector(
model_name="yolox-s",
num_classes=80,
compile_model=False,
)
# Custom compile options
model = YOLOXDetector(
model_name="yolox-s",
num_classes=80,
compile_kwargs={"mode": "reduce-overhead"},
)
What gets compiled: CSPDarknet Backbone + YOLOXPAFPN Neck + YOLOX Head
See ImageClassifier for compile mode details.
Inference¶
import torch
from autotimm import YOLOXDetector
# Load trained model
model = YOLOXDetector.load_from_checkpoint("path/to/checkpoint.ckpt", compile_model=False)
model.eval()
# Prepare image
images = torch.randn(1, 3, 640, 640)
# Run inference
with torch.inference_mode():
    predictions = model.predict(images)
# Process results
for pred in predictions:
    boxes = pred["boxes"] # [N, 4] in xyxy format
    scores = pred["scores"] # [N]
    labels = pred["labels"] # [N]
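A common next step is to keep only confident detections. The sketch below filters one prediction dictionary by score, assuming the `boxes`, `scores`, and `labels` keys shown above. It works on plain Python lists (tensors can be converted with `.tolist()`); `filter_predictions` is a hypothetical helper, and whether `model.predict` already applies its own threshold is not stated in this guide.

```python
def filter_predictions(pred, score_threshold=0.5):
    """Keep detections whose score is at least score_threshold.

    Hypothetical helper operating on lists; not part of AutoTimm.
    """
    keep = [i for i, s in enumerate(pred["scores"]) if s >= score_threshold]
    return {key: [pred[key][i] for i in keep] for key in ("boxes", "scores", "labels")}


# Made-up example values in the same layout as one entry of `predictions`.
example = {
    "boxes": [[10, 20, 110, 220], [50, 60, 90, 100], [0, 0, 30, 30]],
    "scores": [0.92, 0.31, 0.55],
    "labels": [0, 17, 2],
}
filtered = filter_predictions(example, score_threshold=0.5)
print(filtered["labels"], filtered["scores"])
```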
Transfer Learning¶
Fine-tune on custom dataset:
# Start from official YOLOX-S
model = YOLOXDetector(
model_name="yolox-s",
num_classes=10, # Your custom classes
lr=0.001, # Lower LR for fine-tuning
total_epochs=50, # Fewer epochs
)
# Load pretrained weights (when available)
# model = YOLOXDetector.load_from_checkpoint("yolox_s_coco.ckpt")
Performance Comparison¶
Official YOLOX vs YOLOX-Style¶
Tested on COCO val2017:
| Model | Architecture | mAP | FPS (V100) | Notes |
|---|---|---|---|---|
| YOLOX-S (official) | CSPDarknet + PAFPN | 40.5 | 102 | Official settings |
| YOLOX-S (timm R50) | ResNet50 + FPN | ~38.0 | 95 | YOLOX head only |
| YOLOX-M (official) | CSPDarknet + PAFPN | 47.2 | 81 | Official settings |
| YOLOX-L (official) | CSPDarknet + PAFPN | 50.1 | 69 | Official settings |
Speed vs Accuracy Trade-offs¶
YOLOX-Nano: 25.8 mAP, ~2ms inference (fastest)
YOLOX-Tiny: 32.8 mAP, ~4ms inference
YOLOX-S: 40.5 mAP, ~10ms inference (balanced)
YOLOX-M: 47.2 mAP, ~12ms inference
YOLOX-L: 50.1 mAP, ~15ms inference
YOLOX-X: 51.5 mAP, ~20ms inference (most accurate)
Examples¶
See the examples/ directory for complete working examples:
- yolox_official.py: Official YOLOX training with all settings
- object_detection_yolox.py: YOLOX-style head with timm backbones
- explore_yolox_models.py: Interactive model explorer
References¶
- YOLOX Quick Reference - Fast reference card
- Object Detector Guide - YOLOX-style with timm backbones
- Official YOLOX: https://github.com/Megvii-BaseDetection/YOLOX
- Paper: https://arxiv.org/abs/2107.08430
- AutoTimm Docs: https://theja-vanka.github.io/AutoTimm/
Troubleshooting¶
For YOLOX detector issues, see the Troubleshooting - YOLOX page, which covers:
- CUDA out of memory
- Slow training
- Poor performance
- Proper settings configuration