Object Detection Data¶
The DetectionDataModule handles object detection datasets in COCO format and CSV format, including support for custom annotations, class filtering, and detection-specific augmentations.
Detection Data Pipeline¶
graph TD
A[Data Source] --> B{Format}
B -->|COCO| C1[Annotations JSON<br/>+ Images]
B -->|CSV| C2[CSV with bbox columns<br/>+ Images]
C1 --> D[DetectionDataModule]
C2 --> D
D --> E{Preprocessing}
E --> F1[Filter Classes]
E --> F2[Min Bbox Area]
E --> F3[Min Visibility]
F1 --> G{Augmentation}
F2 --> G
F3 --> G
G -->|Train| H1[Flip + Crop + Color<br/>+ BBox Transform]
G -->|Val/Test| H2[Resize + Normalize]
H1 --> I[Collate → Batch<br/>Images + Boxes + Labels]
H2 --> I
style A fill:#1565C0,stroke:#0D47A1
style B fill:#FF9800,stroke:#F57C00
style C1 fill:#1976D2,stroke:#1565C0
style C2 fill:#1976D2,stroke:#1565C0
style D fill:#1565C0,stroke:#0D47A1
style E fill:#FF9800,stroke:#F57C00
style G fill:#FF9800,stroke:#F57C00
style I fill:#4CAF50,stroke:#388E3C
COCO Dataset Format¶
Expected directory structure:
coco/
train2017/ # Training images
000000000001.jpg
000000000002.jpg
...
val2017/ # Validation images
000000000001.jpg
...
test2017/ # Test images (optional)
000000000001.jpg
...
annotations/
instances_train2017.json
instances_val2017.json
instances_test2017.json # Optional
The annotation files follow the standard COCO JSON format with image metadata, categories, and bounding box annotations.
Basic Usage¶
from autotimm import DetectionDataModule
data = DetectionDataModule(
data_dir="./coco",
image_size=640, # Standard COCO size
batch_size=16,
num_workers=4,
)
Augmentation Presets¶
DetectionDataModule supports albumentations-based augmentation presets designed for object detection:
data = DetectionDataModule(
data_dir="./coco",
image_size=640,
batch_size=16,
augmentation_preset="default", # or "strong"
)
Available presets:
| Preset | Description |
|---|---|
| `default` | RandomResizedCrop (80-100%), HorizontalFlip |
| `strong` | Affine, blur, noise, brightness/contrast adjustments |
Preset Details:
Default Preset¶
- Random resized crop (scale 0.8-1.0)
- Horizontal flip (50%)
- Standard normalization
Strong Preset¶
- All default transforms
- Affine transformations (rotation, scale, shear)
- Gaussian blur and noise
- Brightness/contrast adjustments
- Color jittering
- CoarseDropout for regularization
Custom Transforms¶
Use custom albumentations pipelines with bbox-aware transforms:
import albumentations as A
custom_train = A.Compose(
[
A.RandomResizedCrop(height=640, width=640, scale=(0.8, 1.0)),
A.HorizontalFlip(p=0.5),
A.RandomBrightnessContrast(p=0.5),
A.GaussianBlur(blur_limit=(3, 7), p=0.3),
A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
],
bbox_params=A.BboxParams(format="coco", label_fields=["labels"]),
)
custom_eval = A.Compose(
[
A.Resize(height=640, width=640),
A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
],
bbox_params=A.BboxParams(format="coco", label_fields=["labels"]),
)
data = DetectionDataModule(
data_dir="./coco",
image_size=640,
batch_size=16,
train_transforms=custom_train,
eval_transforms=custom_eval,
)
Important: Always include bbox_params when using custom transforms to ensure bounding boxes are properly transformed along with the images.
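COCO annotations store boxes as `[x, y, width, height]`, while many downstream consumers (for example, a metric configured with `box_format="xyxy"`) expect corner coordinates. A minimal conversion sketch, independent of any library:

```python
def coco_to_xyxy(bbox):
    """Convert a COCO-format box [x, y, w, h] to corner format [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

print(coco_to_xyxy([10, 20, 30, 40]))  # → [10, 20, 40, 60]
```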
Recommended Detection Transforms¶
| Transform | Use Case | Parameters |
|---|---|---|
| `RandomResizedCrop` | Scale variation | `scale=(0.8, 1.0)` |
| `HorizontalFlip` | Mirror augmentation | `p=0.5` |
| `ShiftScaleRotate` | Geometric variation | `shift_limit=0.1, scale_limit=0.1, rotate_limit=15` |
| `RandomBrightnessContrast` | Lighting variation | `brightness_limit=0.2, contrast_limit=0.2` |
| `GaussianBlur` | Robustness to blur | `blur_limit=(3, 7), p=0.3` |
| `HueSaturationValue` | Color variation | `hue_shift_limit=20, sat_shift_limit=30` |
| `CoarseDropout` | Regularization | `max_holes=8, max_height=32, max_width=32` |
Avoid: Transforms that can crop out bounding boxes entirely or make them too small.
Filter by Class¶
Train on a subset of classes:
# COCO class IDs (1-indexed):
# 1: person, 2: bicycle, 3: car, etc.
selected_classes = [1, 2, 3] # person, bicycle, car
data = DetectionDataModule(
data_dir="./coco",
image_size=512,
batch_size=8,
class_ids=selected_classes,
)
This:
- Filters the dataset to only include images containing these classes
- Removes annotations for other classes
- Remaps class IDs to 0-based indexing (e.g., [1, 2, 3] → [0, 1, 2])
- Updates num_classes automatically
Use Cases:
- Training specialized detectors (e.g., only vehicles)
- Reducing dataset size for faster experimentation
- Domain-specific applications
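The filter-and-remap behavior described above can be sketched in plain Python (a simplified illustration, not the library's actual implementation):

```python
def filter_and_remap(annotations, class_ids):
    """Keep only annotations whose category_id is in class_ids,
    remapping the kept IDs to contiguous 0-based indices."""
    id_map = {cid: idx for idx, cid in enumerate(sorted(class_ids))}
    kept = [
        {**ann, "category_id": id_map[ann["category_id"]]}
        for ann in annotations
        if ann["category_id"] in class_ids
    ]
    return kept, len(id_map)

anns = [
    {"id": 1, "image_id": 1, "category_id": 1},  # person → kept, remapped to 0
    {"id": 2, "image_id": 1, "category_id": 7},  # not selected → dropped
    {"id": 3, "image_id": 2, "category_id": 3},  # car → kept, remapped to 2
]
filtered, num_classes = filter_and_remap(anns, [1, 2, 3])
print(num_classes)                           # → 3
print([a["category_id"] for a in filtered])  # → [0, 2]
```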
Filter Small Boxes¶
Remove very small bounding boxes that are difficult to detect:
data = DetectionDataModule(
data_dir="./coco",
image_size=640,
batch_size=16,
min_bbox_area=32.0, # Minimum box area in pixels
)
This filters out bounding boxes smaller than the specified area, which helps:

- Avoid training on boxes too small to detect reliably
- Reduce noise in training data
- Improve model focus on detectable objects
Guidelines:

- `min_bbox_area=32.0`: Standard for most applications
- `min_bbox_area=64.0`: For large object detection
- `min_bbox_area=16.0`: When small objects are critical
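The area check itself is straightforward: a COCO bbox is `[x, y, w, h]`, so the area is `w * h`. A minimal sketch of the filter:

```python
def filter_small_boxes(annotations, min_bbox_area):
    """Drop annotations whose COCO bbox [x, y, w, h] area falls below the threshold."""
    return [
        ann for ann in annotations
        if ann["bbox"][2] * ann["bbox"][3] >= min_bbox_area
    ]

anns = [
    {"id": 1, "bbox": [0, 0, 4, 4]},    # area 16 → dropped at threshold 32
    {"id": 2, "bbox": [10, 10, 8, 8]},  # area 64 → kept
]
print([a["id"] for a in filter_small_boxes(anns, 32.0)])  # → [2]
```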
DataLoader Options¶
Fine-tune data loading performance:
data = DetectionDataModule(
data_dir="./coco",
batch_size=16,
num_workers=8, # Parallel data loading
pin_memory=True, # Faster GPU transfer
persistent_workers=True, # Keep workers alive
prefetch_factor=4, # Batches to prefetch per worker
)
Performance Tips:

- `num_workers`: Start with 4, increase if CPU is underutilized
- `batch_size`: Decrease if you hit OOM errors (detection uses more memory than classification)
- `pin_memory`: Always `True` when using GPU
- `persistent_workers`: `True` reduces worker restart overhead
- `prefetch_factor`: Increase to 4-8 if CPU can keep up
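One detail specific to detection: images in a batch have different numbers of boxes, so targets cannot be stacked into a single tensor. A common collate pattern keeps per-image targets separate (a sketch of the general technique, not necessarily what `DetectionDataModule` does internally):

```python
def detection_collate(batch):
    """Collate (image, target) pairs into parallel lists, keeping each
    image's targets separate because box counts vary per image."""
    images, targets = tuple(zip(*batch))
    return list(images), list(targets)

batch = [
    ("img_a", {"boxes": [[0, 0, 10, 10]], "labels": [1]}),
    ("img_b", {"boxes": [[5, 5, 20, 20], [1, 1, 4, 4]], "labels": [2, 0]}),
]
images, targets = detection_collate(batch)
print(len(images))                # → 2
print(len(targets[1]["boxes"]))   # → 2
```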
Full Parameter Reference¶
DetectionDataModule(
data_dir="./coco", # COCO dataset root directory
image_size=640, # Target image size (square)
batch_size=16, # Batch size
num_workers=4, # Data loading workers
train_transforms=None, # Custom albumentations transforms for training
eval_transforms=None, # Custom albumentations transforms for eval
augmentation_preset=None, # "default" or "strong"
class_ids=None, # List of class IDs to filter (None = all)
min_bbox_area=0.0, # Minimum bounding box area in pixels
pin_memory=True, # Pin memory for GPU transfer
persistent_workers=False, # Keep workers alive between epochs
prefetch_factor=None, # Batches to prefetch per worker
)
Complete Example¶
from autotimm import (
AutoTrainer,
DetectionDataModule,
LoggerConfig,
MetricConfig,
ObjectDetector,
)
# Data
data = DetectionDataModule(
data_dir="./coco",
image_size=640,
batch_size=16,
num_workers=4,
augmentation_preset="default",
min_bbox_area=32.0,
)
# Metrics
metric_configs = [
MetricConfig(
name="mAP",
backend="torchmetrics",
metric_class="MeanAveragePrecision",
params={"box_format": "xyxy", "iou_type": "bbox"},
stages=["val", "test"],
prog_bar=True,
),
]
# Model
model = ObjectDetector(
backbone="resnet50",
num_classes=80,
metrics=metric_configs,
fpn_channels=256,
head_num_convs=4,
lr=1e-4,
scheduler="multistep",
scheduler_kwargs={"milestones": [8, 11], "gamma": 0.1},
)
# Trainer
trainer = AutoTrainer(
max_epochs=12,
logger=[LoggerConfig(backend="tensorboard", params={"save_dir": "logs"})],
checkpoint_monitor="val/map",
checkpoint_mode="max",
gradient_clip_val=1.0,
)
# Train
trainer.fit(model, datamodule=data)
trainer.test(model, datamodule=data)
Custom COCO Datasets¶
To use your own COCO-format dataset:
- Organize your data following the COCO structure
- Create annotation files in COCO JSON format
- Ensure class IDs are consistent and 1-indexed (COCO standard)
- Verify images are in the expected directories
Minimal COCO Annotation Structure¶
{
"images": [
{
"id": 1,
"file_name": "000001.jpg",
"width": 640,
"height": 480
}
],
"annotations": [
{
"id": 1,
"image_id": 1,
"category_id": 1,
"bbox": [x, y, width, height],
"area": width * height,
"iscrowd": 0
}
],
"categories": [
{
"id": 1,
"name": "class_name"
}
]
}
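Before training on a custom dataset, it can save time to sanity-check the annotation file against the structure above. A hypothetical validation helper (not part of autotimm):

```python
def validate_coco(data):
    """Minimal sanity checks on a COCO-format annotation dict."""
    # All three top-level keys must be present
    for key in ("images", "annotations", "categories"):
        assert key in data, f"missing top-level key: {key}"
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    # Every annotation must reference an existing image and category
    # and carry a bbox with positive width and height
    for ann in data["annotations"]:
        assert ann["image_id"] in image_ids, "annotation references unknown image"
        assert ann["category_id"] in category_ids, "annotation references unknown category"
        x, y, w, h = ann["bbox"]
        assert w > 0 and h > 0, "bbox must have positive width and height"
    return True

coco = {
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [10, 20, 100, 50], "area": 5000, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "class_name"}],
}
print(validate_coco(coco))  # → True
```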
See Also¶
- CSV Data Loading - Load detection data from CSV files
- Image Classification Data - For classification datasets
- Object Detection Examples - More examples and use cases
- Training Guide - How to train detection models