Models
AutoTimm provides specialized model architectures for different computer vision tasks. Choose the model that best fits your use case.
Model Architecture Overview
Classification
```mermaid
graph LR
    A[<b>ImageClassifier</b>] --> B[Backbone<br/>1000+ timm models]
    B --> C[Global Pooling]
    C --> D[Classification Head<br/>Linear + Dropout]
    D --> E[Softmax / Sigmoid]
    style A fill:#1565C0,stroke:#0D47A1
    style B fill:#1976D2,stroke:#1565C0
    style D fill:#1976D2,stroke:#1565C0
    style E fill:#4CAF50,stroke:#388E3C
```
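The last stage of the classification pipeline is a softmax when each image has exactly one label, or a sigmoid when labels are independent (multi-label). Below is a minimal plain-Python sketch of the two activations applied to one image's logits. `softmax` and `sigmoid` are illustrative helpers, not AutoTimm functions.

```python
import math


def softmax(logits):
    """Single-label: probabilities over mutually exclusive classes that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Multi-label: an independent probability for each class."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


logits = [2.0, 0.5, -1.0]
print(softmax(logits))  # sums to 1; the argmax is the predicted label
print(sigmoid(logits))  # each value in (0, 1); threshold each class separately
```

Softmax couples the classes, so raising one probability lowers the others. That coupling is why it pairs with cross-entropy for single-label tasks. Sigmoid scores each class on its own.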
Object Detection
```mermaid
graph LR
    A[<b>ObjectDetector</b>] --> B[Backbone<br/>1000+ timm models]
    B --> C[FPN]
    C --> D[Detection Head<br/>FCOS / YOLOX]
    D --> E1[Classification]
    D --> E2[Regression]
    D --> E3[Centerness<br/>FCOS only]
    E1 --> F[NMS → Boxes + Classes]
    E2 --> F
    E3 --> F
    style A fill:#1565C0,stroke:#0D47A1
    style B fill:#1976D2,stroke:#1565C0
    style C fill:#1976D2,stroke:#1565C0
    style D fill:#1976D2,stroke:#1565C0
    style F fill:#4CAF50,stroke:#388E3C
```
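In FCOS, the centerness branch predicts how close a location lies to the center of the box it regresses. The training target is computed from the distances (l, t, r, b) between that location and the box's left, top, right and bottom edges. At inference, the classification score is multiplied by the predicted centerness, so off-center boxes rank lower before NMS. Below is a plain-Python sketch of the target formula from the FCOS paper. The function name is hypothetical.

```python
import math


def centerness_target(l, t, r, b):
    """FCOS centerness: 1.0 at the box center, approaching 0 near an edge.

    The distances are positive for any location inside the box.
    """
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


print(centerness_target(10, 10, 10, 10))  # location at the exact center
print(centerness_target(2, 10, 18, 10))   # location shifted toward the left edge
```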
```mermaid
graph LR
    A[<b>YOLOXDetector</b>] --> B[CSPDarknet]
    B --> C[PAFPN]
    C --> D[Decoupled Head<br/>+ SimOTA]
    D --> E[NMS → Boxes + Classes]
    style A fill:#1565C0,stroke:#0D47A1
    style B fill:#1976D2,stroke:#1565C0
    style C fill:#1976D2,stroke:#1565C0
    style D fill:#1976D2,stroke:#1565C0
    style E fill:#4CAF50,stroke:#388E3C
```
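Both detection pipelines end with non-maximum suppression (NMS). Below is a plain-Python sketch of greedy per-class NMS over (x1, y1, x2, y2) boxes. `score_thresh` and `iou_thresh` are illustrative names and values. They are not necessarily the parameters that ObjectDetector or YOLOXDetector expose.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, labels, score_thresh=0.05, iou_thresh=0.5):
    """Greedy per-class NMS; returns indices of kept detections, best first."""
    # Drop low-confidence detections, then visit the rest by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep i unless a higher-scoring kept box of the same class overlaps it too much.
        if all(labels[j] != labels[i] or iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.6]
labels = [0, 0, 0, 1]
print(nms(boxes, scores, labels))  # box 1 overlaps box 0 (same class) and is dropped
```

Per-class NMS never lets a box suppress a box with a different label. Class-agnostic NMS simply drops the `labels` check.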
Segmentation
```mermaid
graph LR
    A[<b>SemanticSegmentor</b>] --> B[Backbone<br/>1000+ timm models]
    B --> C[DeepLabV3+ / FCN<br/>ASPP + Decoder]
    C --> D[Upsample → Argmax]
    D --> E[Pixel Masks]
    style A fill:#1565C0,stroke:#0D47A1
    style B fill:#1976D2,stroke:#1565C0
    style C fill:#1976D2,stroke:#1565C0
    style E fill:#4CAF50,stroke:#388E3C
```
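The "Upsample → Argmax" stage turns the decoder's low-resolution class scores into a full-resolution label map. It upsamples the per-class logits and then takes the highest-scoring class at each pixel. Below is a plain-Python sketch over nested lists shaped [classes][height][width]. Nearest-neighbour upsampling by an integer factor stands in for the bilinear interpolation such decoders typically use. The helper names are hypothetical.

```python
def upsample_nearest(channel, factor):
    """Repeat each value factor x factor times (stand-in for bilinear upsampling)."""
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in channel
        for _ in range(factor)
    ]


def predict_mask(logits, factor):
    """logits: [C][H][W] class scores -> [H*factor][W*factor] class indices."""
    up = [upsample_nearest(c, factor) for c in logits]
    height, width = len(up[0]), len(up[0][0])
    # Per-pixel argmax over the class dimension.
    return [
        [max(range(len(up)), key=lambda k: up[k][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Two classes on a 1x2 feature map, upsampled 2x to a 2x4 mask.
print(predict_mask([[[1.0, 0.0]], [[0.0, 2.0]]], 2))
```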
```mermaid
graph LR
    A[<b>InstanceSegmentor</b>] --> B[Backbone<br/>1000+ timm models]
    B --> C[FPN]
    C --> D[Detection Head]
    C --> E[Mask Head<br/>RoI Align + Conv]
    D --> F[Instance Masks + Boxes]
    E --> F
    style A fill:#1565C0,stroke:#0D47A1
    style B fill:#1976D2,stroke:#1565C0
    style C fill:#1976D2,stroke:#1565C0
    style D fill:#1976D2,stroke:#1565C0
    style E fill:#1976D2,stroke:#1565C0
    style F fill:#4CAF50,stroke:#388E3C
```
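The mask head's RoI Align step crops each detected box from the FPN features without rounding its coordinates. Every output bin averages a few bilinearly interpolated samples. Below is a single-channel, plain-Python sketch of that idea. It omits the half-pixel offset that some implementations apply (for example torchvision's `roi_align` with `aligned=True`). The function names are hypothetical.

```python
def bilinear(feat, y, x):
    """Sample a 2-D feature map at fractional (y, x) by bilinear interpolation."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1)  # clamp to the map
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat, box, out_size=2, sampling_ratio=2):
    """Pool box=(x1, y1, x2, y2), in feature-map coordinates, to an out_size x out_size grid.

    Each bin averages sampling_ratio x sampling_ratio bilinear samples, so the
    box coordinates are never rounded (unlike RoI Pooling).
    """
    x1, y1, x2, y2 = box
    bin_w, bin_h = (x2 - x1) / out_size, (y2 - y1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(sampling_ratio):
                for sx in range(sampling_ratio):
                    # Regularly spaced sample points inside bin (i, j).
                    y = y1 + (i + (sy + 0.5) / sampling_ratio) * bin_h
                    x = x1 + (j + (sx + 0.5) / sampling_ratio) * bin_w
                    total += bilinear(feat, y, x)
            row.append(total / sampling_ratio ** 2)
        out.append(row)
    return out


feat = [[float(x) for x in range(4)] for _ in range(4)]  # value equals column index
print(roi_align(feat, (0.0, 0.0, 2.0, 2.0)))
```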
Available Models
ImageClassifier
Image classification with any timm backbone and flexible training options.
Key Features:
- 1000+ pretrained backbones from timm
- Transfer learning with backbone freezing
- Optimizers: AdamW, SGD, Adam, and more
- Schedulers: Cosine, Step, OneCycle
- Regularization: Label smoothing, Mixup, Dropout
- Two-phase fine-tuning for transformers
Use Cases:
- Image categorization (CIFAR-10, ImageNet, custom datasets)
- Transfer learning from pretrained models
- Fine-tuning vision transformers (ViT, Swin, DeiT)
- Multi-class classification tasks
Learn more about ImageClassifier →
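Two of the regularizers in the ImageClassifier feature list act on the training targets. Label smoothing mixes the one-hot target with a uniform distribution. Mixup blends two samples and their labels with a weight λ drawn from Beta(α, α). Below is a plain-Python sketch of both. `eps=0.1` and `alpha=0.2` are common illustrative values, assumed here and not taken from AutoTimm's defaults.

```python
import random


def smooth_labels(target, num_classes, eps=0.1):
    """(1 - eps) * one_hot + eps / num_classes, as a list of class probabilities."""
    return [
        (1.0 - eps) * (1.0 if k == target else 0.0) + eps / num_classes
        for k in range(num_classes)
    ]


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two inputs and their soft labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


print(smooth_labels(2, 4))  # the true class keeps most of the mass
x, y, lam = mixup([1.0, 1.0], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0], rng=random.Random(0))
print(lam, x, y)
```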
ObjectDetector
FCOS-style and YOLOX-style anchor-free object detection with flexible backbones.
Key Features:
- Any timm backbone, including CNNs and transformers such as Swin and ViT
- FCOS or YOLOX detection heads (both anchor-free)
- Feature Pyramid Network (FPN) for multi-scale detection
- Focal loss, which down-weights easy examples so training focuses on hard ones
- Configurable inference thresholds and NMS
Use Cases:
- Object detection on COCO or custom datasets
- Real-time detection applications
- Multi-scale object detection
- Experimentation with different backbones
Learn more about ObjectDetector →
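In its standard binary form from the RetinaNet paper, focal loss is FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true label. Below is a plain-Python sketch that compares it with binary cross-entropy. α = 0.25 and γ = 2 are the paper's commonly used values. They are assumed here, not taken from AutoTimm.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for p = sigmoid(logit) in (0, 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true label
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def bce(p, y):
    """Plain binary cross-entropy, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


# An easy negative (confidently correct) barely contributes under focal loss.
print(focal_loss(0.01, 0), bce(0.01, 0))
# A hard positive (confidently wrong) still produces a large loss.
print(focal_loss(0.1, 1), bce(0.1, 1))
```

With γ = 0 and α_t = 1 the expression reduces to ordinary cross-entropy. γ controls how strongly easy examples are down-weighted.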
YOLOXDetector
Official YOLOX object detection with CSPDarknet backbone and optimized training settings.
Key Features:
- Official YOLOX architecture (CSPDarknet + YOLOXPAFPN)
- All YOLOX variants: nano, tiny, s, m, l, x
- Official training settings (SGD, warmup, cosine decay)
- Optimized for the COCO dataset
- Production-ready performance
- Matches official YOLOX paper results
Use Cases:
- Reproducing official YOLOX results
- Production deployments
- High-performance object detection
- Edge device deployment (nano, tiny)
Learn more about YOLOXDetector →
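The YOLOX variants share one architecture and differ mainly in their depth and width multipliers. Below is a sketch that uses the (depth, width) pairs from the official YOLOX experiment files, as we understand them. Its scaling rule is modelled on YOLOX's CSPDarknet: stem width 64, three base CSP blocks, and PAFPN inputs of 256/512/1024 channels at width 1.0. It is illustrative only. Check the numbers against the YOLOX repository and AutoTimm's source before relying on them.

```python
# (depth multiplier, width multiplier) per variant; assumed from the official YOLOX exp files.
YOLOX_VARIANTS = {
    "nano": (0.33, 0.25),
    "tiny": (0.33, 0.375),
    "s": (0.33, 0.50),
    "m": (0.67, 0.75),
    "l": (1.00, 1.00),
    "x": (1.33, 1.25),
}


def scaled_sizes(variant):
    """Return (stem channels, base CSP block count, PAFPN input channels) for a variant."""
    depth, width = YOLOX_VARIANTS[variant]
    stem_channels = int(64 * width)
    base_depth = max(round(3 * depth), 1)
    fpn_channels = [int(c * width) for c in (256, 512, 1024)]
    return stem_channels, base_depth, fpn_channels


for name in YOLOX_VARIANTS:
    print(name, scaled_sizes(name))
```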
Quick Comparison
| Feature | ImageClassifier | ObjectDetector | YOLOXDetector |
|---|---|---|---|
| Task | Image classification | Object detection | Object detection |
| Output | Class labels + confidence | Bounding boxes + classes | Bounding boxes + classes |
| Architecture | Backbone + Classifier head | Backbone + FPN + Detection head | CSPDarknet + PAFPN + YOLOX head |
| Backbones | 1000+ timm models | 1000+ timm models | CSPDarknet (official) |
| Loss | CrossEntropy | Focal + GIoU + Centerness/None | Focal + GIoU |
| Training speed | Fast | Moderate | Moderate |
| Inference speed | Very fast | Fast | Fast |
| Memory usage | Low | Moderate | Moderate |
| Use case | Classification | Flexible detection | Official YOLOX |
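Both detectors list GIoU in the Loss row. Plain IoU is 0 for every pair of non-overlapping boxes, so it gives no signal about distance. GIoU subtracts the fraction of the smallest enclosing box that the union does not cover, so the loss 1 - GIoU keeps shrinking as a predicted box moves toward its target. Below is a plain-Python sketch of the formula. It is not AutoTimm's implementation.

```python
def giou(a, b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes; lies in (-1, 1]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Area of the smallest axis-aligned box enclosing both inputs.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


def giou_loss(pred, target):
    return 1.0 - giou(pred, target)


# Both pairs have IoU 0, but the farther box gets the larger loss.
print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)), giou_loss((0, 0, 1, 1), (5, 0, 6, 1)))
```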
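Common Concepts
ImageClassifier and ObjectDetector share several key concepts:
Backbone Selection
ImageClassifier and ObjectDetector support 1000+ backbones from timm (YOLOXDetector uses the official CSPDarknet backbone instead):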
```python
import autotimm as at  # recommended alias

# List all available backbones
at.list_backbones()

# Search for specific backbones
at.list_backbones("*resnet*")
at.list_backbones("*vit*")
```
Popular backbone families: ResNet, EfficientNet, ConvNeXt, ViT, Swin, DeiT.
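timm's own `timm.list_models` filters names with shell-style wildcards. Assuming `at.list_backbones` behaves the same way, the matching can be reproduced with the standard library's `fnmatch`. Below is a sketch over a small hypothetical sample of names. Note that `*resnet*` also matches names that merely contain "resnet", such as SE-ResNet variants.

```python
from fnmatch import fnmatchcase

# A small hypothetical sample of timm model names, for illustration only.
names = [
    "resnet18",
    "resnet50",
    "seresnet50",
    "vit_base_patch16_224",
    "swin_tiny_patch4_window7_224",
    "convnext_tiny",
]


def filter_names(names, pattern="*"):
    """Keep names matching a shell-style wildcard pattern (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]


print(filter_names(names, "*resnet*"))  # substring match anywhere in the name
print(filter_names(names, "resnet*"))   # prefix match only
```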
Transfer Learning
ImageClassifier and ObjectDetector support transfer learning with backbone freezing. For example, to train only the classification head:
```python
from autotimm import ImageClassifier

model = ImageClassifier(
    backbone="resnet50",
    freeze_backbone=True,  # Only train the classifier head
    lr=1e-2,
)
```
Optimizer & Scheduler Configuration
ImageClassifier and ObjectDetector share the same optimizer options (AdamW, SGD, Adam, and more) and learning-rate schedulers (Cosine, Step, OneCycle).
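The Cosine and Step schedulers follow standard closed-form rules, the same ones PyTorch's `CosineAnnealingLR` and `StepLR` implement. Below is a plain-Python sketch for working out the learning rate at a given epoch. `step_size=30` and `gamma=0.1` are illustrative values, not AutoTimm defaults.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine annealing from base_lr at step 0 to min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def step_lr(epoch, base_lr, step_size=30, gamma=0.1):
    """Multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in (0, 25, 50, 75, 100):
    print(epoch, cosine_lr(epoch, 100, 1e-3), step_lr(epoch, 1e-3))
```

Cosine decays smoothly and spends more epochs near the low end. Step holds the rate constant and then drops it by a fixed factor.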
Metrics Integration
ImageClassifier and ObjectDetector use the same metrics configuration system, in which each metric is declared as a MetricConfig:
```python
from autotimm import MetricConfig

metrics = [
    MetricConfig(
        name="accuracy",
        backend="torchmetrics",
        metric_class="Accuracy",
        params={"task": "multiclass"},
        stages=["train", "val", "test"],
    ),
]
```
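With `params={"task": "multiclass"}`, torchmetrics' `Accuracy` compares each sample's highest-scoring class with its target. Under micro averaging (`average="micro"`), the result is the fraction of samples classified correctly. Below is a plain-Python sketch of that computation, useful for sanity-checking logged values. It does not reproduce torchmetrics options such as `top_k` or macro averaging.

```python
def multiclass_accuracy(logits, targets):
    """Fraction of samples whose highest-scoring class equals the target (micro average)."""
    correct = 0
    for scores, target in zip(logits, targets):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == target)
    return correct / len(targets)


logits = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.8, 0.1], [0.1, 0.2, 3.0]]
targets = [0, 1, 1, 2]
print(multiclass_accuracy(logits, targets))  # → 0.75 (3 of 4 correct)
```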
See Also
- Image Classification Data - Data loading for classification
- Object Detection Data - Data loading for detection
- Classification Inference - Making predictions with classifiers
- Object Detection Inference - Making predictions with detectors
- Training Guide - Training models with AutoTrainer
- Metrics - Configuring metrics and logging