Models¶

AutoTimm provides specialized model architectures for different computer vision tasks. Choose the model that best fits your use case.

Model Architecture Overview¶

Classification¶

graph LR
    A[<b>ImageClassifier</b>] --> B[Backbone<br/>1000+ timm models]
    B --> C[Global Pooling]
    C --> D[Classification Head<br/>Linear + Dropout]
    D --> E[Softmax / Sigmoid]

    style A fill:#1565C0,stroke:#0D47A1
    style B fill:#1976D2,stroke:#1565C0
    style D fill:#1976D2,stroke:#1565C0
    style E fill:#4CAF50,stroke:#388E3C

Object Detection¶

graph LR
    A[<b>ObjectDetector</b>] --> B[Backbone<br/>1000+ timm models]
    B --> C[FPN]
    C --> D[Detection Head<br/>FCOS / YOLOX]
    D --> E1[Classification]
    D --> E2[Regression]
    D --> E3[Centerness]
    E1 --> F[NMS → Boxes + Classes]
    E2 --> F
    E3 --> F

    style A fill:#1565C0,stroke:#0D47A1
    style B fill:#1976D2,stroke:#1565C0
    style C fill:#1976D2,stroke:#1565C0
    style D fill:#1976D2,stroke:#1565C0
    style F fill:#4CAF50,stroke:#388E3C

graph LR
    A[<b>YOLOXDetector</b>] --> B[CSPDarknet]
    B --> C[PAFPN]
    C --> D[Decoupled Head<br/>+ SimOTA]
    D --> E[NMS → Boxes + Classes]

    style A fill:#1565C0,stroke:#0D47A1
    style B fill:#1976D2,stroke:#1565C0
    style C fill:#1976D2,stroke:#1565C0
    style D fill:#1976D2,stroke:#1565C0
    style E fill:#4CAF50,stroke:#388E3C

Segmentation¶

graph LR
    A[<b>SemanticSegmentor</b>] --> B[Backbone<br/>1000+ timm models]
    B --> C[DeepLabV3+ / FCN<br/>ASPP + Decoder]
    C --> D[Upsample → Argmax]
    D --> E[Pixel Masks]

    style A fill:#1565C0,stroke:#0D47A1
    style B fill:#1976D2,stroke:#1565C0
    style C fill:#1976D2,stroke:#1565C0
    style E fill:#4CAF50,stroke:#388E3C

graph LR
    A[<b>InstanceSegmentor</b>] --> B[Backbone<br/>1000+ timm models]
    B --> C[FPN]
    C --> D[Detection Head]
    C --> E[Mask Head<br/>RoI Align + Conv]
    D --> F[Instance Masks + Boxes]
    E --> F

    style A fill:#1565C0,stroke:#0D47A1
    style B fill:#1976D2,stroke:#1565C0
    style C fill:#1976D2,stroke:#1565C0
    style D fill:#1976D2,stroke:#1565C0
    style E fill:#1976D2,stroke:#1565C0
    style F fill:#4CAF50,stroke:#388E3C

Available Models¶

ImageClassifier ¶

Image classification with any timm backbone and flexible training options.

Key Features:

1000+ pretrained backbones from timm
Transfer learning with backbone freezing
Optimizers: AdamW, SGD, Adam, and more
Schedulers: Cosine, Step, OneCycle
Regularization: Label smoothing, Mixup, Dropout
Two-phase fine-tuning for transformers

Use Cases:

Image categorization (CIFAR-10, ImageNet, custom datasets)
Transfer learning from pretrained models
Fine-tuning vision transformers (ViT, Swin, DeiT)
Multi-class classification tasks

Learn more about ImageClassifier →

ObjectDetector ¶

FCOS-style and YOLOX-style anchor-free object detection with flexible backbones.

Key Features:

Any timm backbone (CNNs and transformers)
FCOS or YOLOX detection architectures (anchor-free)
Feature Pyramid Network (FPN) for multi-scale detection
Focal loss for hard example mining
Configurable inference thresholds and NMS
Support for transformer backbones (Swin, ViT)

Use Cases:

Object detection on COCO or custom datasets
Real-time detection applications
Multi-scale object detection
Experimentation with different backbones

Learn more about ObjectDetector →

YOLOXDetector ¶

Official YOLOX object detection with CSPDarknet backbone and optimized training settings.

Key Features:

Official YOLOX architecture (CSPDarknet + YOLOXPAFPN)
All YOLOX variants: nano, tiny, s, m, l, x
Official training settings (SGD, warmup, cosine decay)
Optimized for COCO dataset
Production-ready performance
Matches official YOLOX paper results

Use Cases:

Reproducing official YOLOX results
Production deployments
High-performance object detection
Edge device deployment (nano, tiny)

Learn more about YOLOXDetector →

Quick Comparison¶

Feature	ImageClassifier	ObjectDetector	YOLOXDetector
Task	Image classification	Object detection	Object detection
Output	Class labels + confidence	Bounding boxes + classes	Bounding boxes + classes
Architecture	Backbone + Classifier head	Backbone + FPN + Detection head	CSPDarknet + PAFPN + YOLOX head
Backbones	1000+ timm models	1000+ timm models	CSPDarknet (official)
Loss	CrossEntropy	Focal + GIoU + Centerness/None	Focal + GIoU
Training speed	Fast	Moderate	Moderate
Inference speed	Very fast	Fast	Fast
Memory usage	Low	Moderate	Moderate
Use case	Classification	Flexible detection	Official YOLOX

Common Concepts¶

Both models share several key concepts:

Backbone Selection¶

Both models support 1000+ backbones from timm:

import autotimm as at  # recommended alias

# List all available backbones
at.list_backbones()

# Search for specific backbones
at.list_backbones("*resnet*")
at.list_backbones("*vit*")

Popular backbone families: ResNet, EfficientNet, ConvNeXt, ViT, Swin, DeiT.

Transfer Learning¶

Both models support transfer learning with backbone freezing:

model = ImageClassifier(
    backbone="resnet50",
    freeze_backbone=True,  # Only train classifier head
    lr=1e-2,
)

Optimizer & Scheduler Configuration¶

Both models share the same optimizer and scheduler configuration:

model = ImageClassifier(
    backbone="resnet50",
    optimizer="adamw",
    lr=1e-3,
    scheduler="cosine",
)

Metrics Integration¶

Both models use the same metrics configuration system:

from autotimm import MetricConfig

metrics = [
    MetricConfig(
        name="accuracy",
        backend="torchmetrics",
        metric_class="Accuracy",
        params={"task": "multiclass"},
        stages=["train", "val", "test"],
    ),
]

Models¶

Model Architecture Overview¶

Classification¶

Object Detection¶

Segmentation¶

Available Models¶

ImageClassifier¶

ObjectDetector¶

YOLOXDetector¶

Quick Comparison¶

Common Concepts¶

Backbone Selection¶

Transfer Learning¶

Optimizer & Scheduler Configuration¶

Metrics Integration¶

See Also¶

ImageClassifier ¶

ObjectDetector ¶

YOLOXDetector ¶