Segmentation Data Loading¶
This guide covers data loading for semantic and instance segmentation tasks. For CSV-based data loading, see CSV Data Loading.
Segmentation Data Pipeline¶
graph TD
A[Dataset] --> B{Format}
B -->|PNG| C1[image/ + mask/<br/>1:1 pairs]
B -->|Cityscapes| C2[leftImg8bit/ + gtFine/<br/>19 classes]
B -->|Pascal VOC| C3[JPEGImages/ +<br/>SegmentationClass/]
B -->|COCO Stuff| C4[images/ +<br/>annotations/]
B -->|CSV| C5[image_path,<br/>mask_path]
C1 --> D[SegmentationDataModule]
C2 --> D
C3 --> D
C4 --> D
C5 --> D
D --> E{Augmentation}
E -->|Train| F1[Spatial + Color<br/>applied to both<br/>image and mask]
E -->|Val/Test| F2[Resize + Normalize]
F1 --> G[Collate → Batch<br/>Images + Masks]
F2 --> G
style A fill:#1565C0,stroke:#0D47A1
style B fill:#FF9800,stroke:#F57C00
style C1 fill:#1976D2,stroke:#1565C0
style C2 fill:#1976D2,stroke:#1565C0
style C3 fill:#1976D2,stroke:#1565C0
style C4 fill:#1976D2,stroke:#1565C0
style C5 fill:#1976D2,stroke:#1565C0
style D fill:#1565C0,stroke:#0D47A1
style E fill:#FF9800,stroke:#F57C00
style G fill:#4CAF50,stroke:#388E3C
Semantic Segmentation Data¶
Supported Formats¶
AutoTimm supports multiple segmentation dataset formats:
| Format | Description | Use Case |
|---|---|---|
| PNG | Simple image + mask pairs | Custom datasets, quick prototyping |
| Cityscapes | Urban scene segmentation | Autonomous driving |
| Pascal VOC | 20-class segmentation | General object segmentation |
| COCO Stuff | 171 stuff categories | Scene understanding |
PNG Format¶
The simplest format for custom datasets.
Directory Structure:
data/
    train/
        images/
            img001.jpg
            img002.jpg
            img003.png
        masks/
            img001.png
            img002.png
            img003.png
    val/
        images/
        masks/
    test/
        images/
        masks/
Requirements:
- Images and masks must have matching filenames
- Masks must be single-channel PNG with pixel values = class indices
- Use 255 for unlabeled/ignored pixels
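A quick way to sanity-check a mask before training — this sketch writes a toy mask and verifies it round-trips as a single-channel PNG whose values are valid class indices (plus 255 for ignored pixels):

```python
import os
import tempfile

import numpy as np
from PIL import Image

# Build a toy single-channel mask: class indices 0-2 plus 255 for ignored pixels
mask = np.zeros((64, 64), dtype=np.uint8)
mask[:32, :] = 1
mask[:, :16] = 2
mask[0, 0] = 255

path = os.path.join(tempfile.gettempdir(), "mask_check.png")
Image.fromarray(mask).save(path)

# Verify the saved mask meets the requirements above
loaded = np.array(Image.open(path))
single_channel = loaded.ndim == 2
class_values = set(np.unique(loaded).tolist()) - {255}
within_range = class_values <= {0, 1, 2}
```

Running the same check over your real masks/ directory catches the most common failure mode (RGB masks saved by an annotation tool) before training starts.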
Example:
from autotimm import SegmentationDataModule

data = SegmentationDataModule(
    data_dir="./data",
    format="png",
    image_size=512,
    batch_size=8,
)
Cityscapes Format¶
Urban scene segmentation with 19 classes.
Directory Structure:
cityscapes/
    leftImg8bit/
        train/
            aachen/
                aachen_000000_000019_leftImg8bit.png
                ...
        val/
    gtFine/
        train/
            aachen/
                aachen_000000_000019_gtFine_labelIds.png
                ...
        val/
Example:
data = SegmentationDataModule(
    data_dir="./cityscapes",
    format="cityscapes",
    image_size=512,
    batch_size=8,
)
Class Mapping:
Cityscapes has 34 original classes mapped to 19 training classes. AutoTimm handles this automatically.
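The remapping itself amounts to a lookup table over label IDs. A minimal sketch with a few entries of the standard mapping (road, sidewalk, person, car — everything unlisted becomes the ignore index 255); AutoTimm applies the full 34-entry table for you:

```python
import numpy as np

# A few entries of the Cityscapes labelId -> trainId mapping; the full
# table has 34 labelIds, and unlisted ids map to the ignore index 255
LABEL_TO_TRAIN = {7: 0, 8: 1, 24: 11, 26: 13}  # road, sidewalk, person, car

lut = np.full(256, 255, dtype=np.uint8)
for label_id, train_id in LABEL_TO_TRAIN.items():
    lut[label_id] = train_id

# Apply the table to a tiny labelId mask; 3 is not a training class
label_mask = np.array([[7, 8], [26, 3]], dtype=np.uint8)
train_mask = lut[label_mask]
```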
Pascal VOC Format¶
20 object classes + background.
Directory Structure:
VOC2012/
    JPEGImages/
        2007_000027.jpg
        2007_000032.jpg
    SegmentationClass/
        2007_000027.png
        2007_000032.png
    ImageSets/
        Segmentation/
            train.txt
            val.txt
Example:
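A sketch of the call, assuming the format string is "pascal_voc" by analogy with the "png" and "cityscapes" examples (check your AutoTimm version for the exact name):

```python
from autotimm import SegmentationDataModule

# "pascal_voc" is an assumed format string, mirroring the other examples
data = SegmentationDataModule(
    data_dir="./VOC2012",
    format="pascal_voc",
    image_size=512,
    batch_size=8,
)
```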
COCO Stuff Format¶
171 stuff categories for scene understanding.
Directory Structure (following the standard COCO-Stuff release, which pairs JPEG images with single-channel PNG annotation maps):

cocostuff/
    images/
        train2017/
            000000000009.jpg
        val2017/
    annotations/
        train2017/
            000000000009.png
        val2017/
Example:
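A sketch of the call, assuming "coco_stuff" is the format string (by analogy with the other formats; check your AutoTimm version):

```python
from autotimm import SegmentationDataModule

# "coco_stuff" is an assumed format string, mirroring the other examples
data = SegmentationDataModule(
    data_dir="./cocostuff",
    format="coco_stuff",
    image_size=512,
    batch_size=8,
)
```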
Instance Segmentation Data¶
COCO Instance Format¶
AutoTimm uses COCO JSON format with segmentation annotations.
Directory Structure:
coco/
    train2017/
        000000000001.jpg
        000000000002.jpg
    val2017/
    annotations/
        instances_train2017.json
        instances_val2017.json
Example:
from autotimm import InstanceSegmentationDataModule

data = InstanceSegmentationDataModule(
    data_dir="./coco",
    image_size=640,
    batch_size=4,
)
Annotation Format:
The JSON contains:

- images: List of image info (id, file_name, width, height)
- annotations: List of instance annotations, each with:
    - bbox: [x, y, width, height] in COCO format
    - segmentation: RLE or polygon mask
    - category_id: Object class
    - area: Mask area
    - iscrowd: Whether the instance is a crowd region
- categories: List of category definitions (id, name)
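For reference, a minimal record with the fields listed above — values here are made up for illustration, but the structure matches COCO's instances JSON:

```python
import json

# Minimal COCO-style instances record (illustrative values)
record = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 3,
            "bbox": [100.0, 50.0, 200.0, 150.0],  # [x, y, width, height]
            "segmentation": [[100.0, 50.0, 300.0, 50.0, 300.0, 200.0]],  # polygon
            "area": 15000.0,
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 3, "name": "car", "supercategory": "vehicle"},
    ],
}
text = json.dumps(record)
```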
Mask Formats:
COCO supports two mask representations:
1. Polygon: [[x1, y1, x2, y2, ...]] - List of polygon vertices
2. RLE: Compressed binary mask (for crowd annotations)
AutoTimm automatically decodes both using pycocotools.
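For intuition, uncompressed RLE stores alternating run lengths of zeros and ones in column-major order. A minimal decoder for that form (illustration only — in practice pycocotools handles decoding, including the compressed string variant):

```python
import numpy as np

def decode_uncompressed_rle(rle):
    """Decode COCO uncompressed RLE ({'counts': [...], 'size': [h, w]})
    into a binary mask. Counts alternate runs of 0s and 1s in
    column-major (Fortran) order, starting with zeros."""
    h, w = rle["size"]
    flat = np.zeros(h * w, dtype=np.uint8)
    pos, val = 0, 0
    for run in rle["counts"]:
        flat[pos:pos + run] = val
        pos += run
        val = 1 - val
    return flat.reshape((h, w), order="F")

# One leading zero, two ones, one trailing zero (column-major)
rle = {"size": [2, 2], "counts": [1, 2, 1]}
mask = decode_uncompressed_rle(rle)
```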
Data Augmentation¶
Semantic Segmentation¶
Presets¶
# Light augmentation
data = SegmentationDataModule(
    data_dir="./data",
    format="png",
    augmentation_preset="light",
)

# Default augmentation
data = SegmentationDataModule(
    data_dir="./data",
    augmentation_preset="default",
)

# Strong augmentation
data = SegmentationDataModule(
    data_dir="./data",
    augmentation_preset="strong",
)
Preset Details:
| Transform | Light | Default | Strong |
|---|---|---|---|
| RandomScale | ±10% | ±20% | ±50% |
| RandomCrop | Yes | Yes | Yes |
| HorizontalFlip | 0.5 | 0.5 | 0.5 |
| ColorJitter | Mild | Medium | Strong |
| GaussianBlur | No | No | Yes |
| Rotate | No | ±10° | ±15° |
Custom Transforms¶
import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transforms = A.Compose([
    A.RandomScale(scale_limit=0.5),
    A.RandomCrop(height=512, width=512),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    A.GaussianBlur(p=0.3),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])

val_transforms = A.Compose([
    A.Resize(512, 512),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])

data = SegmentationDataModule(
    data_dir="./data",
    format="png",
    custom_train_transforms=train_transforms,
    custom_val_transforms=val_transforms,
)
Important Notes:
- Masks are automatically transformed with nearest-neighbor interpolation
- No need to specify additional_targets for masks
- Albumentations handles spatial consistency automatically
Instance Segmentation¶
import albumentations as A
from albumentations.pytorch import ToTensorV2

transforms = A.Compose([
    A.RandomScale(scale_limit=0.1),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(brightness=0.2, contrast=0.2),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
], bbox_params=A.BboxParams(format='coco', label_fields=['labels']))

data = InstanceSegmentationDataModule(
    data_dir="./coco",
    custom_train_transforms=transforms,
)
Important Notes:
- Must specify bbox_params for detection + segmentation
- Masks are automatically transformed alongside boxes
- format='coco' means [x, y, width, height]
Class Mapping¶
Handle Sparse Class IDs¶
Some datasets have non-contiguous class IDs. Use class_mapping to convert to contiguous indices:
# Dataset has classes: 0 (bg), 7 (car), 11 (person), 255 (ignore)
# Map to: 0 (bg), 1 (car), 2 (person), 255 (ignore)
class_mapping = {
    0: 0,      # background
    7: 1,      # car
    11: 2,     # person
    255: 255,  # ignore
}

data = SegmentationDataModule(
    data_dir="./data",
    format="png",
    class_mapping=class_mapping,
    ignore_index=255,
)
DataModule Parameters¶
SegmentationDataModule¶
data = SegmentationDataModule(
    data_dir="./data",                 # Root directory
    format="png",                      # Dataset format
    image_size=512,                    # Target size (square)
    batch_size=8,                      # Batch size
    num_workers=4,                     # Dataloader workers
    augmentation_preset="default",     # Augmentation preset
    custom_train_transforms=None,      # Custom train transforms
    custom_val_transforms=None,        # Custom val transforms
    class_mapping=None,                # Class ID mapping
    ignore_index=255,                  # Ignore label
)
InstanceSegmentationDataModule¶
data = InstanceSegmentationDataModule(
    data_dir="./coco",                 # Root directory
    image_size=640,                    # Target size (square)
    batch_size=4,                      # Batch size
    num_workers=4,                     # Dataloader workers
    augmentation_preset="default",     # Augmentation preset
    custom_train_transforms=None,      # Custom train transforms
    custom_val_transforms=None,        # Custom val transforms
)
Creating Custom Datasets¶
For Semantic Segmentation¶
Create a custom dataset by organizing images and masks:
from pathlib import Path
from PIL import Image
import numpy as np

def create_png_dataset(image_dir, output_dir, num_classes):
    """Convert custom data to PNG format."""
    output_dir = Path(output_dir)
    (output_dir / "train" / "images").mkdir(parents=True, exist_ok=True)
    (output_dir / "train" / "masks").mkdir(parents=True, exist_ok=True)

    for image_path in Path(image_dir).glob("*.jpg"):
        # Copy image
        image = Image.open(image_path)
        image.save(output_dir / "train" / "images" / image_path.name)

        # Create/convert mask
        # Your custom logic to generate mask
        mask = np.zeros((image.height, image.width), dtype=np.uint8)
        # ... populate mask with class indices (0 to num_classes-1)
        # Use 255 for ignored pixels
        mask_image = Image.fromarray(mask)
        mask_name = image_path.stem + ".png"
        mask_image.save(output_dir / "train" / "masks" / mask_name)
For Instance Segmentation¶
Convert to COCO format:
import json
from pathlib import Path

import numpy as np
from PIL import Image
from pycocotools import mask as mask_utils

def create_coco_json(images_dir, annotations, output_path):
    """Create COCO JSON from custom annotations."""
    coco_format = {
        "images": [],
        "annotations": [],
        "categories": [],
    }

    # Add categories
    for cat_id, cat_name in enumerate(["person", "car", "dog"], start=1):
        coco_format["categories"].append({
            "id": cat_id,
            "name": cat_name,
            "supercategory": "object",
        })

    # Add images and annotations
    ann_id = 1
    for img_id, (image_path, anns) in enumerate(annotations.items(), start=1):
        image = Image.open(image_path)
        coco_format["images"].append({
            "id": img_id,
            "file_name": Path(image_path).name,
            "width": image.width,
            "height": image.height,
        })

        # Add annotations for this image
        for ann in anns:
            # Convert binary mask to RLE
            rle = mask_utils.encode(np.asfortranarray(ann['mask']))
            rle['counts'] = rle['counts'].decode('utf-8')

            coco_format["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": ann['category_id'],
                "bbox": ann['bbox'],  # [x, y, w, h]
                "area": float(mask_utils.area(rle)),
                "segmentation": rle,
                "iscrowd": 0,
            })
            ann_id += 1

    # Save JSON
    with open(output_path, 'w') as f:
        json.dump(coco_format, f)
Best Practices¶
1. Image Size¶
Choose based on your task:

- 512x512: Good balance for most tasks
- 768x768: Higher quality, more memory
- 1024x1024: Best quality, requires lots of memory
2. Batch Size¶
Segmentation requires more memory than classification:

- Start with batch_size=4 and increase if possible
- Use gradient accumulation for larger effective batch sizes
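The arithmetic behind gradient accumulation is simple: the optimizer steps once every N batches, so gradients average over N times as many samples. (accumulate_grad_batches is a typical trainer knob; the exact parameter name depends on your trainer.)

```python
# Optimizer steps every `accumulate_grad_batches` batches, so the
# gradient averages over batch_size * accumulate_grad_batches samples
batch_size = 4
accumulate_grad_batches = 4  # hypothetical trainer setting
effective_batch_size = batch_size * accumulate_grad_batches
```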
3. Workers¶
Set num_workers based on CPU cores:
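A reasonable default, sketched with the stdlib — one worker per core, capped so that several dataloaders (train and val) don't oversubscribe the machine:

```python
import os

# Heuristic: one worker per core, capped at 8 to avoid oversubscription
# when train and val dataloaders run concurrently
num_workers = min(8, os.cpu_count() or 1)
```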
4. Augmentation¶
- Use strong augmentation for small datasets
- Use light augmentation for large datasets
- Always normalize with ImageNet stats for pretrained backbones
5. Ignore Index¶
Always set ignore_index=255 for unlabeled pixels:
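Downstream, ignore_index is what keeps unlabeled pixels out of the loss rather than treating 255 as a (nonexistent) class. A sketch with plain PyTorch cross-entropy:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 3, 2, 2)             # (N, C, H, W), 3 classes
target = torch.tensor([[[0, 1], [2, 255]]])  # 255 marks an unlabeled pixel

# The 255 pixel is excluded from the loss entirely
loss = F.cross_entropy(logits, target, ignore_index=255)
```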
Troubleshooting¶
For common segmentation data loading issues, see the Troubleshooting - Data Loading guide, which covers:
- Masks not loading (format, pixel values, filename issues)
- Slow data loading (workers, image size, storage)
- Transform errors (albumentations, backend mixing)
- Out of memory errors
Examples¶
See:

- Semantic Segmentation Example
- Instance Segmentation Example
- CSV Data Loading - Load segmentation data from CSV files