
CSV Data Loading API

Complete API reference for CSV-based data loading across all tasks.

Overview

AutoTimm provides CSV dataset classes for loading data from CSV files with custom annotations. Each task has a dedicated CSV dataset class:

| Dataset | Task | Description |
|---|---|---|
| `CSVImageDataset` | Classification | Single-label classification from CSV |
| `MultiLabelImageDataset` | Multi-label | Multi-label classification from CSV |
| `CSVDetectionDataset` | Detection | Object detection with bounding boxes |
| `CSVInstanceDataset` | Instance segmentation | Instance segmentation with masks |

Note: For DataModules that wrap these datasets, see:

  • ImageDataModule: supports train_csv, val_csv, and test_csv for classification
  • MultiLabelImageDataModule: CSV-only data module for multi-label classification
  • DetectionDataModule: supports CSV via the format parameter
  • InstanceSegmentationDataModule: supports the CSV format


CSVImageDataset

Dataset for single-label classification from CSV files.

API Reference

autotimm.CSVImageDataset

Bases: Dataset

Dataset for single-label classification from CSV.

CSV format::

image_path,label
img001.jpg,cat
img002.jpg,dog

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `csv_path` | `str \| Path` | Path to CSV file. | required |
| `image_dir` | `str \| Path` | Root directory for resolving image paths. | `'.'` |
| `image_column` | `str \| None` | Name of the column containing image paths. If `None`, uses the first column. | `None` |
| `label_column` | `str \| None` | Name of the column containing class labels. If `None`, uses the second column. | `None` |
| `transform` | `Callable \| None` | Transform to apply to images. Supports both torchvision transforms (PIL input) and albumentations transforms (numpy input with `image` key). | `None` |
| `use_albumentations` | `bool` | If `True`, load images with OpenCV and pass as numpy arrays to an albumentations transform. Default is `False` (PIL + torchvision). | `False` |
Source code in src/autotimm/data/dataset.py
class CSVImageDataset(Dataset):
    """Dataset for single-label classification from CSV.

    CSV format::

        image_path,label
        img001.jpg,cat
        img002.jpg,dog

    Parameters:
        csv_path: Path to CSV file.
        image_dir: Root directory for resolving image paths.
        image_column: Name of the column containing image paths.
            If ``None``, uses the first column.
        label_column: Name of the column containing class labels.
            If ``None``, uses the second column.
        transform: Transform to apply to images. Supports both
            torchvision transforms (PIL input) and albumentations
            transforms (numpy input with ``image`` key).
        use_albumentations: If ``True``, load images with OpenCV and
            pass as numpy arrays to an albumentations transform.
            Default is ``False`` (PIL + torchvision).
    """

    def __init__(
        self,
        csv_path: str | Path,
        image_dir: str | Path = ".",
        image_column: str | None = None,
        label_column: str | None = None,
        transform: Callable | None = None,
        use_albumentations: bool = False,
    ):
        self.csv_path = Path(csv_path)
        self.image_dir = Path(image_dir)
        self.transform = transform
        self.use_albumentations = use_albumentations

        # Parse CSV
        with open(self.csv_path, newline="") as f:
            reader = csv.DictReader(f)
            fieldnames = list(reader.fieldnames or [])
            if not fieldnames:
                raise ValueError(f"CSV file has no columns: {self.csv_path}")

            # Determine image column
            self._image_column = image_column or fieldnames[0]
            if self._image_column not in fieldnames:
                raise ValueError(
                    f"Image column '{self._image_column}' not found in CSV. "
                    f"Available columns: {fieldnames}"
                )

            # Determine label column
            if label_column is not None:
                if label_column not in fieldnames:
                    raise ValueError(
                        f"Label column '{label_column}' not found in CSV. "
                        f"Available columns: {fieldnames}"
                    )
                self._label_column = label_column
            else:
                non_image = [c for c in fieldnames if c != self._image_column]
                if not non_image:
                    raise ValueError(
                        "No label column found. Provide label_column explicitly "
                        "or ensure the CSV has a column beyond the image column."
                    )
                self._label_column = non_image[0]

            # Read rows
            self._image_paths: list[str] = []
            self._labels_raw: list[str] = []
            for row in reader:
                self._image_paths.append(row[self._image_column])
                self._labels_raw.append(row[self._label_column])

        if len(self._image_paths) == 0:
            raise ValueError(f"CSV file has no data rows: {self.csv_path}")

        # Build class mapping from unique label values
        unique_labels = sorted(set(self._labels_raw))
        self.classes: list[str] = unique_labels
        self.class_to_idx: dict[str, int] = {
            cls: idx for idx, cls in enumerate(self.classes)
        }

        # Convert string labels to indices
        self.samples: list[tuple[str, int]] = [
            (img, self.class_to_idx[lbl])
            for img, lbl in zip(self._image_paths, self._labels_raw)
        ]

    @property
    def num_classes(self) -> int:
        return len(self.classes)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> tuple[Any, int]:
        rel_path, target = self.samples[index]
        img_path = self.image_dir / rel_path

        if self.use_albumentations:
            import cv2

            image = cv2.imread(str(img_path), cv2.IMREAD_COLOR)
            if image is None:
                raise RuntimeError(f"Failed to load image: {img_path}")
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

            if self.transform is not None:
                transformed = self.transform(image=image)
                image = transformed["image"]
        else:
            from PIL import Image

            image = Image.open(img_path).convert("RGB")
            if self.transform is not None:
                image = self.transform(image)

        return image, target

num_classes property

num_classes: int

classes instance-attribute

classes: list[str]

class_to_idx instance-attribute

class_to_idx: dict[str, int]

__init__

__init__(csv_path: str | Path, image_dir: str | Path = '.', image_column: str | None = None, label_column: str | None = None, transform: Callable | None = None, use_albumentations: bool = False)

__len__

__len__() -> int

__getitem__

__getitem__(index: int) -> tuple[Any, int]

CSV Format

image_path,label
img001.jpg,cat
img002.jpg,dog
img003.jpg,cat
  • Column 1 (default): relative image path
  • Column 2 (default): class label (string)

Custom column names can be specified via image_column and label_column parameters.
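
The class mapping follows directly from the label values: unique labels are collected, sorted alphabetically, and assigned contiguous indices. The same convention can be sketched with the standard library alone (the CSV content below is an in-memory stand-in for a real file):

```python
import csv
import io

# In-memory stand-in for a small train.csv
csv_text = "image_path,label\nimg001.jpg,cat\nimg002.jpg,dog\nimg003.jpg,cat\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Sorted unique labels -> contiguous indices, mirroring CSVImageDataset
classes = sorted({row["label"] for row in rows})
class_to_idx = {cls: idx for idx, cls in enumerate(classes)}
samples = [(row["image_path"], class_to_idx[row["label"]]) for row in rows]

print(classes)   # ['cat', 'dog']
print(samples)   # [('img001.jpg', 0), ('img002.jpg', 1), ('img003.jpg', 0)]
```

Because indices depend on the sorted label set, adding a new class can shift existing indices; keep train and validation label sets consistent.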

Usage Examples

Basic Usage

from autotimm.data import CSVImageDataset
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

dataset = CSVImageDataset(
    csv_path="train.csv",
    image_dir="./images",
    transform=transform,
)

print(f"Classes: {dataset.classes}")
print(f"Num classes: {dataset.num_classes}")

With Albumentations

import albumentations as A
from albumentations.pytorch import ToTensorV2
from autotimm.data import CSVImageDataset

transform = A.Compose([
    A.Resize(224, 224),
    A.HorizontalFlip(p=0.5),
    A.Normalize(),
    ToTensorV2(),
])

dataset = CSVImageDataset(
    csv_path="train.csv",
    image_dir="./images",
    transform=transform,
    use_albumentations=True,  # Load images with OpenCV
)

Custom Column Names

# CSV with custom headers:
# filepath,category,metadata
# data/img1.jpg,cat,outdoor
# data/img2.jpg,dog,indoor

dataset = CSVImageDataset(
    csv_path="custom.csv",
    image_dir="./",
    image_column="filepath",
    label_column="category",
)

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `csv_path` | `str \| Path` | Required | Path to CSV file |
| `image_dir` | `str \| Path` | `"."` | Root directory for resolving image paths |
| `image_column` | `str \| None` | `None` | Name of image path column (first column if `None`) |
| `label_column` | `str \| None` | `None` | Name of label column (second column if `None`) |
| `transform` | `Callable \| None` | `None` | Image transforms |
| `use_albumentations` | `bool` | `False` | Load images with OpenCV for albumentations |

Attributes

| Attribute | Type | Description |
|---|---|---|
| `classes` | `list[str]` | Sorted list of unique class names |
| `class_to_idx` | `dict[str, int]` | Mapping from class name to index |
| `num_classes` | `int` | Number of classes |
| `samples` | `list[tuple[str, int]]` | List of `(image_path, class_idx)` tuples |

MultiLabelImageDataset

Dataset for multi-label classification from CSV files with binary label columns.

API Reference

autotimm.MultiLabelImageDataset

Bases: Dataset

Dataset for multi-label classification from CSV.

CSV format::

image_path,label_0,label_1,...,label_N
img1.jpg,1,0,1,...,0
img2.jpg,0,1,0,...,1

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `csv_path` | `str \| Path` | Path to CSV file. | required |
| `image_dir` | `str \| Path` | Root directory for resolving image paths. | `'.'` |
| `label_columns` | `list[str] \| None` | List of column names to use as labels. If `None`, uses all columns except the image column. | `None` |
| `image_column` | `str \| None` | Name of the column containing image paths. If `None`, uses the first column. | `None` |
| `transform` | `Callable \| None` | Transform to apply to images. Supports both torchvision transforms (PIL input) and albumentations transforms (numpy input with `image` key). | `None` |
| `use_albumentations` | `bool` | If `True`, load images with OpenCV and pass as numpy arrays to an albumentations transform. Default is `False` (PIL + torchvision). | `False` |
Source code in src/autotimm/data/dataset.py
class MultiLabelImageDataset(Dataset):
    """Dataset for multi-label classification from CSV.

    CSV format::

        image_path,label_0,label_1,...,label_N
        img1.jpg,1,0,1,...,0
        img2.jpg,0,1,0,...,1

    Parameters:
        csv_path: Path to CSV file.
        image_dir: Root directory for resolving image paths.
        label_columns: List of column names to use as labels.
            If ``None``, uses all columns except the image column.
        image_column: Name of the column containing image paths.
            If ``None``, uses the first column.
        transform: Transform to apply to images. Supports both
            torchvision transforms (PIL input) and albumentations
            transforms (numpy input with ``image`` key).
        use_albumentations: If ``True``, load images with OpenCV and
            pass as numpy arrays to an albumentations transform.
            Default is ``False`` (PIL + torchvision).
    """

    def __init__(
        self,
        csv_path: str | Path,
        image_dir: str | Path = ".",
        label_columns: list[str] | None = None,
        image_column: str | None = None,
        transform: Callable | None = None,
        use_albumentations: bool = False,
    ):
        self.csv_path = Path(csv_path)
        self.image_dir = Path(image_dir)
        self.transform = transform
        self.use_albumentations = use_albumentations

        # Parse CSV
        with open(self.csv_path, newline="") as f:
            reader = csv.DictReader(f)
            fieldnames = list(reader.fieldnames or [])
            if not fieldnames:
                raise ValueError(f"CSV file has no columns: {self.csv_path}")

            # Determine image column
            self._image_column = image_column or fieldnames[0]
            if self._image_column not in fieldnames:
                raise ValueError(
                    f"Image column '{self._image_column}' not found in CSV. "
                    f"Available columns: {fieldnames}"
                )

            # Determine label columns
            if label_columns is not None:
                for col in label_columns:
                    if col not in fieldnames:
                        raise ValueError(
                            f"Label column '{col}' not found in CSV. "
                            f"Available columns: {fieldnames}"
                        )
                self._label_columns = label_columns
            else:
                self._label_columns = [c for c in fieldnames if c != self._image_column]

            if not self._label_columns:
                raise ValueError(
                    "No label columns found. Provide label_columns explicitly "
                    "or ensure the CSV has columns beyond the image column."
                )

            # Read rows
            self._image_paths: list[str] = []
            self._labels: list[list[float]] = []
            for row in reader:
                self._image_paths.append(row[self._image_column])
                self._labels.append([float(row[col]) for col in self._label_columns])

        if len(self._image_paths) == 0:
            raise ValueError(f"CSV file has no data rows: {self.csv_path}")

    @property
    def num_labels(self) -> int:
        """Number of label columns."""
        return len(self._label_columns)

    @property
    def label_names(self) -> list[str]:
        """Names of the label columns."""
        return list(self._label_columns)

    def __len__(self) -> int:
        return len(self._image_paths)

    def __getitem__(self, index: int) -> tuple[Any, torch.Tensor]:
        img_path = self.image_dir / self._image_paths[index]
        label_tensor = torch.tensor(self._labels[index], dtype=torch.float32)

        if self.use_albumentations:
            import cv2

            image = cv2.imread(str(img_path), cv2.IMREAD_COLOR)
            if image is None:
                raise RuntimeError(f"Failed to load image: {img_path}")
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

            if self.transform is not None:
                transformed = self.transform(image=image)
                image = transformed["image"]
        else:
            from PIL import Image

            image = Image.open(img_path).convert("RGB")
            if self.transform is not None:
                image = self.transform(image)

        return image, label_tensor

num_labels property

num_labels: int

Number of label columns.

label_names property

label_names: list[str]

Names of the label columns.

__init__

__init__(csv_path: str | Path, image_dir: str | Path = '.', label_columns: list[str] | None = None, image_column: str | None = None, transform: Callable | None = None, use_albumentations: bool = False)

__len__

__len__() -> int

__getitem__

__getitem__(index: int) -> tuple[Any, torch.Tensor]

CSV Format

image_path,cat,dog,outdoor,indoor
img1.jpg,1,0,1,0
img2.jpg,0,1,0,1
img3.jpg,1,1,1,0
  • Column 1 (default): relative image path
  • Remaining columns: binary label indicators (0 or 1)
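
Each row's label columns are parsed as floats into a multi-hot vector; when label_columns is omitted, every column except the image column is treated as a label. A standard-library sketch of that parsing convention (in-memory CSV as a stand-in for a real file):

```python
import csv
import io

# In-memory stand-in for the CSV format shown above
csv_text = (
    "image_path,cat,dog,outdoor,indoor\n"
    "img1.jpg,1,0,1,0\n"
    "img2.jpg,0,1,0,1\n"
)
reader = csv.DictReader(io.StringIO(csv_text))
fieldnames = list(reader.fieldnames)

image_column = fieldnames[0]                          # first column by default
label_columns = [c for c in fieldnames if c != image_column]

# One multi-hot float vector per row, mirroring MultiLabelImageDataset
labels = [[float(row[c]) for c in label_columns] for row in reader]

print(label_columns)  # ['cat', 'dog', 'outdoor', 'indoor']
print(labels[0])      # [1.0, 0.0, 1.0, 0.0]
```

The dataset converts each such vector to a `torch.float32` tensor, the format expected by multi-label losses such as `BCEWithLogitsLoss`.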

Usage Examples

Basic Usage

from autotimm.data import MultiLabelImageDataset
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

dataset = MultiLabelImageDataset(
    csv_path="train.csv",
    image_dir="./images",
    transform=transform,
)

print(f"Labels: {dataset.label_names}")
print(f"Num labels: {dataset.num_labels}")

With Explicit Label Columns

dataset = MultiLabelImageDataset(
    csv_path="train.csv",
    image_dir="./images",
    label_columns=["cat", "dog", "bird"],  # Only use these labels
    image_column="filepath",
)

With Albumentations

import albumentations as A
from albumentations.pytorch import ToTensorV2

transform = A.Compose([
    A.Resize(224, 224),
    A.HorizontalFlip(p=0.5),
    A.Normalize(),
    ToTensorV2(),
])

dataset = MultiLabelImageDataset(
    csv_path="train.csv",
    image_dir="./images",
    transform=transform,
    use_albumentations=True,
)

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `csv_path` | `str \| Path` | Required | Path to CSV file |
| `image_dir` | `str \| Path` | `"."` | Root directory for resolving image paths |
| `label_columns` | `list[str] \| None` | `None` | List of label column names (all non-image columns if `None`) |
| `image_column` | `str \| None` | `None` | Name of image path column (first column if `None`) |
| `transform` | `Callable \| None` | `None` | Image transforms |
| `use_albumentations` | `bool` | `False` | Load images with OpenCV for albumentations |

Attributes

| Attribute | Type | Description |
|---|---|---|
| `label_names` | `list[str]` | List of label column names |
| `num_labels` | `int` | Number of labels |

CSVDetectionDataset

Dataset for object detection from CSV files with bounding box annotations.

API Reference

autotimm.CSVDetectionDataset

Bases: Dataset

Dataset for object detection from CSV.

CSV format (one row per bounding box)::

image_path,x_min,y_min,x_max,y_max,label
img001.jpg,10,20,100,200,cat
img001.jpg,50,60,150,250,dog
img002.jpg,30,40,120,220,cat

Multiple rows per image are grouped automatically.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `csv_path` | `str \| Path` | Path to CSV file. | required |
| `image_dir` | `str \| Path` | Directory containing image files. | `'.'` |
| `image_column` | `str` | Column name for image paths. | `'image_path'` |
| `bbox_columns` | `list[str] \| None` | Column names for bounding box coordinates; `None` means `["x_min", "y_min", "x_max", "y_max"]`. | `None` |
| `label_column` | `str` | Column name for class labels. | `'label'` |
| `transform` | `Callable \| None` | Albumentations transform with `bbox_params`. | `None` |
| `min_bbox_area` | `float` | Minimum bbox area to include. | `0.0` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `class_names` | `list[str]` | List of class names. |
| `num_classes` | `int` | Number of classes. |

Source code in src/autotimm/data/detection_dataset.py
class CSVDetectionDataset(Dataset):
    """Dataset for object detection from CSV.

    CSV format (one row per bounding box)::

        image_path,x_min,y_min,x_max,y_max,label
        img001.jpg,10,20,100,200,cat
        img001.jpg,50,60,150,250,dog
        img002.jpg,30,40,120,220,cat

    Multiple rows per image are grouped automatically.

    Parameters:
        csv_path: Path to CSV file.
        image_dir: Directory containing image files.
        image_column: Column name for image paths. Default ``"image_path"``.
        bbox_columns: Column names for bounding box coordinates.
            Default ``["x_min", "y_min", "x_max", "y_max"]``.
        label_column: Column name for class labels. Default ``"label"``.
        transform: Albumentations transform with bbox_params.
        min_bbox_area: Minimum bbox area to include. Default 0.

    Attributes:
        class_names: List of class names.
        num_classes: Number of classes.
    """

    def __init__(
        self,
        csv_path: str | Path,
        image_dir: str | Path = ".",
        image_column: str = "image_path",
        bbox_columns: list[str] | None = None,
        label_column: str = "label",
        transform: Callable | None = None,
        min_bbox_area: float = 0.0,
    ):
        self.csv_path = Path(csv_path)
        self.image_dir = Path(image_dir)
        self.transform = transform
        self.min_bbox_area = min_bbox_area

        if bbox_columns is None:
            bbox_columns = ["x_min", "y_min", "x_max", "y_max"]

        # Parse CSV and group by image
        image_anns: dict[str, list[dict]] = defaultdict(list)
        with open(self.csv_path, newline="") as f:
            reader = csv.DictReader(f)
            fieldnames = list(reader.fieldnames or [])

            # Validate columns exist
            for col in [image_column, label_column] + bbox_columns:
                if col not in fieldnames:
                    raise ValueError(
                        f"Column '{col}' not found in CSV. "
                        f"Available columns: {fieldnames}"
                    )

            for row in reader:
                img_path = row[image_column]
                x1 = float(row[bbox_columns[0]])
                y1 = float(row[bbox_columns[1]])
                x2 = float(row[bbox_columns[2]])
                y2 = float(row[bbox_columns[3]])

                # Filter by area
                area = (x2 - x1) * (y2 - y1)
                if area < min_bbox_area:
                    continue

                image_anns[img_path].append(
                    {"bbox": [x1, y1, x2, y2], "label": row[label_column]}
                )

        if not image_anns:
            raise ValueError(
                f"No images with valid annotations found in {self.csv_path}."
            )

        # Build class mapping
        all_labels = sorted(
            {ann["label"] for anns in image_anns.values() for ann in anns}
        )
        self.class_names: list[str] = all_labels
        self._class_to_idx: dict[str, int] = {
            name: idx for idx, name in enumerate(all_labels)
        }
        self.num_classes: int = len(all_labels)

        # Store as ordered list for indexing
        self._image_paths: list[str] = sorted(image_anns.keys())
        self._image_anns = image_anns

    def __len__(self) -> int:
        return len(self._image_paths)

    def __getitem__(self, index: int) -> dict[str, Any]:
        import cv2

        img_rel = self._image_paths[index]
        img_path = self.image_dir / img_rel
        anns = self._image_anns[img_rel]

        image = cv2.imread(str(img_path))
        if image is None:
            raise RuntimeError(f"Failed to load image: {img_path}")
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        orig_h, orig_w = image.shape[:2]

        # Boxes are already in xyxy format
        bboxes = [ann["bbox"] for ann in anns]
        labels = [self._class_to_idx[ann["label"]] for ann in anns]

        if self.transform is not None:
            transformed = self.transform(image=image, bboxes=bboxes, labels=labels)
            image = transformed["image"]
            bboxes = transformed["bboxes"]
            labels = transformed["labels"]

        if len(bboxes) > 0:
            boxes = torch.tensor(bboxes, dtype=torch.float32)
            labels = torch.tensor(labels, dtype=torch.int64)
        else:
            boxes = torch.zeros((0, 4), dtype=torch.float32)
            labels = torch.zeros((0,), dtype=torch.int64)

        return {
            "image": image,
            "boxes": boxes,
            "labels": labels,
            "image_id": index,
            "orig_size": torch.tensor([orig_h, orig_w]),
        }

num_classes instance-attribute

num_classes: int

__init__

__init__(csv_path: str | Path, image_dir: str | Path = '.', image_column: str = 'image_path', bbox_columns: list[str] | None = None, label_column: str = 'label', transform: Callable | None = None, min_bbox_area: float = 0.0)

__len__

__len__() -> int

__getitem__

__getitem__(index: int) -> dict[str, Any]

CSV Format

image_path,x_min,y_min,x_max,y_max,label
img1.jpg,10,20,100,150,car
img1.jpg,50,60,200,180,person
img2.jpg,30,40,120,200,car
  • image_path: relative path to image (multiple rows per image allowed)
  • x_min, y_min, x_max, y_max: bounding box coordinates in xyxy (pascal_voc) format
  • label: class name
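As a sanity check, a CSV in this format can be produced with the standard library alone. The rows below are made-up examples matching the default column names:

```python
import csv
import tempfile
from pathlib import Path

# Made-up annotations: one row per bounding box, several rows per image.
rows = [
    ("img1.jpg", 10, 20, 100, 150, "car"),
    ("img1.jpg", 50, 60, 200, 180, "person"),
    ("img2.jpg", 30, 40, 120, 200, "car"),
]

csv_path = Path(tempfile.mkdtemp()) / "annotations.csv"
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image_path", "x_min", "y_min", "x_max", "y_max", "label"])
    writer.writerows(rows)

header = csv_path.read_text().splitlines()[0]
# header == "image_path,x_min,y_min,x_max,y_max,label"
```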

Usage Examples

Basic Usage

from autotimm.data import CSVDetectionDataset
import albumentations as A
from albumentations.pytorch import ToTensorV2

transform = A.Compose([
    A.Resize(640, 640),
    A.HorizontalFlip(p=0.5),
    A.Normalize(),
    ToTensorV2(),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))

dataset = CSVDetectionDataset(
    csv_path="annotations.csv",
    image_dir="./images",
    transform=transform,
)

print(f"Classes: {dataset.class_names}")
print(f"Num images: {len(dataset)}")

# Sample output
sample = dataset[0]
print(f"Boxes: {sample['boxes'].shape}")  # [N, 4]
print(f"Labels: {sample['labels'].shape}")  # [N]
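Because the number of boxes N differs per image, batches of these sample dicts cannot be stacked naively. A minimal collate sketch for a PyTorch DataLoader, using the key names from the return format above (`detection_collate` is an illustrative helper, not part of AutoTimm):

```python
import torch

def detection_collate(batch):
    """Stack fixed-size tensors; keep variable-length boxes/labels as lists."""
    return {
        "images": torch.stack([s["image"] for s in batch]),
        "boxes": [s["boxes"] for s in batch],        # list of [N_i, 4] tensors
        "labels": [s["labels"] for s in batch],      # list of [N_i] tensors
        "image_ids": [s["image_id"] for s in batch],
        "orig_sizes": torch.stack([s["orig_size"] for s in batch]),
    }
```

Pass it to the loader as `DataLoader(dataset, batch_size=4, collate_fn=detection_collate)`.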

Custom Column Names

# CSV with custom headers:
# filepath,xmin,ymin,xmax,ymax,category
# data/img1.jpg,10,20,100,150,car

dataset = CSVDetectionDataset(
    csv_path="annotations.csv",
    image_dir="./",
    image_column="filepath",
    bbox_columns=["xmin", "ymin", "xmax", "ymax"],
    label_column="category",
)

Parameters

Parameter Type Default Description
csv_path str \| Path Required Path to CSV file with annotations
image_dir str \| Path "." Root directory for resolving image paths
image_column str "image_path" Name of image path column
bbox_columns list[str] ["x_min", "y_min", "x_max", "y_max"] Names of bbox coordinate columns (xyxy format)
label_column str "label" Name of label column
transform Callable \| None None Albumentations transform with bbox support

Return Format

Returns a dictionary with:

Key Type Description
image Tensor [C, H, W] Transformed image
boxes Tensor [N, 4] Bounding boxes in (x1, y1, x2, y2) format
labels Tensor [N] Class indices
image_id int Image index
orig_size Tensor [2] Original image size (H, W)

CSVInstanceDataset

Dataset for instance segmentation from CSV files with mask annotations.

API Reference

autotimm.CSVInstanceDataset

Bases: Dataset

Dataset for instance segmentation from CSV.

CSV format (one row per instance)::

image_path,x_min,y_min,x_max,y_max,label,mask_path
img001.jpg,10,20,100,200,cat,masks/img001_0.png
img001.jpg,50,60,150,250,dog,masks/img001_1.png

Each mask_path is a binary mask PNG for that instance. Does NOT require pycocotools.

Parameters:

Name Type Description Default
csv_path str | Path

Path to CSV file.

required
image_dir str | Path

Directory containing images and masks.

'.'
image_column str

Column name for image paths. Default "image_path".

'image_path'
bbox_columns list[str] | None

Column names for bbox coordinates. Default ["x_min", "y_min", "x_max", "y_max"].

None
label_column str

Column name for class labels. Default "label".

'label'
mask_column str

Column name for mask file paths. Default "mask_path".

'mask_path'
transform Any

Albumentations transforms to apply.

None
Source code in src/autotimm/data/instance_dataset.py
class CSVInstanceDataset(Dataset):
    """Dataset for instance segmentation from CSV.

    CSV format (one row per instance)::

        image_path,x_min,y_min,x_max,y_max,label,mask_path
        img001.jpg,10,20,100,200,cat,masks/img001_0.png
        img001.jpg,50,60,150,250,dog,masks/img001_1.png

    Each ``mask_path`` is a binary mask PNG for that instance.
    Does NOT require pycocotools.

    Args:
        csv_path: Path to CSV file.
        image_dir: Directory containing images and masks.
        image_column: Column name for image paths. Default ``"image_path"``.
        bbox_columns: Column names for bbox coordinates.
            Default ``["x_min", "y_min", "x_max", "y_max"]``.
        label_column: Column name for class labels. Default ``"label"``.
        mask_column: Column name for mask file paths. Default ``"mask_path"``.
        transform: Albumentations transforms to apply.
    """

    def __init__(
        self,
        csv_path: str | Path,
        image_dir: str | Path = ".",
        image_column: str = "image_path",
        bbox_columns: list[str] | None = None,
        label_column: str = "label",
        mask_column: str = "mask_path",
        transform: Any = None,
    ):
        self.csv_path = Path(csv_path)
        self.image_dir = Path(image_dir)
        self.transform = transform

        if bbox_columns is None:
            bbox_columns = ["x_min", "y_min", "x_max", "y_max"]

        # Parse CSV and group by image
        image_anns: dict[str, list[dict]] = defaultdict(list)
        with open(self.csv_path, newline="") as f:
            reader = csv.DictReader(f)
            fieldnames = list(reader.fieldnames or [])

            for col in [image_column, label_column, mask_column] + bbox_columns:
                if col not in fieldnames:
                    raise ValueError(
                        f"Column '{col}' not found in CSV. "
                        f"Available columns: {fieldnames}"
                    )

            for row in reader:
                img_path = row[image_column]
                x1 = float(row[bbox_columns[0]])
                y1 = float(row[bbox_columns[1]])
                x2 = float(row[bbox_columns[2]])
                y2 = float(row[bbox_columns[3]])

                image_anns[img_path].append(
                    {
                        "bbox": [x1, y1, x2, y2],
                        "label": row[label_column],
                        "mask_path": row[mask_column],
                    }
                )

        if not image_anns:
            raise ValueError(f"No images with annotations found in {self.csv_path}.")

        # Build class mapping
        all_labels = sorted(
            {ann["label"] for anns in image_anns.values() for ann in anns}
        )
        self.class_names: list[str] = all_labels
        self._class_to_idx: dict[str, int] = {
            name: idx for idx, name in enumerate(all_labels)
        }
        self.num_classes: int = len(all_labels)

        self._image_paths: list[str] = sorted(image_anns.keys())
        self._image_anns = image_anns

    def __len__(self) -> int:
        return len(self._image_paths)

    def __getitem__(self, idx: int) -> dict[str, Any]:
        img_rel = self._image_paths[idx]
        img_path = self.image_dir / img_rel
        anns = self._image_anns[img_rel]

        image = cv2.imread(str(img_path))
        if image is None:
            raise RuntimeError(f"Failed to load image: {img_path}")
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        height, width = image.shape[:2]

        boxes = []
        labels = []
        masks = []

        for ann in anns:
            boxes.append(ann["bbox"])
            labels.append(self._class_to_idx[ann["label"]])
            mask = cv2.imread(
                str(self.image_dir / ann["mask_path"]), cv2.IMREAD_GRAYSCALE
            )
            if mask is None:
                raise RuntimeError(
                    f"Failed to load mask: {self.image_dir / ann['mask_path']}"
                )
            # Binarize
            mask = (mask > 0).astype(np.uint8)
            masks.append(mask)

        orig_size = torch.tensor([height, width], dtype=torch.long)

        if len(boxes) == 0:
            boxes_arr = np.zeros((0, 4), dtype=np.float32)
            labels_arr = np.zeros((0,), dtype=np.int64)
            masks_arr = np.zeros((0, height, width), dtype=np.uint8)
        else:
            boxes_arr = np.array(boxes, dtype=np.float32)
            labels_arr = np.array(labels, dtype=np.int64)
            masks_arr = np.stack(masks, axis=0)

        if self.transform:
            mask_list = [masks_arr[i] for i in range(len(masks_arr))]
            transformed = self.transform(
                image=image,
                masks=mask_list,
                bboxes=boxes_arr.tolist(),
                labels=labels_arr.tolist(),
            )
            image = transformed["image"]
            bboxes_out = transformed["bboxes"]
            labels_out = transformed["labels"]
            masks_out = transformed["masks"]

            if len(masks_out) > 0:
                masks_arr = np.stack(masks_out, axis=0)
            else:
                masks_arr = np.zeros(
                    (0, image.shape[1], image.shape[2]), dtype=np.uint8
                )

            boxes_t = torch.tensor(bboxes_out, dtype=torch.float32)
            labels_t = torch.tensor(labels_out, dtype=torch.long)
            masks_t = torch.from_numpy(masks_arr).float()
        else:
            image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
            boxes_t = torch.from_numpy(boxes_arr).float()
            labels_t = torch.from_numpy(labels_arr).long()
            masks_t = torch.from_numpy(masks_arr).float()

        return {
            "image": image,
            "boxes": boxes_t,
            "labels": labels_t,
            "masks": masks_t,
            "image_id": idx,
            "orig_size": orig_size,
        }

num_classes instance-attribute

num_classes: int = len(all_labels)

__init__

__init__(csv_path: str | Path, image_dir: str | Path = '.', image_column: str = 'image_path', bbox_columns: list[str] | None = None, label_column: str = 'label', mask_column: str = 'mask_path', transform: Any = None)
Source code in src/autotimm/data/instance_dataset.py
def __init__(
    self,
    csv_path: str | Path,
    image_dir: str | Path = ".",
    image_column: str = "image_path",
    bbox_columns: list[str] | None = None,
    label_column: str = "label",
    mask_column: str = "mask_path",
    transform: Any = None,
):
    self.csv_path = Path(csv_path)
    self.image_dir = Path(image_dir)
    self.transform = transform

    if bbox_columns is None:
        bbox_columns = ["x_min", "y_min", "x_max", "y_max"]

    # Parse CSV and group by image
    image_anns: dict[str, list[dict]] = defaultdict(list)
    with open(self.csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])

        for col in [image_column, label_column, mask_column] + bbox_columns:
            if col not in fieldnames:
                raise ValueError(
                    f"Column '{col}' not found in CSV. "
                    f"Available columns: {fieldnames}"
                )

        for row in reader:
            img_path = row[image_column]
            x1 = float(row[bbox_columns[0]])
            y1 = float(row[bbox_columns[1]])
            x2 = float(row[bbox_columns[2]])
            y2 = float(row[bbox_columns[3]])

            image_anns[img_path].append(
                {
                    "bbox": [x1, y1, x2, y2],
                    "label": row[label_column],
                    "mask_path": row[mask_column],
                }
            )

    if not image_anns:
        raise ValueError(f"No images with annotations found in {self.csv_path}.")

    # Build class mapping
    all_labels = sorted(
        {ann["label"] for anns in image_anns.values() for ann in anns}
    )
    self.class_names: list[str] = all_labels
    self._class_to_idx: dict[str, int] = {
        name: idx for idx, name in enumerate(all_labels)
    }
    self.num_classes: int = len(all_labels)

    self._image_paths: list[str] = sorted(image_anns.keys())
    self._image_anns = image_anns

__len__

__len__() -> int
Source code in src/autotimm/data/instance_dataset.py
def __len__(self) -> int:
    return len(self._image_paths)

__getitem__

__getitem__(idx: int) -> dict[str, Any]
Source code in src/autotimm/data/instance_dataset.py
def __getitem__(self, idx: int) -> dict[str, Any]:
    img_rel = self._image_paths[idx]
    img_path = self.image_dir / img_rel
    anns = self._image_anns[img_rel]

    image = cv2.imread(str(img_path))
    if image is None:
        raise RuntimeError(f"Failed to load image: {img_path}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    height, width = image.shape[:2]

    boxes = []
    labels = []
    masks = []

    for ann in anns:
        boxes.append(ann["bbox"])
        labels.append(self._class_to_idx[ann["label"]])
        mask = cv2.imread(
            str(self.image_dir / ann["mask_path"]), cv2.IMREAD_GRAYSCALE
        )
        if mask is None:
            raise RuntimeError(
                f"Failed to load mask: {self.image_dir / ann['mask_path']}"
            )
        # Binarize
        mask = (mask > 0).astype(np.uint8)
        masks.append(mask)

    orig_size = torch.tensor([height, width], dtype=torch.long)

    if len(boxes) == 0:
        boxes_arr = np.zeros((0, 4), dtype=np.float32)
        labels_arr = np.zeros((0,), dtype=np.int64)
        masks_arr = np.zeros((0, height, width), dtype=np.uint8)
    else:
        boxes_arr = np.array(boxes, dtype=np.float32)
        labels_arr = np.array(labels, dtype=np.int64)
        masks_arr = np.stack(masks, axis=0)

    if self.transform:
        mask_list = [masks_arr[i] for i in range(len(masks_arr))]
        transformed = self.transform(
            image=image,
            masks=mask_list,
            bboxes=boxes_arr.tolist(),
            labels=labels_arr.tolist(),
        )
        image = transformed["image"]
        bboxes_out = transformed["bboxes"]
        labels_out = transformed["labels"]
        masks_out = transformed["masks"]

        if len(masks_out) > 0:
            masks_arr = np.stack(masks_out, axis=0)
        else:
            masks_arr = np.zeros(
                (0, image.shape[1], image.shape[2]), dtype=np.uint8
            )

        boxes_t = torch.tensor(bboxes_out, dtype=torch.float32)
        labels_t = torch.tensor(labels_out, dtype=torch.long)
        masks_t = torch.from_numpy(masks_arr).float()
    else:
        image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
        boxes_t = torch.from_numpy(boxes_arr).float()
        labels_t = torch.from_numpy(labels_arr).long()
        masks_t = torch.from_numpy(masks_arr).float()

    return {
        "image": image,
        "boxes": boxes_t,
        "labels": labels_t,
        "masks": masks_t,
        "image_id": idx,
        "orig_size": orig_size,
    }

CSV Format

image_path,mask_path,x_min,y_min,x_max,y_max,label
img1.jpg,masks/img1_inst1.png,10,20,100,150,car
img1.jpg,masks/img1_inst2.png,50,60,200,180,person
img2.jpg,masks/img2_inst1.png,30,40,120,200,car
  • image_path: relative path to image
  • mask_path: relative path to a binary instance mask (resolved against image_dir)
  • x_min, y_min, x_max, y_max: bounding box coordinates in xyxy format
  • label: class name
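If you only have the instance masks, the tight bounding box for each CSV row can be derived from the mask itself. A small NumPy sketch (`mask_to_xyxy` is illustrative; whether the max coordinate should be inclusive or exclusive depends on your convention — here it is exclusive):

```python
import numpy as np

def mask_to_xyxy(mask: np.ndarray) -> tuple[int, int, int, int]:
    """Tight (x_min, y_min, x_max, y_max) box around the nonzero mask region."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("mask has no foreground pixels")
    # Exclusive-max convention: box covers pixels [x_min, x_max) x [y_min, y_max)
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:5, 3:7] = 1  # instance occupies rows 2-4, columns 3-6
box = mask_to_xyxy(mask)  # -> (3, 2, 7, 5)
```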

Usage Examples

Basic Usage

from autotimm.data import CSVInstanceDataset
import albumentations as A
from albumentations.pytorch import ToTensorV2

transform = A.Compose([
    A.Resize(640, 640),
    A.HorizontalFlip(p=0.5),
    A.Normalize(),
    ToTensorV2(),
], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels']))

dataset = CSVInstanceDataset(
    csv_path="annotations.csv",
    image_dir="./data",  # mask paths are resolved relative to this directory too
    transform=transform,
)

print(f"Classes: {dataset.class_names}")
print(f"Num images: {len(dataset)}")

# Sample output
sample = dataset[0]
print(f"Boxes: {sample['boxes'].shape}")  # [N, 4]
print(f"Labels: {sample['labels'].shape}")  # [N]
print(f"Masks: {sample['masks'].shape}")  # [N, H, W]

Custom Column Names

dataset = CSVInstanceDataset(
    csv_path="annotations.csv",
    image_dir="./data",
    image_column="filepath",
    mask_column="mask_filepath",
    bbox_columns=["xmin", "ymin", "xmax", "ymax"],
    label_column="category",
)

Parameters

Parameter Type Default Description
csv_path str \| Path Required Path to CSV file with annotations
image_dir str \| Path "." Directory containing images and masks
image_column str "image_path" Name of image path column
mask_column str "mask_path" Name of mask path column
bbox_columns list[str] ["x_min", "y_min", "x_max", "y_max"] Names of bbox coordinate columns (xyxy format)
label_column str "label" Name of label column
transform Callable \| None None Albumentations transform with bbox and mask support

Return Format

Returns a dictionary with:

Key Type Description
image Tensor [C, H, W] Transformed image
boxes Tensor [N, 4] Bounding boxes in (x1, y1, x2, y2) format
labels Tensor [N] Class indices
masks Tensor [N, H, W] Binary instance masks
image_id int Image index
orig_size Tensor [2] Original image size (H, W)

Best Practices

1. Image Paths

Use relative paths in CSV files:

# Good ✓
images/train/img001.jpg,cat

# Bad ✗ (absolute paths break portability)
/home/user/data/images/train/img001.jpg,cat
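If an existing CSV already contains absolute paths, they can be rewritten relative to the dataset root with the standard library (`to_relative` is a hypothetical helper, not an AutoTimm API):

```python
from pathlib import Path

def to_relative(path: str, root: str) -> str:
    """Rewrite an absolute path relative to root; leave relative paths alone."""
    p = Path(path)
    return str(p.relative_to(root)) if p.is_absolute() else path

rel = to_relative("/home/user/data/images/train/img001.jpg", "/home/user/data")
# rel == "images/train/img001.jpg"
```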

2. CSV Validation

Validate your CSV before training:

from autotimm.data import CSVImageDataset

try:
    dataset = CSVImageDataset(csv_path="train.csv", image_dir="./data")
    print(f"Loaded {len(dataset)} samples")
    print(f"Classes: {dataset.classes}")
except Exception as e:
    print(f"CSV validation error: {e}")
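Beyond catching malformed CSVs, it is worth verifying that every referenced image file actually exists before starting a long training run. A standard-library sketch (`missing_images` is illustrative, not an AutoTimm API):

```python
import csv
import tempfile
from pathlib import Path

def missing_images(csv_path, image_dir, image_column="image_path"):
    """Return CSV image paths that do not resolve to files under image_dir."""
    root = Path(image_dir)
    with open(csv_path, newline="") as f:
        return [
            row[image_column]
            for row in csv.DictReader(f)
            if not (root / row[image_column]).is_file()
        ]

# Demo with a throwaway CSV: "a.jpg" exists, "b.jpg" does not.
root = Path(tempfile.mkdtemp())
(root / "a.jpg").touch()
csv_file = root / "annotations.csv"
csv_file.write_text("image_path,label\na.jpg,cat\nb.jpg,dog\n")
missing = missing_images(csv_file, root)  # -> ["b.jpg"]
```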

3. Transform Consistency

Use the same transform backend (torchvision vs albumentations) for both dataset and data module:

# Consistent albumentations usage
from autotimm import ImageDataModule, TransformConfig

config = TransformConfig(preset="strong", backend="albumentations")
data = ImageDataModule(
    train_csv="train.csv",
    image_dir="./data",
    transform_config=config,
)

4. Performance

For large CSV files:

  • Use persistent_workers=True in DataModule
  • Increase num_workers based on available CPU cores
  • Use pin_memory=True when training on GPU

data = ImageDataModule(
    train_csv="large_train.csv",
    image_dir="./data",
    num_workers=8,
    persistent_workers=True,
    pin_memory=True,
)

See Also