opr.datasets package

Module for datasets.

opr.datasets.augmentations

Data augmentation pipelines.

Point cloud augmentations adapted from the repository https://github.com/jac99/MinkLocMultimodal (MIT License).

class opr.datasets.augmentations.DefaultCloudSetTransform(train: bool = False)[source]

Bases: object

Default point cloud set augmentation pipeline.

class opr.datasets.augmentations.DefaultCloudTransform(train: bool = False)[source]

Bases: object

Default point cloud augmentation pipeline.

class opr.datasets.augmentations.DefaultHM3DImageTransform(train: bool = False, resize: tuple[int, int] | None = (288, 160))[source]

Bases: object

Default image augmentation pipeline.

class opr.datasets.augmentations.DefaultImageTransform(train: bool = False, resize: Tuple[int, int] | None = None)[source]

Bases: object

Default image augmentation pipeline.

class opr.datasets.augmentations.DefaultSemanticTransform(train: bool = False, resize: Tuple[int, int] | None = None)[source]

Bases: object

Default semantic mask augmentation pipeline.

class opr.datasets.augmentations.JitterPoints(sigma=0.01, clip=None, p=1.0)[source]

Bases: object

class opr.datasets.augmentations.OheHotTransform[source]

Bases: object

Rotate by one of the given angles.

class opr.datasets.augmentations.OneHotSemanticTransform(train: bool = False, resize: Tuple[int, int] | None = None)[source]

Bases: object

One-Hot semantic mask augmentation pipeline.

class opr.datasets.augmentations.RandomFlip(p)[source]

Bases: object

class opr.datasets.augmentations.RandomRotation(axis=None, max_theta=180, max_theta2=15)[source]

Bases: object

class opr.datasets.augmentations.RandomScale(min, max)[source]

Bases: object

class opr.datasets.augmentations.RandomShear(delta=0.1)[source]

Bases: object

class opr.datasets.augmentations.RandomTranslation(max_delta=0.05)[source]

Bases: object

class opr.datasets.augmentations.RemoveRandomBlock(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3))[source]

Bases: object

Randomly remove part of the point cloud. Similar to RandomErasing from torchvision, but operates on 3D point clouds and erases a fronto-parallel cuboid. Instead of deleting points, the coordinates of removed points are set to (0, 0, 0) so that the number of points stays the same.
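The erasing idea described above can be sketched as follows. This is an illustrative approximation, not the library's implementation: the helper name remove_random_block, the choice of the x-y plane for the block footprint, and the log-uniform aspect-ratio sampling (borrowed from how RandomErasing samples its rectangle) are all assumptions.

```python
import numpy as np


def remove_random_block(coords, rng, scale=(0.02, 0.33), ratio=(0.3, 3.3)):
    """Zero out points inside a random x-y rectangle (hypothetical helper)."""
    coords = coords.copy()
    mins = coords[:, :2].min(axis=0)
    span = coords[:, :2].max(axis=0) - mins
    # Sample the erased area as a fraction of the x-y bounding box.
    erase_area = rng.uniform(*scale) * span[0] * span[1]
    # Sample the aspect ratio log-uniformly, then derive width and height.
    aspect = np.exp(rng.uniform(np.log(ratio[0]), np.log(ratio[1])))
    w = min(np.sqrt(erase_area * aspect), span[0])
    h = min(np.sqrt(erase_area / aspect), span[1])
    x0 = mins[0] + rng.uniform(0, span[0] - w)
    y0 = mins[1] + rng.uniform(0, span[1] - h)
    inside = (
        (coords[:, 0] >= x0) & (coords[:, 0] <= x0 + w)
        & (coords[:, 1] >= y0) & (coords[:, 1] <= y0 + h)
    )
    # Keep the point count constant: removed points collapse to the origin.
    coords[inside] = 0.0
    return coords, inside


rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(1000, 3))
erased, mask = remove_random_block(cloud, rng)
```

Keeping the array shape fixed means downstream batching code does not need to handle clouds of varying size.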

get_params(coords)[source]

class opr.datasets.augmentations.RemoveRandomPoints(r)[source]

Bases: object

opr.datasets.base

Base dataset implementation.

class opr.datasets.base.BasePlaceRecognitionDataset(dataset_root: str | Path, subset: Literal['train', 'val', 'test'], data_to_load: str | Tuple[str, ...], positive_threshold: float = 10.0, negative_threshold: float = 50.0, image_transform: Any | None = None, semantic_transform: Any | None = None, pointcloud_transform: Any | None = None, pointcloud_set_transform: Any | None = None)[source]

Bases: Dataset

Base class for track-based Place Recognition dataset.

collate_fn(data_list: List[Dict[str, Tensor]]) Dict[str, Tensor][source]

Collate function for torch.utils.data.DataLoader.

data_to_load: Tuple[str, ...]
dataset_df: DataFrame
dataset_root: Path
property negatives_mask: Tensor

Boolean mask of negative samples for each element in the dataset.

property nonnegative_index: List[Tensor]

List of indexes of non-negative samples for each element in the dataset.

property positives_index: List[Tensor]

List of indexes of positive samples for each element in the dataset.

property positives_mask: Tensor

Boolean mask of positive samples for each element in the dataset.

subset: Literal['train', 'val', 'test']
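To illustrate how the masks and index lists above relate to positive_threshold and negative_threshold, here is a minimal sketch. It assumes that samples are compared by Euclidean distance between 2D positions, that an element is not its own positive, and that the comparisons are strict; none of this is confirmed by the reference, and build_masks is a hypothetical helper.

```python
import numpy as np


def build_masks(xy, positive_threshold=10.0, negative_threshold=50.0):
    """Return boolean (N, N) positive and negative masks (hypothetical helper)."""
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    positives = dist < positive_threshold
    np.fill_diagonal(positives, False)  # assumption: exclude the element itself
    negatives = dist > negative_threshold
    return positives, negatives


xy = np.array([[0.0, 0.0], [5.0, 0.0], [30.0, 0.0], [100.0, 0.0]])
positives_mask, negatives_mask = build_masks(xy)

# Index lists derived from the masks, one array per dataset element.
positives_index = [np.flatnonzero(row) for row in positives_mask]
nonnegative_index = [np.flatnonzero(~row) for row in negatives_mask]
```

Samples whose distance falls between the two thresholds are neither positive nor negative, which is why non-negatives are tracked separately from positives.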

opr.datasets.dataloader_factory

Functions to create PyTorch DataLoaders for different datasets.

opr.datasets.dataloader_factory.make_dataloaders(dataset_cfg: DictConfig, batch_sampler_cfg: DictConfig, num_workers: int = 0) Dict[str, DataLoader][source]

Function to create DataLoader objects from given dataset and sampler configs.

Parameters:
  • dataset_cfg (DictConfig) – Dataset configuration.

  • batch_sampler_cfg (DictConfig) – Batch sampler configuration.

  • num_workers (int) – Number of workers for DataLoader. Defaults to 0.

Returns:

Dictionary with DataLoaders.

Return type:

Dict[str, DataLoader]

opr.datasets.dataloader_factory.make_distributed_dataloaders(dataset_cfg: DictConfig, batch_sampler_cfg: DictConfig, num_workers: int = 0) Dict[str, DataLoader][source]

Function to create DataLoader objects for distributed training from given dataset and sampler configs.

Parameters:
  • dataset_cfg (DictConfig) – Dataset configuration.

  • batch_sampler_cfg (DictConfig) – Batch sampler configuration.

  • num_workers (int) – Number of workers for DataLoader. Defaults to 0.

Returns:

Dictionary with DataLoaders.

Return type:

Dict[str, DataLoader]

opr.datasets.hm3d

HM3D dataset implementation.

class opr.datasets.hm3d.HM3DDataset(dataset_root: str | Path, subset: Literal['train', 'val', 'test'], data_to_load: str | tuple[str, ...], positive_threshold: float = 5.0, negative_threshold: float = 10.0, positive_iou_threshold: float = 0.1, pointcloud_quantization_size: float = 0.1, max_point_distance: float = 20.0, image_transform: Any | None = None, pointcloud_transform: Any | None = None, pointcloud_set_transform: Any | None = None)[source]

Bases: BasePlaceRecognitionDataset

HM3D dataset implementation.

collate_fn(data_list: list[dict[str, Tensor]]) dict[str, Tensor][source]

Pack input data list into batch.

Parameters:

data_list (List[Dict[str, Tensor]]) – batch data list generated by DataLoader.

Returns:

dictionary of batched data.

Return type:

Dict[str, Tensor]

opr.datasets.itlp

Custom ITLP-Campus dataset implementations.

class opr.datasets.itlp.ITLPCampus(dataset_root: str | Path, subset: Literal['train', 'val', 'test'] | None = None, csv_file: str = 'track.csv', sensors: str | Tuple[str, ...] = ('front_cam', 'lidar'), mink_quantization_size: float | None = 0.5, max_point_distance: float | None = None, load_semantics: bool = False, exclude_dynamic_classes: bool = False, load_text_descriptions: bool = False, load_text_labels: bool = False, load_aruco_labels: bool = False, indoor: bool = False, positive_threshold: float = 10.0, negative_threshold: float = 50.0, image_transform=<opr.datasets.augmentations.DefaultImageTransform object>, semantic_transform=<opr.datasets.augmentations.DefaultSemanticTransform object>, late_image_transform=None, load_soc: bool = False, top_k_soc: int = 5, soc_coords_type: Literal['cylindrical_3d', 'cylindrical_2d', 'euclidean', 'spherical'] = 'cylindrical_3d', max_distance_soc: float = 50.0, sensors_cfg: OmegaConf | None = None, anno: OmegaConf | None = None, train_split: list | None = None, test_split: list | None = None)[source]

Bases: Dataset

ITLP Campus dataset implementation.

anno: OmegaConf
aruco_labels_subdir: str = 'aruco_labels'
augment_coords_with_normal(coords: ndarray, mean: Tuple[float, float, float] = (0.0, 0.0, 0.0), std: Tuple[float, float, float] = (1.0, 1.0, 1.0)) ndarray[source]

Augment the coordinates with a random normal distribution.

Parameters:
  • coords (np.ndarray) – The coordinates to be augmented.

  • mean (Tuple[float, float, float], optional) – The mean of the normal distribution. Defaults to (0.0, 0.0, 0.0).

  • std (Tuple[float, float, float], optional) – The standard deviation of the normal distribution. Defaults to (1.0, 1.0, 1.0).

Returns:

The augmented coordinates.

Return type:

np.ndarray
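A minimal sketch of this kind of augmentation, assuming it adds independent per-axis Gaussian noise to every coordinate. The helper name add_normal_noise and the explicit rng argument are illustrative and not part of the documented API.

```python
import numpy as np


def add_normal_noise(coords, mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0), rng=None):
    """Add per-axis Gaussian noise to (N, 3) coordinates (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # mean and std broadcast along the last axis, one value per coordinate axis.
    noise = rng.normal(loc=mean, scale=std, size=coords.shape)
    return coords + noise


rng = np.random.default_rng(0)
coords = np.zeros((4, 3))
noisy = add_normal_noise(coords, std=(0.1, 0.1, 0.02), rng=rng)
```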

augment_coords_with_rotation(coords: ndarray, angle_range: Tuple = (-3.141592653589793, 3.141592653589793)) ndarray[source]

Augment the coordinates with a random rotation. All objects are rotated by the same angle, drawn uniformly at random from angle_range.

Parameters:
  • coords (np.ndarray) – The coordinates to be augmented.

  • angle_range (Tuple, optional) – The range of the random rotation angle. Defaults to (-np.pi, np.pi).

Returns:

The augmented coordinates.

Return type:

np.ndarray
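The shared-rotation idea can be sketched as below. The rotation axis is not stated in this reference; the sketch assumes rotation about the vertical (z) axis, and rotate_about_z is a hypothetical helper.

```python
import numpy as np


def rotate_about_z(coords, angle):
    """Rotate (N, 3) coordinates about the z axis by one angle (hypothetical helper)."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return coords @ rot.T


rng = np.random.default_rng(0)
# One angle is drawn for the whole set, so relative geometry is preserved.
angle = rng.uniform(-np.pi, np.pi)
objects = np.array([[1.0, 0.0, 0.5], [0.0, 2.0, -1.0]])
rotated = rotate_about_z(objects, angle)
```

Because every object shares the same angle, pairwise distances between objects are unchanged; only the global heading varies.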

back_cam_aruco_labels_df: DataFrame | None
back_cam_text_descriptions_df: DataFrame | None
back_cam_text_labels_df: DataFrame | None
cam_config: dict
cloud_set_transform: DefaultCloudSetTransform
clouds_subdir: str = 'lidar'
collate_fn(data_list: List[Dict[str, Tensor]]) Dict[str, Tensor][source]

Pack input data list into batch.

Parameters:

data_list (List[Dict[str, Tensor]]) – batch data list generated by DataLoader.

Returns:

dictionary of batched data.

Return type:

Dict[str, Tensor]

dataset_df: DataFrame
dataset_root: Path
static download_data(out_dir: Path | str) None[source]

Download ITLP-Campus dataset tracks.

Parameters:

out_dir (Union[Path, str]) – Output directory for downloaded tracks.

front_cam_aruco_labels_df: DataFrame | None
front_cam_text_descriptions_df: DataFrame | None
front_cam_text_labels_df: DataFrame | None
image_transform: DefaultImageTransform
images_subdir: str = ''
indoor: bool
load_aruco_labels: bool
load_semantics: bool
load_soc: bool
load_text_descriptions: bool
load_text_labels: bool
max_distance_soc: float
property negatives_mask: Tensor

Boolean mask of negative samples for each element in the dataset.

property nonnegative_index: List[Tensor]

List of indexes of non-negative samples for each element in the dataset.

pointcloud_transform: DefaultCloudTransform
property positives_index: List[Tensor]

List of indexes of positive samples for each element in the dataset.

property positives_mask: Tensor

Boolean mask of positive samples for each element in the dataset.

semantic_subdir: str = 'masks'
sensors: Tuple[str, ...]
sensors_cfg: OmegaConf
soc_coords_type: Literal['cylindrical_3d', 'cylindrical_2d', 'euclidean', 'spherical'] = 'cylindrical_3d'
subset: Literal['train', 'val', 'test']
test_split: list = None
text_descriptions_subdir: str = 'text_descriptions'
text_labels_subdir: str = 'text_labels'
top_k_soc: int
train_split: list = None
vis_dir: str = './vis/'

opr.datasets.nclt

NCLT dataset implementation.

class opr.datasets.nclt.NCLTDataset(dataset_root: str | Path, subset: Literal['train', 'val', 'test'], data_to_load: str | Tuple[str, ...], positive_threshold: float = 10.0, negative_threshold: float = 50.0, images_dirname: str = 'images_small', masks_dirname: str = 'segmentation_masks_small', pointclouds_dirname: str = 'velodyne_data', use_minkowski: bool = True, pointcloud_quantization_size: float | Tuple[float, float, float] | None = 0.5, max_point_distance: float | None = None, normalize_point_cloud: bool = False, num_points_sample: int | None = None, spherical_coords: bool = False, use_intensity_values: bool = False, image_transform: Any | None = None, semantic_transform: Any | None = None, pointcloud_transform: Any | None = None, pointcloud_set_transform: Any | None = None, load_soc: bool = False, top_k_soc: int = 10, soc_coords_type: Literal['cylindrical_3d', 'cylindrical_2d', 'euclidean', 'spherical'] = 'euclidean', max_distance_soc: float = 50.0, anno: OmegaConf | None = None, exclude_dynamic: bool = False, dynamic_labels: list | None = None)[source]

Bases: BasePlaceRecognitionDataset

NCLT dataset implementation.

augment_coords_with_normal(coords: ndarray, mean: Tuple[float, float, float] = (0.0, 0.0, 0.0), std: Tuple[float, float, float] = (1.0, 1.0, 1.0)) ndarray[source]

Augment the coordinates with a random normal distribution.

Parameters:
  • coords (np.ndarray) – The coordinates to be augmented.

  • mean (Tuple[float, float, float]) – The mean of the normal distribution. Defaults to (0.0, 0.0, 0.0).

  • std (Tuple[float, float, float]) – The standard deviation of the normal distribution. Defaults to (1.0, 1.0, 1.0).

Returns:

The augmented coordinates.

Return type:

np.ndarray

augment_coords_with_rotation(coords: ndarray, angle_range: Tuple = (-3.141592653589793, 3.141592653589793)) ndarray[source]

Augment the coordinates with a random rotation.

All objects are rotated by the same, random uniformly distributed angle.

Parameters:
  • coords (np.ndarray) – The coordinates to be augmented.

  • angle_range (Tuple) – The range of the random rotation angle. Defaults to (-np.pi, np.pi).

Returns:

The augmented coordinates.

Return type:

np.ndarray

collate_fn(data_list: List[Dict[str, Tensor]]) Dict[str, Tensor][source]

Pack input data list into batch.

Parameters:

data_list (List[Dict[str, Tensor]]) – batch data list generated by DataLoader.

Returns:

dictionary of batched data.

Return type:

Dict[str, Tensor]

opr.datasets.oxford

PointNetVLAD Oxford RobotCar dataset implementation.

class opr.datasets.oxford.OxfordDataset(dataset_root: str | Path, subset: Literal['train', 'val', 'test'], data_to_load: str | Tuple[str, ...], positive_threshold: float = 10.0, negative_threshold: float = 50.0, images_dirname: str = 'images_small', masks_dirname: str = 'segmentation_masks_small', pointclouds_dirname: str | None = None, pointcloud_quantization_size: float | Tuple[float, float, float] | None = 0.01, max_point_distance: float | None = None, spherical_coords: bool = False, image_transform: Any | None = None, semantic_transform: Any | None = None, pointcloud_transform: Any | None = None, pointcloud_set_transform: Any | None = None)[source]

Bases: BasePlaceRecognitionDataset

PointNetVLAD Oxford RobotCar dataset implementation.

collate_fn(data_list: List[Dict[str, Tensor]]) Dict[str, Tensor][source]

Pack input data list into batch.

Parameters:

data_list (List[Dict[str, Tensor]]) – batch data list generated by DataLoader.

Returns:

dictionary of batched data.

Return type:

Dict[str, Tensor]

opr.datasets.projection

Projection of pointcloud to camera image plane.

class opr.datasets.projection.NCLTProjector(front=True)[source]

Bases: object

adjust_points(projected_points, center_crop_size=(960, 768), resize_size=(320, 256), original_size=(1616, 1232))[source]

Adjust 3D LiDAR points projected onto the image plane to correspond to the new image after center cropping and resizing.

Parameters:
  • projected_points (np.ndarray) – array of shape (N, 2) with the 2D points on the original image.

  • center_crop_size (Tuple) – width and height (W_c, H_c) of the crop.

  • resize_size (Tuple) – width and height (W_r, H_r) of the resized image.

  • original_size (Tuple) – width and height (W, H) of the original image.

Returns:

Array of shape (N, 2) with the adjusted 2D points on the resized image.

Return type:

np.ndarray
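The crop-then-resize adjustment can be written as a shift followed by a scale. The sketch below assumes a crop centered on the original image and independent scaling per axis; it is an illustration of the arithmetic, not the method's actual code, and the name adjust_points_sketch is hypothetical.

```python
import numpy as np


def adjust_points_sketch(
    projected_points,
    center_crop_size=(960, 768),
    resize_size=(320, 256),
    original_size=(1616, 1232),
):
    """Map pixel coordinates through a center crop and a resize (hypothetical helper)."""
    w, h = original_size
    w_c, h_c = center_crop_size
    w_r, h_r = resize_size
    # Top-left corner of the centered crop in original-image pixels.
    offset = np.array([(w - w_c) / 2.0, (h - h_c) / 2.0])
    # Per-axis scale from the cropped size to the resized size.
    scale = np.array([w_r / w_c, h_r / h_c])
    return (np.asarray(projected_points, dtype=float) - offset) * scale


image_center = np.array([[808.0, 616.0]])
adjusted = adjust_points_sketch(image_center)  # center maps to (160, 128)
```

With the default sizes, the crop starts at pixel (328, 232) and both axes shrink by a factor of 3, so the original image center lands at the center of the resized image.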

project_vel_to_cam(hits, K, x_lb3_c)[source]

ssc_to_homo(ssc)[source]

class opr.datasets.projection.Projector(cam_cfg: OmegaConf, lidar_cfg: OmegaConf)[source]

Bases: object

Class for projecting pointcloud to camera image plane.

build_matrix(x: float, y: float, z: float, q: float) ndarray[source]

Build rotation matrix from quaternion.

Parameters:
  • x (float) – x coordinate

  • y (float) – y coordinate

  • z (float) – z coordinate

  • q (float) – quaternion

Returns:

rotation matrix

Return type:

np.ndarray

project_scan_to_camera(points: ndarray, return_mask: bool | None = True) Tuple[ndarray, ndarray, ndarray] | Tuple[ndarray, ndarray][source]

Project pointcloud to camera image plane.

Parameters:
  • points (np.ndarray) – pointcloud to project

  • return_mask (bool, optional) – whether to return mask. Defaults to True.

Returns:

(uv, depths, in_image) if return_mask is True, otherwise (uv, depths).

Return type:

Union[Tuple[np.ndarray, np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]

Raises:

ValueError – if the points array has the wrong shape.
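For orientation, a generic pinhole projection with the same return structure might look like the sketch below. It assumes a 3x3 intrinsic matrix K, a 4x4 LiDAR-to-camera transform, and an (N, 3) input; how Projector actually derives these from cam_cfg and lidar_cfg is not shown in this reference, and project_points is a hypothetical helper.

```python
import numpy as np


def project_points(points, K, T_cam_from_lidar, image_size, return_mask=True):
    """Project (N, 3) LiDAR points to pixel coordinates (hypothetical helper)."""
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError("points must have shape (N, 3)")
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]  # points in the camera frame
    depths = cam[:, 2]
    with np.errstate(divide="ignore", invalid="ignore"):
        uv = (K @ cam.T).T[:, :2] / depths[:, None]  # perspective division
    if not return_mask:
        return uv, depths  # includes points behind the camera
    w, h = image_size
    in_image = (
        (depths > 0)
        & (uv[:, 0] >= 0) & (uv[:, 0] < w)
        & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    )
    return uv, depths, in_image


K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, -1.0], [10.0, 0.0, 1.0]])
uv, depths, in_image = project_points(pts, K, np.eye(4), image_size=(100, 100))
```

In this example only the first point is visible: the second lies behind the camera and the third projects outside the 100x100 image.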

opr.datasets.soc_utils

Utility functions for Semantic-Object-Context modality.

opr.datasets.soc_utils.cylindrical_to_euclidean(points: ndarray) ndarray[source]

Convert cylindrical coordinates to euclidean.

Parameters:

points (np.ndarray) – array of cylindrical coordinates with shape (n, 3).

Returns:

array of euclidean coordinates with shape (n, 3).

Return type:

points (np.ndarray)

opr.datasets.soc_utils.euclidean_to_cylindrical(points: ndarray, to_2d: bool = False) ndarray[source]

Convert euclidean coordinates to cylindrical.

Parameters:
  • points (np.ndarray) – array of 3D coordinates with shape (n, 3).

  • to_2d (bool, optional) – whether to return 2D cylindrical coordinates. Defaults to False.

Returns:

array of cylindrical coordinates with shape (n, 3) or (n, 2) if to_2d is True.

Return type:

points (np.ndarray)
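The cylindrical conversions can be sketched as below, assuming the column order (rho, phi, z) with phi measured from the x axis in the x-y plane. The actual column order and angle convention used by the library are not stated here, so treat both helpers as hypothetical.

```python
import numpy as np


def to_cylindrical(points, to_2d=False):
    """Convert (n, 3) Euclidean points to (rho, phi[, z]) (hypothetical helper)."""
    rho = np.hypot(points[:, 0], points[:, 1])
    phi = np.arctan2(points[:, 1], points[:, 0])
    if to_2d:
        return np.stack([rho, phi], axis=1)  # drop the height component
    return np.stack([rho, phi, points[:, 2]], axis=1)


def from_cylindrical(points):
    """Convert (n, 3) (rho, phi, z) points back to Euclidean (hypothetical helper)."""
    rho, phi, z = points.T
    return np.stack([rho * np.cos(phi), rho * np.sin(phi), z], axis=1)


xyz = np.array([[0.0, 2.0, 5.0], [-1.0, -1.0, 0.5]])
cyl = to_cylindrical(xyz)
restored = from_cylindrical(cyl)
```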

opr.datasets.soc_utils.euclidean_to_spherical(points: ndarray) ndarray[source]

Convert euclidean coordinates to spherical.

Parameters:

points (np.ndarray) – array of 3D coordinates with shape (n, 3).

Returns:

array of spherical coordinates with shape (n, 3).

Return type:

points (np.ndarray)

opr.datasets.soc_utils.generate_color_sequence(num_colors: int, palette: str | None = 'husl') list[source]

Generate color sequence.

Parameters:
  • num_colors (int) – number of colors to generate

  • palette (str, optional) – palette to use. Defaults to “husl”.

Returns:

list of colors in RGB format.

Return type:

colors (list)

opr.datasets.soc_utils.get_points_labels_by_mask(points: ndarray, mask: ndarray) ndarray[source]

Get point labels from semantic mask.

Parameters:
  • points (np.ndarray) – array of 2D coordinates of projected points with shape (n, 2). Coordinates should match cam_resolution.

  • mask (np.ndarray) – semantic mask in opencv image format (ndarray)

Returns:

point labels taken from the mask.

Return type:

labels (np.ndarray)
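Looking up labels amounts to indexing the mask with the projected pixel coordinates. The sketch below assumes points are given as (u, v) = (column, row), rounds to the nearest pixel, and clips to the image bounds; these details are assumptions, and labels_from_mask is a hypothetical helper.

```python
import numpy as np


def labels_from_mask(points, mask):
    """Read a class label for each (u, v) point from an (H, W) mask (hypothetical helper)."""
    cols = np.clip(np.round(points[:, 0]).astype(int), 0, mask.shape[1] - 1)
    rows = np.clip(np.round(points[:, 1]).astype(int), 0, mask.shape[0] - 1)
    # Image arrays are indexed as [row, column], i.e. [v, u].
    return mask[rows, cols]


mask = np.zeros((4, 6), dtype=np.uint8)
mask[1, 4] = 7
points = np.array([[4.2, 0.9], [0.0, 0.0]])
labels = labels_from_mask(points, mask)
```

The (u, v) versus (row, column) ordering is the usual source of bugs here, which is why the points must be expressed at the same resolution as the mask.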

opr.datasets.soc_utils.instance_masks_to_objects(instance_masks: dict, points_2d: ndarray, point_labels: ndarray, points_3d: ndarray) dict[source]

Get objects from instance masks.

Parameters:
  • instance_masks (dict) – dict of instances with keys as instance labels and values as instance masks.

  • points_2d (np.ndarray) – 2d points of pointcloud projected to image plane

  • point_labels (np.ndarray) – labels of points

  • points_3d (np.ndarray) – 3d points of pointcloud

Returns:

dict of objects with keys as object labels and values as object properties.

Return type:

objects (dict)

opr.datasets.soc_utils.pack_objects(objects: dict, top_k: int, max_distance: float, special_classes: list) ndarray[source]

Pack objects into a single array.

Parameters:
  • objects (dict) – dict of objects with keys as object labels and values as object properties.

  • top_k (int) – maximum number of each class objects to pack

  • max_distance (float) – maximum distance between objects

  • special_classes (list) – list of special classes to pack

Returns:

array of packed objects with shape (N, K, 3), where N - number of classes, K - number of objects of each class, 3 - 3DoF coords.

Return type:

packed_objects (np.ndarray)
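The (N, K, 3) layout can be illustrated with the sketch below. It makes several assumptions that this reference does not confirm: objects maps a class label directly to a list of 3D object positions, special_classes fixes the row order, max_distance is measured from the sensor origin, the nearest objects are kept first, and unused slots are zero-padded. The name pack_objects_sketch is hypothetical.

```python
import numpy as np


def pack_objects_sketch(objects, top_k, max_distance, special_classes):
    """Pack per-class object positions into an (N, K, 3) array (hypothetical helper)."""
    packed = np.zeros((len(special_classes), top_k, 3))
    for i, label in enumerate(special_classes):
        centers = np.asarray(objects.get(label, []), dtype=float).reshape(-1, 3)
        dist = np.linalg.norm(centers, axis=1)
        keep = np.argsort(dist)[:top_k]           # nearest objects first
        keep = keep[dist[keep] <= max_distance]   # drop objects that are too far
        packed[i, : len(keep)] = centers[keep]    # remaining slots stay zero
    return packed


objects = {
    "car": [[1.0, 0.0, 0.0], [60.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
    "tree": [[0.0, 2.0, 0.0]],
}
packed = pack_objects_sketch(
    objects, top_k=2, max_distance=50.0, special_classes=["car", "tree", "sign"]
)
```

A fixed-size array like this can be stacked across samples in a batch regardless of how many objects each scene contains.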

opr.datasets.soc_utils.semantic_mask_to_instances(mask: ndarray, area_threshold: int | None = 10, labels_whitelist: list | None = None) dict[source]

Get instance labels from semantic mask.

Instances are defined as connected components of the same class. Connected components are found class by class using OpenCV's connectedComponentsWithStats algorithm.

Parameters:
  • mask (ndarray) – semantic mask in opencv image format (ndarray)

  • area_threshold (int, optional) – minimum area of instance to be considered. Defaults to 10.

  • labels_whitelist (list, optional) – list of labels to consider. Defaults to None.

Returns:

dict of instances with keys as instance labels and values as instance masks.

Return type:

instances (dict)

opr.datasets.soc_utils.spherical_to_euclidean(points: ndarray) ndarray[source]

Convert spherical coordinates to euclidean.

Parameters:

points (np.ndarray) – array of spherical coordinates with shape (n, 3).

Returns:

array of euclidean coordinates with shape (n, 3).

Return type:

points (np.ndarray)
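A sketch of the spherical conversions, assuming the column order (r, theta, phi) with theta the polar angle from the z axis and phi the azimuth in the x-y plane. The library may use a different order or an elevation angle instead, so both helpers are hypothetical.

```python
import numpy as np


def to_spherical(points):
    """Convert (n, 3) Euclidean points to (r, theta, phi) (hypothetical helper)."""
    x, y, z = points.T
    r = np.linalg.norm(points, axis=1)
    theta = np.arctan2(np.hypot(x, y), z)  # polar angle, well defined at the origin
    phi = np.arctan2(y, x)                 # azimuth
    return np.stack([r, theta, phi], axis=1)


def from_spherical(points):
    """Convert (n, 3) (r, theta, phi) points back to Euclidean (hypothetical helper)."""
    r, theta, phi = points.T
    return np.stack(
        [
            r * np.sin(theta) * np.cos(phi),
            r * np.sin(theta) * np.sin(phi),
            r * np.cos(theta),
        ],
        axis=1,
    )


xyz = np.array([[0.0, 0.0, 3.0], [1.0, -2.0, 0.5]])
sph = to_spherical(xyz)
restored = from_spherical(sph)
```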