opr.models.place_recognition package

Module for Place Recognition models.

opr.models.place_recognition.apgem

Implementation of APGeM Image Model.

class opr.models.place_recognition.apgem.APGeMModel(backbone: Literal['resnet18', 'resnet50', 'vgg16'] = 'resnet50')[source]

Bases: ImageModel

APGeM: ‘Learning with Average Precision: Training Image Retrieval with a Listwise Loss’.

Paper: https://arxiv.org/abs/1906.07589

opr.models.place_recognition.base

Base meta-models for Place Recognition.

class opr.models.place_recognition.base.CloudModel(backbone: Module, head: Module)[source]

Bases: Module

Meta-model for lidar-based Place Recognition. Combines feature extraction backbone and head modules.

forward(batch: Dict[str, Tensor]) Dict[str, Tensor][source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
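The note above can be illustrated with a minimal sketch. It does not use torch.nn.Module: ToyModule, its hook list, and the doubling forward are hypothetical stand-ins that only show why calling the instance differs from calling forward directly.

```python
from typing import Any, Callable, Dict, List


class ToyModule:
    """Toy stand-in for a Module (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._forward_hooks: List[Callable[..., None]] = []

    def register_forward_hook(self, hook: Callable[..., None]) -> None:
        self._forward_hooks.append(hook)

    def forward(self, batch: Dict[str, float]) -> Dict[str, float]:
        # The "recipe" for the forward pass lives here.
        return {"final_descriptor": batch["x"] * 2.0}

    def __call__(self, batch: Dict[str, float]) -> Dict[str, Any]:
        # Calling the instance runs forward and then the registered hooks.
        out = self.forward(batch)
        for hook in self._forward_hooks:
            hook(self, batch, out)
        return out


calls: List[float] = []
model = ToyModule()
model.register_forward_hook(lambda mod, inp, out: calls.append(out["final_descriptor"]))

model({"x": 1.5})          # hook runs and records the output
model.forward({"x": 4.0})  # hook is skipped
```

After both calls, only the first one has been recorded by the hook, which is why the instance should be called rather than forward.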

class opr.models.place_recognition.base.ImageModel(backbone: Module, head: Module, fusion: Module | None = None, forward_type: str | None = 'fp32', onnx_model_path: str | None = None, engine_path: str | None = None)[source]

Bases: Module

Meta-model for image-based Place Recognition. Combines feature extraction backbone and head modules.

forward(batch: Dict[str, Tensor]) Dict[str, Tensor][source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class opr.models.place_recognition.base.LateFusionModel(image_module: ImageModel | None = None, semantic_module: SemanticModel | None = None, cloud_module: CloudModel | None = None, soc_module: Module | None = None, fusion_module: Module | None = None)[source]

Bases: Module

Meta-model for multimodal Place Recognition architectures with late fusion.

forward(batch: Dict[str, Tensor]) Dict[str, Tensor][source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class opr.models.place_recognition.base.SemanticModel(backbone: Module, head: Module, fusion: Module | None = None, forward_type: str | None = 'fp32', onnx_model_path: str | None = None, engine_path: str | None = None)[source]

Bases: ImageModel

Meta-model for semantic-based Place Recognition. Combines feature extraction backbone and head modules.

forward(batch: Dict[str, Tensor]) Dict[str, Tensor][source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class opr.models.place_recognition.base.SequenceLateFusionModel(late_fusion_model: LateFusionModel, temporal_fusion_module: Module | None = None)[source]

Bases: Module

Meta-model for sequence-based multimodal Place Recognition with late fusion.

forward(batch: Dict[str, Tensor]) Dict[str, Tensor][source]

Process a sequence of frames efficiently by reshaping to batch processing.

Parameters:

batch – Dictionary containing sequence data with shape [B, S, …] where B is batch size and S is sequence length

Returns:

Dictionary with the final descriptor after temporal fusion
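The reshaping idea described above can be sketched in plain Python. This is an illustration under assumptions, not the library's implementation: process_sequences, toy_model, and mean_fusion are hypothetical names, and nested lists stand in for tensors (with tensors, the flattening step would typically be a reshape from [B, S, ...] to [B*S, ...]).

```python
from typing import Callable, List

Vector = List[float]


def process_sequences(
    frames: List[List[Vector]],
    per_frame_model: Callable[[List[Vector]], List[Vector]],
    temporal_fusion: Callable[[List[Vector]], Vector],
) -> List[Vector]:
    """Flatten [B][S] frames to [B*S], run the model once, regroup, then fuse."""
    b = len(frames)
    s = len(frames[0])
    flat = [frame for seq in frames for frame in seq]  # [B*S] frames
    flat_desc = per_frame_model(flat)                  # one batched call
    grouped = [flat_desc[i * s:(i + 1) * s] for i in range(b)]  # back to [B][S]
    return [temporal_fusion(seq) for seq in grouped]   # one descriptor per sequence


def toy_model(batch: List[Vector]) -> List[Vector]:
    # Hypothetical per-frame model: a 1-D descriptor equal to the feature sum.
    return [[sum(frame)] for frame in batch]


def mean_fusion(seq: List[Vector]) -> Vector:
    # Hypothetical temporal fusion: average the descriptors over the sequence.
    dim = len(seq[0])
    return [sum(d[k] for d in seq) / len(seq) for k in range(dim)]


frames = [
    [[1.0, 2.0], [3.0, 4.0]],  # sequence 0, S = 2 frames
    [[0.0, 0.0], [2.0, 2.0]],  # sequence 1
]
result = process_sequences(frames, toy_model, mean_fusion)
print(result)
```

Running the per-frame model once on B*S frames avoids a Python loop over the sequence dimension; averaging is only one possible choice for the temporal fusion step.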

opr.models.place_recognition.cosplace

Implementation of CosPlace model.

class opr.models.place_recognition.cosplace.CosPlaceModel(backbone: Literal['resnet18', 'resnet50', 'vgg16'] = 'resnet50', out_dim: int = 256)[source]

Bases: ImageModel

CosPlace: Rethinking Visual Geo-localization for Large-Scale Applications.

Paper: https://arxiv.org/abs/2204.02287

opr.models.place_recognition.minkloc

Implementations of MinkLoc models.

class opr.models.place_recognition.minkloc.MinkLoc3D(in_channels: int = 1, out_channels: int = 256, num_top_down: int = 1, conv0_kernel_size: int = 5, block: str = 'BasicBlock', layers: Tuple[int, ...] = (1, 1, 1), planes: Tuple[int, ...] = (32, 64, 64), pooling: str = 'gem')[source]

Bases: CloudModel

MinkLoc3D: Point Cloud Based Large-Scale Place Recognition.

Paper: https://arxiv.org/abs/2011.04530

Code is adopted from the original repository: https://github.com/jac99/MinkLoc3Dv2, MIT License

class opr.models.place_recognition.minkloc.MinkLoc3Dv2(in_channels: int = 1, out_channels: int = 256, num_top_down: int = 2, conv0_kernel_size: int = 5, block: str = 'ECABasicBlock', layers: Tuple[int, ...] = (1, 1, 1, 1), planes: Tuple[int, ...] = (64, 128, 64, 32), pooling: str = 'gem')[source]

Bases: MinkLoc3D

Improving Point Cloud Based Place Recognition with Ranking-based Loss and Large Batch Training.

Paper: https://arxiv.org/abs/2203.00972

Code is adopted from the original repository: https://github.com/jac99/MinkLoc3Dv2, MIT License

class opr.models.place_recognition.minkloc.MinkLocMultimodal(lidar_in_channels: int = 1, lidar_out_channels: int = 256, lidar_num_top_down: int = 2, lidar_conv0_kernel_size: int = 5, lidar_block: str = 'ECABasicBlock', lidar_layers: Tuple[int, ...] = (1, 1, 1, 1), lidar_planes: Tuple[int, ...] = (64, 128, 64, 32), lidar_pooling: str = 'gem', image_in_channels: int = 3, image_out_channels: int = 256, image_num_top_down: int = 0, image_pooling: str = 'gem', image_pretrained: bool = True, fusion_type: str = 'concat')[source]

Bases: LateFusionModel

MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition.

Paper: https://arxiv.org/pdf/2104.05327.pdf

Code is adopted from the original repository: https://github.com/jac99/MinkLocMultimodal, MIT License

opr.models.place_recognition.netvlad

Implementation of NetVLAD model.

class opr.models.place_recognition.netvlad.NetVLADModel(backbone: Literal['resnet18', 'resnet50', 'vgg16'] = 'resnet18', num_clusters: int = 64, normalize_input: bool = True, vladv2: bool = False)[source]

Bases: ImageModel

NetVLAD: CNN architecture for weakly supervised place recognition.

Paper: https://arxiv.org/abs/1511.07247v3

Code is adopted from the repository: https://github.com/Nanne/pytorch-NetVlad

opr.models.place_recognition.overlaptransformer

Implementation of OverlapTransformer model.

class opr.models.place_recognition.overlaptransformer.OverlapTransformer(height: int = 64, width: int = 900, channels: int = 1, norm_layer: Module | None = None, use_transformer: bool = True)[source]

Bases: Module

OverlapTransformer: An Efficient and Yaw-Angle-Invariant Transformer Network for LiDAR-Based Place Recognition.

Paper: https://arxiv.org/abs/2203.03397

Adapted from the original repository: https://github.com/haomo-ai/OverlapTransformer

forward(batch: dict[str, Tensor]) dict[str, Tensor][source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

relu

The original OverlapTransformer authors suggest num_layers=1 for the MHSA (multi-head self-attention) module.

opr.models.place_recognition.patchnetvlad

Implementation of PatchNetVLAD model.

class opr.models.place_recognition.patchnetvlad.PatchNetVLAD(backbone: Literal['resnet18', 'resnet50', 'vgg16'] = 'vgg16', num_clusters: int = 64, normalize_input: bool = True, vladv2: bool = False, use_faiss: bool = True, patch_sizes: tuple[int] = (4,), strides: tuple[int] = (1,))[source]

Bases: ImageModel

Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition.

Paper: https://arxiv.org/abs/2103.01486

Code is adopted from the original repository: https://github.com/QVPR/Patch-NetVLAD

forward(batch: dict[str, Tensor]) dict[str, Tensor][source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_params(clsts: ndarray, traindescs: ndarray) None[source]

Initialize NetVLAD layer parameters.

opr.models.place_recognition.patchnetvlad.get_integral_feature(feat_in: Tensor) Tensor[source]

Input and output have shape [N,D,H,W], where N is the batch size and D is the descriptor dimension. For VLAD, D = K x d, where K is the number of clusters and d is the original descriptor dimension.

opr.models.place_recognition.patchnetvlad.get_square_regions_from_integral(feat_integral: Tensor, patch_size: int, patch_stride: int) Tensor[source]

Input has shape [N,D,H+1,W+1], where the extra 1 on each of the last two axes comes from zero padding. patch_size and patch_stride are single values because only square regions are currently implemented.
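The integral-feature idea behind these two functions can be sketched for a single 2D channel in plain Python. This is a simplified illustration, not the library's code: integral_image and square_region_sums are hypothetical names, nested lists stand in for tensors, and the real functions may scale or normalise their output differently.

```python
from typing import List

Grid = List[List[float]]


def integral_image(feat: Grid) -> Grid:
    """Summed-area table of an H x W grid, zero-padded to (H+1) x (W+1)."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            out[i + 1][j + 1] = feat[i][j] + out[i][j + 1] + out[i + 1][j] - out[i][j]
    return out


def square_region_sums(integral: Grid, patch_size: int, patch_stride: int) -> Grid:
    """Sum over every patch_size x patch_size window using four lookups each."""
    h, w = len(integral) - 1, len(integral[0]) - 1
    sums: Grid = []
    for top in range(0, h - patch_size + 1, patch_stride):
        row = []
        for left in range(0, w - patch_size + 1, patch_stride):
            bottom, right = top + patch_size, left + patch_size
            row.append(
                integral[bottom][right]
                - integral[top][right]
                - integral[bottom][left]
                + integral[top][left]
            )
        sums.append(row)
    return sums


feat = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
integral = integral_image(feat)
patches = square_region_sums(integral, patch_size=2, patch_stride=1)
print(patches)  # [[12.0, 16.0], [24.0, 28.0]]
```

The zero padding is what allows the four-corner lookup to work at the top and left borders without special cases.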

opr.models.place_recognition.pointnetvlad

Implementation of PointNetVLAD model.

class opr.models.place_recognition.pointnetvlad.Flatten[source]

Bases: Module

Flatten layer.

forward(input: Tensor) Tensor[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class opr.models.place_recognition.pointnetvlad.GatingContext(dim: int, add_batch_norm: bool = True)[source]

Bases: Module

Gating context layer.

forward(x: Tensor) Tensor[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class opr.models.place_recognition.pointnetvlad.NetVLADLoupe(feature_size: int, max_samples: int, cluster_size: int, output_dim: int, gating: bool = True, add_batch_norm: bool = True, is_training: bool = True)[source]

Bases: Module

NetVLAD aggregation layer with gating mechanism.

forward(x: Tensor) Tensor[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class opr.models.place_recognition.pointnetvlad.PointNetFeat(num_points: int = 2500, global_feat: bool = True, feature_transform: bool = False, max_pool: bool = True)[source]

Bases: Module

PointNet feature extractor.

forward(x: Tensor) Tensor[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class opr.models.place_recognition.pointnetvlad.PointNetVLAD(num_points: int = 2500, global_feat: bool = True, feature_transform: bool = False, max_pool: bool = False, output_dim: int = 1024)[source]

Bases: Module

PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition.

Paper: https://arxiv.org/abs/1804.03492

Original repository: https://github.com/mikacuy/pointnetvlad

Code is adopted from the repository: https://github.com/cattaneod/PointNetVlad-Pytorch

forward(batch: dict[str, Tensor]) dict[str, Tensor][source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class opr.models.place_recognition.pointnetvlad.STN3d(num_points: int = 2500, k: int = 3, use_bn: bool = True)[source]

Bases: Module

Spatial Transformer Network for 3D data.

forward(x: Tensor) Tensor[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

opr.models.place_recognition.resnet

ResNet image models for Place Recognition.

class opr.models.place_recognition.resnet.ResNet18(in_channels: int = 3, out_channels: int = 256, num_top_down: int = 0, pooling: str = 'gem', pretrained: bool = True)[source]

Bases: ImageModel

ResNet18 image model for Place Recognition.

class opr.models.place_recognition.resnet.SemanticResNet18(in_channels: int = 1, out_channels: int = 256, num_top_down: int = 0, pooling: str = 'gem')[source]

Bases: SemanticModel

ResNet18 semantic mask model for Place Recognition.

opr.models.place_recognition.soc

Semantic-Object-Context modality model.

class opr.models.place_recognition.soc.SOCMLP(num_classes: int, num_objects: int, embeddings_size: int | None = 256)[source]

Bases: SOCModel

Semantic-Object-Context modality model.

forward(x: Tensor) Dict[str, Tensor][source]

Forward pass.

Parameters:

x (Tensor) – input batch

Returns:

output tensor of shape (batch_size, embeddings_size)

Return type:

torch.Tensor

class opr.models.place_recognition.soc.SOCMLPMixer(num_classes: int, num_objects: int, patch_size: int = 1, hidden_dim: int = 64, depth: int = 3, embeddings_size: int = 256)[source]

Bases: SOCModel

Semantic-Object-Context modality model based on MLP-Mixer.

A kind of attention layer built on top of MLPs.

Original paper: https://arxiv.org/abs/2105.01601

Implementation: https://github.com/lucidrains/mlp-mixer-pytorch

forward(x: Tensor) Dict[str, Tensor][source]

Forward pass.

Parameters:

x (Tensor) – input batch

Returns:

output dictionary with “final_descriptor” key containing the output tensor

Return type:

Dict[str, Tensor]

class opr.models.place_recognition.soc.SOCMLPMixerModel(model, forward_type='fp32')[source]

Bases: Module

forward(batch)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class opr.models.place_recognition.soc.SOCModel(num_classes: int, num_objects: int, embeddings_size: int | None = 256)[source]

Bases: Module

Semantic-Object-Context modality base model class.

forward(x: Tensor) Dict[str, Tensor][source]

Forward pass.

Parameters:

x (Tensor) – input batch

Returns:

output dictionary

Return type:

Dict[str, Tensor]

opr.models.place_recognition.svtnet

SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition.

Citation:

Fan, Zhaoxin, et al. “Svt-net: Super light-weight sparse voxel transformer for large scale place recognition.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. No. 1. 2022.

Source: https://github.com/ZhenboSong/SVTNet

Paper: https://arxiv.org/abs/2105.00149

class opr.models.place_recognition.svtnet.SVTNet(in_channels: int = 1, out_channels: int = 256, conv0_kernel_size: int = 5, block: str = 'ECABasicBlock', asvt: bool = True, csvt: bool = True, layers: tuple[int, ...] = (1, 1, 1), planes: tuple[int, ...] = (32, 64, 64), pooling: str = 'gem')[source]

Bases: CloudModel

SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition.

Citation:

Fan, Zhaoxin, et al. “Svt-net: Super light-weight sparse voxel transformer for large scale place recognition.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 36. No. 1. 2022.

Source: https://github.com/ZhenboSong/SVTNet

Paper: https://arxiv.org/abs/2105.00149