CLOVER: Context-Aware Long-Term Object Viewpoint- and Environment- Invariant Representation Learning
Amanda Adkins*,
Dongmyeong Lee*,
Joydeep Biswas
*Equal Contribution.
The University of Texas at Austin
IEEE Robotics and Automation Letters, 2025
CLOVER architecture: shared-weight encoder (adapted DINOv2 + GeM pooling + MLP) producing context-aware representations, trained with a supervised contrastive loss.
CLOVER (Context-aware Long-term Object Viewpoint- and Environment- Invariant Representation Learning) is a representation-learning method that lets mobile service robots re-identify previously seen objects despite changes in viewpoint, lighting, and weather, without requiring foreground segmentation. MapCLOVER complements CLOVER with a scalable summarization technique that organizes descriptors into an object map and matches incoming observations against it. We also release CODa Re-ID, an in-the-wild dataset of 1,037,814 observations of 557 objects across 8 classes captured under diverse conditions on the UT Austin campus.
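The caption above names two building blocks: GeM pooling and a supervised contrastive loss. For intuition only, here is a minimal pure-Python sketch of both on toy vectors. It is not the repo's implementation, which works on batched PyTorch tensors from the adapted DINOv2 encoder. The function names, `p=3`, and `temperature=0.1` are illustrative assumptions.

```python
import math


def gem_pool(values, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling of one feature channel over N tokens.

    p=1 recovers average pooling; larger p moves toward max pooling.
    Values are clamped to eps so the fractional power is well defined.
    """
    clamped = [max(v, eps) for v in values]
    return (sum(v ** p for v in clamped) / len(clamped)) ** (1.0 / p)


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss (Khosla et al., 2020), averaged over anchors.

    For each anchor, every other sample with the same label is a positive;
    all other samples (positives and negatives) form the softmax denominator.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n) if a != i
        }
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # two objects, two observations each
    grouped = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    print(supcon_loss(grouped, labels) < supcon_loss(mixed, labels))  # True
```

Embeddings that cluster by object identity yield a far lower loss than embeddings that cluster across identities, which is the pressure that pushes observations of one object together regardless of viewpoint or conditions.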
- Installation
- Dataset Setup
- Quickstart
- CLOVER Usage
- MapCLOVER
- Pretrained Models
- Running CLOVER on Your Own Data
- Baseline Checkpoints
- Supplemental Material
- Contact
- Citation
- Acknowledgments
- License
CLOVER is tested on Ubuntu 22.04 with CUDA 12.1, Python 3.10, and a single NVIDIA GPU with ≥24 GB memory. Two installation paths are supported.
The Docker image bundles the validated CUDA / cuDNN / PyTorch stack so you do not need to match versions on your host beyond the NVIDIA driver. Host prerequisites:
- Docker (Engine 20.10+).
- An NVIDIA GPU driver compatible with CUDA 12.1+.
- The NVIDIA Container Toolkit, which lets `docker run --gpus all` (used by `run_docker.sh`) expose the GPU to the container. Without it, `./run_docker.sh` fails with `could not select device driver "" with capabilities: [[gpu]]`.
```bash
git clone https://github.com/ut-amrl/clover.git
cd clover
./build_docker.sh   # builds clover:latest
./run_docker.sh     # mounts the repo + data and opens a shell
# inside the container, you can immediately run:
python src/train.py experiment=coda_sequence/clover
```

The editable install (`pip install -e .`) is baked into the image at build time, so you do not need to re-run it after each `./run_docker.sh`. If you change `pyproject.toml` (e.g. to add a dependency), rebuild the image with `./build_docker.sh`; routine code changes are picked up directly through the bind mount.
The run_docker.sh script expects host directories ./data/ (datasets), ./pretrained_models/ (foundation weights), and ./checkpoints/ (released CLOVER weights) to exist next to the repo — create them before launching if they are missing.
If you would rather manage the environment yourself:
```bash
git clone https://github.com/ut-amrl/clover.git
cd clover
pip install -e .
```

You are responsible for ensuring your CUDA driver is compatible with `torch==2.7.1` (CUDA 12.1+). See `pyproject.toml` for the full dependency list.
Pointing the code at data and weights stored elsewhere. The Docker
launcher honors the env vars CLOVER_DATA_ROOT, PRETRAINED_DIR, and
CHECKPOINTS_DIR to relocate the corresponding host directories. Outside
Docker, the equivalent is to override the Hydra paths.data_dir key (and
the relevant per-config paths) on the command line — paths.data_dir is
the parent of CODa/ and ScanNet/, used by every dataset config. For
example:
```bash
# Data lives on a scratch volume rather than ./data
python src/eval.py experiment=coda_sequence/clover \
  paths.data_dir=/scratch/clover-data \
  ckpt_path=/some/where/clover_coda_sequence.pth \
  model.net.sela_args.foundation_model_path=/some/where/dinov2_vitl14_pretrain.pth
```

See `configs/paths/default.yaml` for the full list of overridable path keys.
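If you run outside Docker but already export the launcher's variables, a small wrapper can translate them into the Hydra overrides shown above. This is a hypothetical helper, not part of the repo. It assumes `CLOVER_DATA_ROOT` is the parent of `CODa/` and `ScanNet/` (the same role as `paths.data_dir`) and that `PRETRAINED_DIR` holds `dinov2_vitl14_pretrain.pth`; it only emits the `foundation_model_path` override used in the CLOVER example above.

```python
import os
import shlex


def eval_command(experiment, ckpt_path, env=None):
    """Build a src/eval.py argv, relocating paths via Hydra overrides.

    Hypothetical helper: reuses the Docker launcher's variable names
    (CLOVER_DATA_ROOT, PRETRAINED_DIR) outside Docker.
    """
    env = os.environ if env is None else env
    argv = ["python", "src/eval.py", f"experiment={experiment}", f"ckpt_path={ckpt_path}"]
    if env.get("CLOVER_DATA_ROOT"):
        # paths.data_dir is the parent of CODa/ and ScanNet/
        argv.append(f"paths.data_dir={env['CLOVER_DATA_ROOT']}")
    if env.get("PRETRAINED_DIR"):
        weights = os.path.join(env["PRETRAINED_DIR"], "dinov2_vitl14_pretrain.pth")
        argv.append(f"model.net.sela_args.foundation_model_path={weights}")
    return argv


if __name__ == "__main__":
    cmd = eval_command("coda_sequence/clover", "checkpoints/clover_coda_sequence.pth")
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```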
CLOVER uses a DINOv2 ViT-L/14 backbone. Download Meta's checkpoint and place it at the location referenced in configs/model/clover.yaml:
```bash
mkdir -p pretrained_models
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth \
  -O pretrained_models/dinov2_vitl14_pretrain.pth
```

CODa Re-ID is released under the MIT license. It comprises 1,037,814 observations of 557 instances across 8 outdoor object classes (Tree, Pole, Bollard, Informational_Sign, Traffic_Sign, Trash_Can, Fire_Hydrant, Emergency_Phone), captured under varied viewpoint and lighting/weather conditions.
Step 1 — Images. Download the camera-0 (and optionally camera-1) raw image streams from the CODa dataset. The full per-sequence archives are hosted at TACC. Two ways to fetch them:
- Manual: browse the dataset page at https://amrl.cs.utexas.edu/coda/download.html and follow the per-sequence links.
- Scripted (recommended): the helper below pulls full per-sequence archives from the TACC mirror and extracts only the `2d_raw/<cam>/` subtree, so you don't need to unpack the entire archive. It requires an explicit `--all` or `--sequences <list>` so you don't accidentally start a multi-hundred-GB download:

  ```bash
  ./scripts/download_coda_raw.sh --sequences 0,1,2                   # subset
  ./scripts/download_coda_raw.sh --all                               # all paper-relevant sequences (~500 GB cam0 only)
  ./scripts/download_coda_raw.sh --sequences 3 --cameras cam0,cam1   # add cam1
  ```

See `./scripts/download_coda_raw.sh --help` for full options (output path override, resume, keep-archive, etc.). Each TACC sequence archive is 40–110 GB on the wire; the script reports the size before each download.
Step 2 — Annotations. Download the CODa Re-ID annotation pack (~90 MB compressed; expands to ~1.6 GB on disk) from the Texas Data Repository:
Alternatively, fetch the annotation pack directly from the Dataverse API in one command (skips clicking through the 10 individual files):
```bash
./scripts/download_coda_reid_annotations.sh   # honors CLOVER_DATA_ROOT if set
```

Step 3 — Layout. Arrange downloads as follows. The host directory can live anywhere: by default it's `<repo>/data/`, but you can place it elsewhere (e.g. on a scratch volume) and point the Docker mount at it via `CLOVER_DATA_ROOT=/path/to/data ./run_docker.sh` (see `run_docker.sh` for details). Whatever path you choose appears as `/clover/data` inside the container, so the layout below is what the container always sees:
```text
data/CODa/
├── 2d_raw/
│   └── cam0/<sequence>/2d_raw_cam0_<sequence>_<frame>.png   # e.g. 2d_raw_cam0_3_291.png
├── 3d_bbox/
│   └── global/
│       ├── Tree.json
│       ├── Pole.json
│       └── ...                                               # one JSON per class
└── annotations/
    └── cam0/<sequence>/<frame>.json
```
The 2d_raw/cam0/<seq>/ PNGs use a redundant naming scheme 2d_raw_cam0_<seq>_<frame>.png (matching info["image_file"] in each annotation); the loader at src/datasets/coda.py builds paths in this exact form, so don't strip the prefix or rename. ./scripts/download_coda_raw.sh preserves the naming automatically.
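To spot-check a local copy against this layout, the three file paths can be rebuilt from a sequence and frame id. A minimal sketch, assuming integer ids without zero-padding (as in the `2d_raw_cam0_3_291.png` example); the helper names are hypothetical and this is not the loader's own code.

```python
from pathlib import Path


def raw_image_path(coda_root, sequence, frame, camera="cam0"):
    """Raw CODa image path, keeping the redundant filename prefix."""
    name = f"2d_raw_{camera}_{sequence}_{frame}.png"
    return Path(coda_root) / "2d_raw" / camera / str(sequence) / name


def annotation_path(coda_root, sequence, frame, camera="cam0"):
    """Per-frame CODa Re-ID annotation JSON path."""
    return Path(coda_root) / "annotations" / camera / str(sequence) / f"{frame}.json"


def global_bbox_path(coda_root, class_name):
    """Per-class global 3D bounding-box file, e.g. 3d_bbox/global/Tree.json."""
    return Path(coda_root) / "3d_bbox" / "global" / f"{class_name}.json"
```

For example, `raw_image_path("data/CODa", 3, 291).exists()` tells you whether that frame was extracted.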
Note on cam0 vs cam1. The CODa Re-ID release contains annotations for both robot cameras (cam0 and cam1), but the main paper used cam0 only — the two cameras are mounted close together on the robot and produce highly overlapping views, so using both mostly duplicates observations while doubling load and training time. The dataset loader in this repo mirrors that choice; cam1 is staged but not loaded by default. See the in-code comment in src/datasets/coda.py (around the camera iteration) for the override.
Annotation schema. Each per-frame annotation JSON contains info (sequence, camera, frame id, weather_condition) and an instances list. Each instance entry includes class, id (object instance id, consistent across sequences), 2D bounding box, and segmentation mask. Per-instance viewpoint information is not stored in the JSON; it is derived in the dataset loader (src/datasets/coda.py) from the sequence/frame metadata at load time. Global 3D bounding boxes per instance live in the 3d_bbox/global/<Class>.json files (centroid cX, cY, cZ, dimensions h, l, w, orientation r, p, y).
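A short sketch of reading these files with the standard library, using only keys named in the schema above (`instances`, `class`, `id`, and the `h`, `l`, `w` box dimensions, assumed to appear as literal JSON keys). The top-level structure of each `<Class>.json` file is not described here, so `box_volume` takes a single box entry. The helper names are hypothetical.

```python
import json
from collections import Counter


def load_json(path):
    with open(path) as f:
        return json.load(f)


def instances_by_class(annotation):
    """Count instances per class in one per-frame annotation dict."""
    return Counter(inst["class"] for inst in annotation["instances"])


def instance_ids(annotation, class_name):
    """Sorted instance ids of one class (ids are consistent across sequences)."""
    return sorted(inst["id"] for inst in annotation["instances"]
                  if inst["class"] == class_name)


def box_volume(entry):
    """Volume of one global 3D box from its h, l, w dimensions."""
    return entry["h"] * entry["l"] * entry["w"]
```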
Quick check. After downloading, you can verify the dataset loads by constructing a CODataset directly:
```python
from datasets.coda import CODataset
from transforms import TransformCLOVER

ds = CODataset(
    root_dir="data/CODa",
    mode="test",
    split="sequence",
    transform=TransformCLOVER(img_size=224, margin=10, square=True, augmentation=False),
    test_sequences=[3, 4, 5],
    skip_test=50,
)
print(f"Loaded {len(ds)} samples; first label = {ds[0]['label']}")
```

Dataset splits. CODa Re-ID supports three documented evaluation splits, each defined by its own config under `configs/dataset/` and exposed through experiment groups in `configs/experiment/`:
| Split | Config | Tests |
|---|---|---|
| `sequence` | `coda_sequence.yaml` | Generalization to held-out trajectories (different days/lighting) |
| `class` | `coda_class.yaml` | Generalization to held-out object classes (train: Tree/Pole/Bollard, test: Informational_Sign/Traffic_Sign) |
| `region` | `coda_region.yaml` | Generalization to held-out geographic regions |
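The sequence and class splits in the table above are whole-group hold-outs: every observation from a test trajectory (or test class) goes to the test set. A minimal sketch over hypothetical flat observation records with `sequence` and `class` fields; the real splits are defined by the configs under `configs/dataset/`, and the region split, which depends on global 3D coordinates, is not sketched.

```python
def split_by_sequence(observations, test_sequences):
    """Hold out whole trajectories: observations from test sequences go to test."""
    held_out = set(test_sequences)
    train = [o for o in observations if o["sequence"] not in held_out]
    test = [o for o in observations if o["sequence"] in held_out]
    return train, test


def split_by_class(observations, train_classes, test_classes):
    """Hold out whole object classes, e.g. train on Tree/Pole/Bollard, test on signs."""
    train = [o for o in observations if o["class"] in train_classes]
    test = [o for o in observations if o["class"] in test_classes]
    return train, test
```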
Note on the default class list. `src/datasets/coda.py` defines a 3-class default (Tree/Pole/Bollard); the `class` split explicitly overrides this through `train_classes`/`test_classes` in its config. To run the sequence/region splits over all 8 classes, either edit the `classes_id` defaults in `coda.py` or pass an override through your experiment config.
Annotation utilities. Scripts under coda_reid/ help inspect and regenerate the CODa Re-ID annotations — for example, get_global_3d_bbox.py extracts global-frame 3D bounding boxes from per-frame annotations, and visualize_trajectories.py plots object trajectories.
ScanNet is used as a secondary benchmark and requires preprocessing with SAM2 for instance segmentation.
- Download ScanNet from the official site (license terms apply; you must request access).
- Install SAM2 and download the `sam2_hiera_large.pt` checkpoint.
- Set the env var pointing to the SAM2 checkpoint:

  ```bash
  export SAM2_CHECKPOINT_PATH=/path/to/sam2_hiera_large.pt
  ```

- Run preprocessing:

  ```bash
  bash scannet_utils/preprocess_scannet.sh
  ```
Splits are defined by index files under data/ScanNet/index/ (scannet_train.txt, scannet_val.txt, scannet_test.txt).
Train CLOVER on the CODa sequence split and evaluate the resulting checkpoint:
```bash
# all commands are run from the repository root
python src/train.py experiment=coda_sequence/clover                   # train
python src/eval.py experiment=coda_sequence/clover ckpt_path=<path>   # evaluate
```

Outputs (checkpoints, logs, metrics) are written under `output/<task>/runs/<dataset>/<model>/<timestamp>/` per `configs/paths/default.yaml`. Weights & Biases logging is opt-in: pass `project=my-clover-project` on the command line (and optionally `entity=my-team`) to enable it. With no `project` set, no W&B calls are made.
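To locate the most recent run without copying timestamps by hand, a helper can pick the newest directory under the documented output pattern. A sketch under assumptions: the concrete `<task>`, `<dataset>`, and `<model>` values come from `configs/paths/default.yaml` and are not listed here, the timestamp format is unspecified so the newest run is chosen by modification time, and the checkpoint filename inside the run is not assumed. The function name is hypothetical.

```python
from pathlib import Path


def latest_run(output_root, task, dataset, model):
    """Newest run directory under <output_root>/<task>/runs/<dataset>/<model>/, or None."""
    runs_dir = Path(output_root) / task / "runs" / dataset / model
    runs = [p for p in runs_dir.iterdir() if p.is_dir()] if runs_dir.is_dir() else []
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)
```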
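CLOVER is configured via Hydra. The top-level configs are `configs/train.yaml` and `configs/eval.yaml`; experiment-specific overrides live under `configs/experiment/`, one file per `experiment=` value (e.g. `experiment=coda_sequence/clover` selects `configs/experiment/coda_sequence/clover.yaml`). Any field can be overridden from the command line.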
```bash
python src/train.py experiment=coda_sequence/clover
```

Common overrides:

```bash
# resume training from a checkpoint
python src/train.py experiment=coda_sequence/clover ckpt_path=output/.../checkpoint.pt

# enable W&B logging (off by default; see note above)
python src/train.py experiment=coda_sequence/clover project=my-clover-project entity=my-team

# change batch size or epochs
python src/train.py experiment=coda_sequence/clover data.batch_size=32 train.max_epochs=50
```

```bash
python src/eval.py experiment=coda_sequence/clover ckpt_path=<path-to-checkpoint>
```

What numbers come out, and how they map to the paper.
`eval.py` prints test cases keyed by environment (`same`/`diff`/`all`: same, different, or either lighting condition between query and reference) × viewpoint difficulty (`easy`/`med`/`hard`/`all`). For the headline paper numbers (Table I for the sequence split, Table III for the region split, Table IV for the class split, Table VII for ScanNet), read the `all_all` row of the corresponding eval run. The per-condition columns in the paper map onto the printed keys as follows:
| Printed key | Paper column |
|---|---|
| `same_easy` / `same_med` / `same_hard` | Same Env., {Easy / Med. / Hard} viewpoint |
| `same_all` | Same Env., All viewpoints |
| `diff_easy` / `diff_med` / `diff_hard` | Diff. Env., {Easy / Med. / Hard} viewpoint |
| `diff_all` | Diff. Env., All viewpoints |
| `all_easy` / `all_med` / `all_hard` | All Env., {Easy / Med. / Hard} viewpoint |
| `all_all` | All Env., All viewpoints (the headline number) |

For each printed key, `mAP` matches the paper's mAP column; `top@1`/`top@5`/`top@10` match the corresponding Top-K columns.

What to expect from reproduction. Running the eval command above on the released checkpoint with the Dataverse-released CODa Re-ID annotation pack reproduces the paper's headline `all_all` mAP / top-1 / top-5 to within roughly ±0.02, and the per-condition columns to within ~6 pp. The delta reflects a small difference between the annotation snapshot used for the paper's experiments and the currently released Dataverse pack; the underlying annotation drift is being investigated for a future release. If your reproduced numbers fall well outside this band, that signals a problem with your setup rather than expected drift.

3-class default on `sequence` and `region` splits. The released CLOVER checkpoints for these two splits were trained on the 3 most populous classes (Tree, Pole, Bollard), which is also what the paper's Table I and Table III numbers report. `src/datasets/coda.py:28-37` reflects this as the default class list. The `class` split has its own per-config override for the unseen-class generalization eval. See the Dataset Setup section for details.
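For readers new to these metrics, here is a sketch of the standard retrieval definitions behind the printed columns: Top-K counts a hit when the correct identity appears among the first K ranked gallery items, and AP averages the precision at each rank where a correct item appears. The key names follow the environment × viewpoint grid above; the functions are hypothetical, and `eval.py` may differ in details such as how the gallery for each query is built.

```python
from itertools import product

ENVS = ("same", "diff", "all")
VIEWS = ("easy", "med", "hard", "all")
# Printed key names such as "same_easy" or "all_all" (the headline row).
KEYS = [f"{env}_{view}" for env, view in product(ENVS, VIEWS)]


def top_k_hit(ranked_labels, query_label, k):
    """1.0 if the correct identity appears among the first k retrieved items."""
    return float(query_label in ranked_labels[:k])


def average_precision(ranked_labels, query_label):
    """Mean of precision@r over the ranks r at which a correct item appears."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def summarize(queries, ks=(1, 5, 10)):
    """queries: list of (query_label, ranked_gallery_labels) pairs for one key."""
    n = len(queries)
    row = {"mAP": sum(average_precision(r, q) for q, r in queries) / n}
    for k in ks:
        row[f"top@{k}"] = sum(top_k_hit(r, q, k) for q, r in queries) / n
    return row
```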
| `experiment=` value | Dataset | Model |
|---|---|---|
| `coda_sequence/clover` | CODa Re-ID (sequence split) | CLOVER (ours) |
| `coda_sequence/reobj` | CODa Re-ID (sequence split) | ReObj baseline |
| `coda_sequence/ffa` | CODa Re-ID (sequence split) | FFA baseline (no checkpoint required; see Baseline Checkpoints) |
| `coda_sequence/wdisi` | CODa Re-ID (sequence split) | WDISI baseline |
| `coda_class/<model>` | CODa Re-ID (class split) | one of `clover`/`reobj`/`ffa`/`wdisi` |
| `coda_region/<model>` | CODa Re-ID (region split) | one of `clover`/`reobj`/`ffa`/`wdisi` |
| `scannet/<model>` | ScanNet | one of `clover`/`reobj`/`ffa`/`wdisi` |
MapCLOVER summarizes CLOVER descriptors into an object map: for each object instance, the descriptors of its observations — captured from different viewpoints and under different conditions — are clustered with k-means and the cluster centers form a compact representative set. At query time, an incoming observation is matched against these summaries (max cosine similarity) rather than the full observation set, enabling scalable long-term re-identification.
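The summarize-then-match idea can be sketched in pure Python: cluster each object's descriptors into at most `num_rep_per_obj` centers, then score a query against each object by its best-matching center. This is an illustration, not the code under `src/map/`. It assumes Euclidean k-means with random initialization and a fixed iteration count on unnormalized vectors; the object ids and descriptors are toy data.

```python
import math
import random


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns min(k, len(points)) cluster centers."""
    k = min(k, len(points))
    centers = [list(p) for p in random.Random(seed).sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:  # assign each descriptor to its nearest center
            nearest = min(range(len(centers)),
                          key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep empty clusters in place.
        centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers


def build_map(descriptors_by_object, num_rep_per_obj=5):
    """Summarize each object's observation descriptors into a representative set."""
    return {obj: kmeans(descs, num_rep_per_obj) for obj, descs in descriptors_by_object.items()}


def query(object_map, descriptor, top_n=5):
    """Rank objects by max cosine similarity to any of their representatives."""
    scores = {obj: max(cosine(descriptor, rep) for rep in reps)
              for obj, reps in object_map.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]
```

Keeping a handful of centers per object instead of every observation is what keeps the map compact, while the max over representatives lets a query match whichever viewpoint or condition cluster it resembles most.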
The pipeline code lives under `src/map/`, with three entry points (`src/reproduce_map_results.py`, `src/generate_map.py`, and `src/query_map.py`) and Hydra configs under `configs/mapclover/`. All commands run inside the Docker container (see Installation). MapCLOVER is the default method (content clustering + max/closest similarity); random sampling and single-averaging are baselines.
`src/reproduce_map_results.py` reproduces the paper's MapCLOVER results (Table VIII) by averaging retrieval accuracy over route-based map/query sequence permutations of the region-split test set, using the region-trained checkpoint:
```bash
# MapCLOVER, representative-set sizes 5 and 10
python src/reproduce_map_results.py \
  ckpt_path=checkpoints/clover_coda_region.pth \
  mapclover.sampler.num_rep_per_obj=5
python src/reproduce_map_results.py \
  ckpt_path=checkpoints/clover_coda_region.pth \
  mapclover.sampler.num_rep_per_obj=10

# Baselines and ablation (other rows of the table)
python src/reproduce_map_results.py ckpt_path=<ckpt> mapclover/sampler=random mapclover.sampler.num_rep_per_obj=5   # Random set, max sim
python src/reproduce_map_results.py ckpt_path=<ckpt> mapclover/sampler=single_avg mapclover/scorer=avg             # Average
python src/reproduce_map_results.py ckpt_path=<ckpt> mapclover/scorer=avg mapclover.sampler.num_rep_per_obj=5     # Clustering, avg sim
```

The script writes averaged Top-1/5/10 (and rank) to `retrieval_metrics.csv`. `max_permutations` (default 30) and `use_merged_sequences` (default `true`) control the averaging.
Note. Exact paper Table VIII numbers require the per-frame annotations and the `3d_bbox/global/` files in `data/CODa/` to come from the same dataset version (the region split is computed from the global 3D coordinates). A version mismatch changes which instances fall in the map/query sets and shifts the numbers; the descriptor pipeline itself is bit-for-bit identical to the reference.
`src/generate_map.py` builds and serializes a map from gallery observations:
```bash
python src/generate_map.py \
  ckpt_path=checkpoints/clover_coda_region.pth \
  mapclover.sampler.num_rep_per_obj=5
# -> <output_dir>/sampled_objs.json
```

- Change the checkpoint with `ckpt_path=...` (the region-trained `clover_coda_region.pth` is recommended; `clover_coda_sequence.pth` also works).
- Change the representative-set size with `mapclover.sampler.num_rep_per_obj=10`.
- Use a baseline summarizer with `mapclover/sampler=random` or `mapclover/sampler=single_avg`.
- Point at your own gallery by overriding the dataset sequences in `configs/mapclover/dataset/coda_map_gallery.yaml`.
`src/query_map.py` matches new observations against a saved map and writes, for each query, the top-N matching instance ids and similarity scores to `matches.json`. No ground truth is required.
```bash
# Query from a CODa split
python src/query_map.py \
  ckpt_path=checkpoints/clover_coda_region.pth \
  map_file=<path>/sampled_objs.json query.top_n=5

# Bring your own images + bounding boxes (e.g. from a detector)
python src/query_map.py ckpt_path=<ckpt> map_file=<path>/sampled_objs.json \
  query.images_dir=/path/to/images query.bboxes=/path/to/bboxes.json query.top_n=5
```

For the image+bbox mode, `bboxes` is a JSON list of `{"image": <path relative to images_dir>, "bbox": [x1, y1, x2, y2], "class": <optional int>}`.
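If your detections come from another tool, a few lines can convert them into this JSON format. A sketch assuming `detections` is a hypothetical iterable of `(image_path, (x1, y1, x2, y2), class_id_or_None)` tuples with image paths under `images_dir` and pixel-coordinate corners. The function name is hypothetical, and the degenerate-box check is an extra safeguard, not something `query_map.py` is documented to require.

```python
import json
from pathlib import Path


def write_query_bboxes(detections, images_dir, out_path):
    """Write detections in the query_map.py image+bbox JSON format.

    Image paths are stored relative to images_dir; "class" is written only
    when a class id is available, since the field is optional.
    """
    images_dir = Path(images_dir)
    entries = []
    for image, (x1, y1, x2, y2), cls in detections:
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"degenerate box ({x1}, {y1}, {x2}, {y2}) in {image}")
        entry = {"image": Path(image).relative_to(images_dir).as_posix(),
                 "bbox": [x1, y1, x2, y2]}
        if cls is not None:
            entry["class"] = int(cls)
        entries.append(entry)
    Path(out_path).write_text(json.dumps(entries, indent=2))
    return entries
```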
Released CLOVER checkpoints are hosted on Google Drive. Download the checkpoint you need and place it under checkpoints/ (or pass ckpt_path=<full-path> to eval.py).
Each checkpoint reproduces the corresponding row of the paper's main tables — CODa sequence: Table I; CODa region: Table III; CODa class: Table IV; ScanNet: Table VII. Refer to the paper for the full per-condition breakdowns (illumination, viewpoint difficulty, etc.).
| Split | Download |
|---|---|
| CODa sequence | clover_coda_sequence.pth |
| CODa region | clover_coda_region.pth |
| CODa class | uses CODa-region checkpoint; see note below |
| ScanNet | clover_scannet.pth |
Note on the class split. The class-generalization results come from evaluating the CODa-region-trained checkpoint against held-out classes; there is no separately trained class checkpoint. Run with:

```bash
python src/eval.py experiment=coda_class/clover ckpt_path=checkpoints/clover_coda_region.pth
```
Baselines (ReObj, FFA, WDISI) used for comparison in the paper are documented separately in the Baseline Checkpoints section below.
For end-to-end use beyond pre-cropped images — including bbox-aware inputs, YOLOv11 detection, and building a map from custom data — see docs/custom_data.md.
For end-to-end use on a new dataset (detector → CLOVER embeddings → MapCLOVER retrieval), combine the embedding extractor with the MapCLOVER entry points above: `generate_map.py` and `query_map.py` accept custom image directories with bounding-box inputs. For the most common case, pre-cropped object images → CLOVER embeddings with no MapCLOVER step, `scripts/extract_embeddings.py` is a runnable example:
```bash
python scripts/extract_embeddings.py \
  --checkpoint checkpoints/clover_coda_sequence.pth \
  --input-dir /path/to/cropped_images/ \
  --output embeddings.npz \
  --device cuda   # default: cpu
```

The output is a `.npz` file with `filenames` `(N,)` and `embeddings` `(N, 1024)` arrays. Pass `--use-head` to get the 128-d projection-head output instead of the 1024-d content vector.
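To use the output, load the two arrays and rank crops by cosine similarity, the same similarity MapCLOVER uses for matching. A sketch with NumPy, assuming the arrays are stored under the keys `filenames` and `embeddings` as described above; if the script saved filenames as an object array, `np.load` would additionally need `allow_pickle=True`. The helper names are hypothetical.

```python
import numpy as np


def load_embeddings(path):
    """Return (filenames, embeddings) from an extract_embeddings.py .npz file."""
    with np.load(path) as data:
        return data["filenames"], data["embeddings"]


def nearest_neighbors(embeddings, k=5):
    """Indices of the k most cosine-similar other rows, for every row."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, 1e-12)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # never return a crop as its own neighbor
    k = min(k, len(unit) - 1)
    return np.argsort(-sims, axis=1)[:, :k]
```

For example, `names[nearest_neighbors(emb, k=3)[0]]` lists the three crops most similar to the first one.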
For reproducing the baseline rows in the paper's main tables, the trained weights for ReObj and WDISI are also released. They are not part of CLOVER's contribution — they exist here so reviewers and future researchers can compare directly without re-training.
| Model | Split | Download |
|---|---|---|
| ReObj | CODa sequence | reobj_coda_sequence.pth |
| ReObj | CODa region | reobj_coda_region.pth |
| ReObj | ScanNet | reobj_scannet.pth |
| WDISI | CODa sequence | wdisi_coda_sequence.pth |
| WDISI | CODa region | wdisi_coda_region.pth |
| WDISI | ScanNet | wdisi_scannet.pth |
FFA has no checkpoint. The FFA baseline (from "Are These the Same Apple?") has no trainable parameters: it runs entirely on a frozen DINOv2 backbone with foreground-aware mean pooling. There is nothing to download beyond the DINOv2 weights already required for CLOVER. Run any FFA experiment with (no `ckpt_path` needed):

```bash
python src/eval.py experiment=<split>/ffa
```
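Foreground-aware mean pooling comes down to averaging only the patch features that fall on the object. A minimal pure-Python sketch with one boolean mask entry per patch. It is not the FFA baseline's code: how FFA obtains its mask and which DINOv2 features it pools are not described here, and the fall-back to all patches when the mask is empty is an assumption of this sketch.

```python
def masked_mean_pool(patch_features, foreground_mask):
    """Average the patch features whose mask entry is truthy.

    patch_features: list of per-patch feature vectors (e.g. frozen ViT tokens).
    foreground_mask: same-length list of booleans. If no patch is marked
    foreground, this sketch falls back to averaging all patches.
    """
    selected = [f for f, keep in zip(patch_features, foreground_mask) if keep]
    if not selected:
        selected = list(patch_features)
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]
```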
A supplemental document covers additional results that did not fit in the main paper:
- Additional ablation rows (Triplet Loss + No Encoder MLP)
- Full YOLOv11-detection results on the sequence split
- Qualitative dataset comparison (CODa Re-ID vs. ScanNet)
- Occlusion robustness curves (mAP / Top-1 / Top-5 vs. % occluded area) and representative occluded examples in the dataset
- Margin-sensitivity ablations on the region split and ScanNet
See docs/supplemental.pdf (or its source docs/supplemental.tex) for figures, tables, and discussion.
Questions, bug reports, or feedback: please open an issue on the GitHub Issues page.
If you use CLOVER, MapCLOVER, or the CODa Re-ID dataset in your research, please cite:
```bibtex
@article{adkins2025clover,
  author  = {Adkins, Amanda and Lee, Dongmyeong and Biswas, Joydeep},
  title   = {{CLOVER: Context-Aware Long-Term Object Viewpoint- and Environment- Invariant Representation Learning}},
  journal = {IEEE Robotics and Automation Letters},
  year    = {2025},
  volume  = {10},
  number  = {11},
  pages   = {11928--11935},
  doi     = {10.1109/LRA.2025.3613991}
}
```

This work was developed at UT Austin AMRL. CLOVER's backbone builds on DINOv2 and the SelaVPR encoder design. The CODa Re-ID dataset extends the UT Campus Object Dataset (CODa) with instance-level re-identification annotations.
This work is partially supported by the National Science Foundation (GRFP DGE-2137420, CAREER-2046955), ARL SARA (W911NF-24-2-0025), and Amazon Lab126. Any opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.
This repository is released under the MIT License — see LICENSE. The CODa Re-ID dataset is also released under the MIT License.