CLOVER: Context-Aware Long-Term Object Viewpoint- and Environment- Invariant Representation Learning

Amanda Adkins*, Dongmyeong Lee*, Joydeep Biswas
*Equal Contribution.
The University of Texas at Austin
IEEE Robotics and Automation Letters, 2025

CLOVER architecture: shared-weight encoder (adapted DINOv2 + GeM pooling + MLP) producing context-aware representations, trained with a supervised contrastive loss.

CLOVER (Context-aware Long-term Object Viewpoint- and Environment- Invariant Representation Learning) is a representation-learning method that lets mobile service robots re-identify previously seen objects despite changes in viewpoint, lighting, and weather, without requiring foreground segmentation. MapCLOVER complements CLOVER with a scalable summarization technique that organizes descriptors into an object map and matches incoming observations against it. We also release CODa Re-ID, an in-the-wild dataset of 1,037,814 observations of 557 objects across 8 classes captured under diverse conditions on the UT Austin campus.

Installation

CLOVER is tested on Ubuntu 22.04 with CUDA 12.1, Python 3.10, and a single NVIDIA GPU with ≥24 GB memory. Two installation paths are supported.

Option A — Docker (recommended)

The Docker image bundles the validated CUDA / cuDNN / PyTorch stack so you do not need to match versions on your host beyond the NVIDIA driver. Host prerequisites:

Docker (Engine 20.10+).
An NVIDIA GPU driver compatible with CUDA 12.1+.
The NVIDIA Container Toolkit, which lets `docker run --gpus all` (used by `run_docker.sh`) expose the GPU to the container. Without it, `./run_docker.sh` fails with `could not select device driver "" with capabilities: [[gpu]]`.

git clone https://github.com/ut-amrl/clover.git
cd clover
./build_docker.sh                                 # builds clover:latest
./run_docker.sh                                   # mounts the repo + data and opens a shell
# inside the container, you can immediately run:
python src/train.py experiment=coda_sequence/clover

The editable install (pip install -e .) is baked into the image at build time, so you do not need to re-run it after each ./run_docker.sh. If you change pyproject.toml (e.g. add a dependency), rebuild the image with ./build_docker.sh; routine code changes are picked up directly through the bind mount.

The run_docker.sh script expects host directories ./data/ (datasets), ./pretrained_models/ (foundation weights), and ./checkpoints/ (released CLOVER weights) to exist next to the repo — create them before launching if they are missing.
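A minimal sketch of that preparation step in Python, assuming the three directories sit under the repository root (the default described above). The helper name `ensure_host_dirs` is hypothetical and not part of this repo.

```python
from pathlib import Path

# Directory names taken from the paragraph above.
HOST_DIRS = ("data", "pretrained_models", "checkpoints")


def ensure_host_dirs(repo_root: str) -> list[Path]:
    """Create the host directories run_docker.sh expects, if missing.

    Returns the directories that were newly created.
    """
    created = []
    for name in HOST_DIRS:
        path = Path(repo_root) / name
        if not path.is_dir():
            path.mkdir(parents=True)
            created.append(path)
    return created
```

Calling `ensure_host_dirs(".")` from the repository root before `./run_docker.sh` is enough for the default layout; existing directories are left untouched.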

Option B — Local pip install

If you would rather manage the environment yourself:

git clone https://github.com/ut-amrl/clover.git
cd clover
pip install -e .

You are responsible for ensuring your CUDA driver is compatible with torch==2.7.1 (CUDA 12.1+). See pyproject.toml for the full dependency list.

Pointing the code at data and weights stored elsewhere. The Docker launcher honors the env vars CLOVER_DATA_ROOT, PRETRAINED_DIR, and CHECKPOINTS_DIR to relocate the corresponding host directories. Outside Docker, the equivalent is to override the Hydra paths.data_dir key (and the relevant per-config paths) on the command line — paths.data_dir is the parent of CODa/ and ScanNet/, used by every dataset config. For example:

# Data lives on a scratch volume rather than ./data
python src/eval.py experiment=coda_sequence/clover \
    paths.data_dir=/scratch/clover-data \
    ckpt_path=/some/where/clover_coda_sequence.pth \
    model.net.sela_args.foundation_model_path=/some/where/dinov2_vitl14_pretrain.pth

See configs/paths/default.yaml for the full list of overridable path keys.

Pretrained DINOv2 backbone

CLOVER uses a DINOv2 ViT-L/14 backbone. Download Meta's checkpoint and place it at the location referenced in configs/model/clover.yaml:

mkdir -p pretrained_models
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth \
  -O pretrained_models/dinov2_vitl14_pretrain.pth

Dataset Setup

CODa Re-ID

CODa Re-ID is released under the MIT license. It comprises 1,037,814 observations of 557 instances across 8 outdoor object classes (Tree, Pole, Bollard, Informational_Sign, Traffic_Sign, Trash_Can, Fire_Hydrant, Emergency_Phone), captured under varied viewpoint and lighting/weather conditions.

Step 1 — Images. Download the camera-0 (and optionally camera-1) raw image streams from the CODa dataset. The full per-sequence archives are hosted at TACC. Two ways to fetch them:

Manual: browse the dataset page at https://amrl.cs.utexas.edu/coda/download.html and follow the per-sequence links.
Scripted (recommended): the helper below pulls full per-sequence archives from the TACC mirror and extracts only the 2d_raw/<cam>/ subtree, so you don't need to unpack the entire archive. It requires an explicit --all or --sequences <list> so you don't accidentally start a multi-hundred-GB download:
```
./scripts/download_coda_raw.sh --sequences 0,1,2         # subset
./scripts/download_coda_raw.sh --all                     # all paper-relevant sequences (~500 GB cam0 only)
./scripts/download_coda_raw.sh --sequences 3 --cameras cam0,cam1   # add cam1
```
See ./scripts/download_coda_raw.sh --help for full options (output path override, resume, keep-archive, etc.). Each TACC sequence archive is 40–110 GB on the wire; the script reports the size before each download.

Step 2 — Annotations. Download the CODa Re-ID annotation pack (~90 MB compressed; expands to ~1.6 GB on disk) from the Texas Data Repository:

https://doi.org/10.18738/T8/E9WFTW

Alternatively, fetch the annotation pack directly from the Dataverse API in one command (skips clicking through the 10 individual files):

./scripts/download_coda_reid_annotations.sh   # honors CLOVER_DATA_ROOT if set

Step 3 — Layout. Arrange downloads as follows. The host directory can live anywhere — by default it's <repo>/data/, but you can place it elsewhere (e.g. on a scratch volume) and point the Docker mount at it via CLOVER_DATA_ROOT=/path/to/data ./run_docker.sh (see run_docker.sh for details). Whatever path you choose appears as /clover/data inside the container, so the layout below is what the container always sees:

data/CODa/
├── 2d_raw/
│   └── cam0/<sequence>/2d_raw_cam0_<sequence>_<frame>.png   # e.g. 2d_raw_cam0_3_291.png
├── 3d_bbox/
│   └── global/
│       ├── Tree.json
│       ├── Pole.json
│       └── ...                                              # one JSON per class
└── annotations/
    └── cam0/<sequence>/<frame>.json

The 2d_raw/cam0/<seq>/ PNGs use a redundant naming scheme 2d_raw_cam0_<seq>_<frame>.png (matching info["image_file"] in each annotation); the loader at src/datasets/coda.py builds paths in this exact form, so don't strip the prefix or rename. ./scripts/download_coda_raw.sh preserves the naming automatically.
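The naming scheme can be sketched as two small path builders. This is an illustration of the layout above, not the loader's actual code; the function names are hypothetical, and it assumes sequence and frame numbers appear unpadded, as in the `2d_raw_cam0_3_291.png` example.

```python
from pathlib import Path


def image_path(root: str, sequence: int, frame: int, camera: str = "cam0") -> Path:
    """Raw image path; the file name repeats the camera and sequence."""
    name = f"2d_raw_{camera}_{sequence}_{frame}.png"
    return Path(root) / "2d_raw" / camera / str(sequence) / name


def annotation_path(root: str, sequence: int, frame: int, camera: str = "cam0") -> Path:
    """Per-frame annotation JSON path for the same frame."""
    return Path(root) / "annotations" / camera / str(sequence) / f"{frame}.json"


img = image_path("data/CODa", sequence=3, frame=291)
ann = annotation_path("data/CODa", sequence=3, frame=291)
print(img.as_posix())  # data/CODa/2d_raw/cam0/3/2d_raw_cam0_3_291.png
print(ann.as_posix())  # data/CODa/annotations/cam0/3/291.json
```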

Note on cam0 vs cam1. The CODa Re-ID release contains annotations for both robot cameras (cam0 and cam1), but the main paper used cam0 only — the two cameras are mounted close together on the robot and produce highly overlapping views, so using both mostly duplicates observations while doubling load and training time. The dataset loader in this repo mirrors that choice; cam1 is staged but not loaded by default. See the in-code comment in src/datasets/coda.py (around the camera iteration) for the override.

Annotation schema. Each per-frame annotation JSON contains info (sequence, camera, frame id, weather_condition) and an instances list. Each instance entry includes class, id (object instance id, consistent across sequences), 2D bounding box, and segmentation mask. Per-instance viewpoint information is not stored in the JSON; it is derived in the dataset loader (src/datasets/coda.py) from the sequence/frame metadata at load time. Global 3D bounding boxes per instance live in the 3d_bbox/global/<Class>.json files (centroid cX, cY, cZ, dimensions h, l, w, orientation r, p, y).
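As a rough illustration of the schema, the sketch below summarizes one per-frame annotation. Only the keys named above (`info`, `weather_condition`, `instances`, `class`, `id`) are used; the sample values are synthetic, and the bounding-box and mask fields are left out because their key names are not given here. The function name `summarize_frame` is hypothetical.

```python
import json
from collections import Counter


def summarize_frame(annotation: dict) -> dict:
    """Count instances per class and collect instance ids for one frame."""
    instances = annotation.get("instances", [])
    return {
        "weather": annotation.get("info", {}).get("weather_condition"),
        "per_class": dict(Counter(inst["class"] for inst in instances)),
        "instance_ids": sorted(inst["id"] for inst in instances),
    }


# Synthetic stand-in for one per-frame JSON; values are illustrative only.
sample = json.loads("""
{
  "info": {"weather_condition": "sunny"},
  "instances": [
    {"class": "Tree", "id": 12},
    {"class": "Tree", "id": 40},
    {"class": "Pole", "id": 7}
  ]
}
""")
summary = summarize_frame(sample)
print(summary)
```

For a real frame, replace `sample` with `json.load(open(path))` on a file under `annotations/cam0/<sequence>/`.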

Quick check. After downloading, you can verify the dataset loads by constructing a CODataset directly:

from datasets.coda import CODataset
from transforms import TransformCLOVER

ds = CODataset(
    root_dir="data/CODa",
    mode="test",
    split="sequence",
    transform=TransformCLOVER(img_size=224, margin=10, square=True, augmentation=False),
    test_sequences=[3, 4, 5],
    skip_test=50,
)
print(f"Loaded {len(ds)} samples; first label = {ds[0]['label']}")

Dataset splits. CODa Re-ID supports three documented evaluation splits, each defined by its own config under configs/dataset/ and exposed through experiment groups in configs/experiment/:

Split	Config	Tests
`sequence`	coda_sequence.yaml	Generalization to held-out trajectories (different days/lighting)
`class`	coda_class.yaml	Generalization to held-out object classes (train: Tree/Pole/Bollard, test: Informational_Sign/Traffic_Sign)
`region`	coda_region.yaml	Generalization to held-out geographic regions

Note on the default class list. src/datasets/coda.py defines a 3-class default (Tree/Pole/Bollard); the class split explicitly overrides this through train_classes/test_classes in its config. To run sequence/region splits over all 8 classes, edit either the classes_id defaults in coda.py or pass an override via your experiment config.

Annotation utilities. Scripts under coda_reid/ help inspect and regenerate the CODa Re-ID annotations — for example, get_global_3d_bbox.py extracts global-frame 3D bounding boxes from per-frame annotations, and visualize_trajectories.py plots object trajectories.

ScanNet

ScanNet is used as a secondary benchmark and requires preprocessing with SAM2 for instance segmentation.

Download ScanNet from the official site (license terms apply; you must request access).
Install SAM2 and download the sam2_hiera_large.pt checkpoint.

Set the env var pointing to the SAM2 checkpoint:

export SAM2_CHECKPOINT_PATH=/path/to/sam2_hiera_large.pt

Run preprocessing:

bash scannet_utils/preprocess_scannet.sh

Splits are defined by index files under data/ScanNet/index/ (scannet_train.txt, scannet_val.txt, scannet_test.txt).

Quickstart

Train CLOVER on the CODa sequence split and evaluate the resulting checkpoint:

# all commands are run from the repository root
python src/train.py experiment=coda_sequence/clover                     # train
python src/eval.py  experiment=coda_sequence/clover ckpt_path=<path>    # evaluate

Outputs (checkpoints, logs, metrics) are written under output/<task>/runs/<dataset>/<model>/<timestamp>/ per configs/paths/default.yaml. Weights & Biases logging is opt-in: pass project=my-clover-project on the command line (and optionally entity=my-team) to enable. With no project set, no W&B calls are made.
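To locate the most recent run after training, something like the following may help. It is a sketch that assumes only the directory pattern stated above and picks the newest run by modification time, since the timestamp format is not specified here; `latest_run_dir` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(output_root: str = "output") -> Optional[Path]:
    """Return the most recently modified run directory, or None if there is none."""
    # Pattern mirrors output/<task>/runs/<dataset>/<model>/<timestamp>/
    runs = [p for p in Path(output_root).glob("*/runs/*/*/*") if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)
```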

CLOVER Usage

CLOVER is configured via Hydra. The top-level configs are configs/train.yaml and configs/eval.yaml; experiment-specific overrides live under configs/experiment/<dataset>/<model>.yaml. Any field can be overridden from the command line.

Training

python src/train.py experiment=coda_sequence/clover

Common overrides:

# resume training from a checkpoint
python src/train.py experiment=coda_sequence/clover ckpt_path=output/.../checkpoint.pt

# enable W&B logging (default off — see note above)
python src/train.py experiment=coda_sequence/clover project=my-clover-project entity=my-team

# change batch size or epochs
python src/train.py experiment=coda_sequence/clover data.batch_size=32 train.max_epochs=50

Evaluation

python src/eval.py experiment=coda_sequence/clover ckpt_path=<path-to-checkpoint>

What numbers come out, and how they map to the paper. eval.py prints test cases keyed by environment (same/diff/all = same / different / either lighting condition between query and reference) × viewpoint difficulty (easy/med/hard/all). For the headline paper numbers — Table I (sequence split), Table III (region split), Table IV (class split), Table VII (ScanNet) — read the all_all row of the corresponding eval run. The per-condition columns in the paper map onto the printed keys as follows:

Printed key	Paper column
`same_easy` / `same_med` / `same_hard`	Same Env., {Easy / Med. / Hard} viewpoint
`same_all`	Same Env., All viewpoints
`diff_easy` / `diff_med` / `diff_hard`	Diff. Env., {Easy / Med. / Hard} viewpoint
`diff_all`	Diff. Env., All viewpoints
`all_easy` / `all_med` / `all_hard`	All Env., {Easy / Med. / Hard} viewpoint
`all_all`	All Env., All viewpoints (the headline number)

For each printed key, mAP matches the paper's mAP column; top@1/top@5/top@10 match the corresponding Top-K columns.

What to expect from reproduction. Running the eval command above on the released checkpoint and the Dataverse-released CODa Re-ID annotation pack reproduces the paper's headline all_all mAP / top-1 / top-5 to within roughly ±0.02, and per-condition columns to within ~6pp. The minor delta reflects a small difference between the annotation snapshot used for the paper's experiments and the currently-released Dataverse pack; the underlying annotation drift is being investigated for a future release. If your reproduced numbers fall well outside this band, that is a signal something is off with your setup rather than expected drift.
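The band check described above could be scripted along these lines. This is a sketch under stated assumptions: metrics are taken to be on a 0-1 scale, so "6pp" becomes 0.06; the two dicts hold one metric (for example mAP) per printed key; and the numbers below are placeholders, not paper results. `out_of_band` is a hypothetical helper, not part of eval.py.

```python
from itertools import product

ENVIRONMENTS = ("same", "diff", "all")
VIEWPOINTS = ("easy", "med", "hard", "all")
# The 12 printed keys: environment x viewpoint difficulty.
KEYS = [f"{env}_{view}" for env, view in product(ENVIRONMENTS, VIEWPOINTS)]


def out_of_band(reproduced: dict, reference: dict,
                headline_tol: float = 0.02, condition_tol: float = 0.06) -> list:
    """Return the keys whose reproduced value deviates beyond the tolerance."""
    flagged = []
    for key in KEYS:
        # The headline all_all row uses the tighter tolerance.
        tol = headline_tol if key == "all_all" else condition_tol
        if abs(reproduced[key] - reference[key]) > tol:
            flagged.append(key)
    return flagged


# Placeholder numbers purely to exercise the check; they are not paper results.
reference = {key: 0.50 for key in KEYS}
reproduced = dict(reference, all_all=0.51, diff_hard=0.60)
print(out_of_band(reproduced, reference))
```

An empty list means every key is inside the expected band; any flagged key is worth checking against your data and checkpoint setup.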

3-class default on sequence and region splits. The released CLOVER checkpoints for these two splits were trained against the 3 most-populous classes — Tree, Pole, Bollard — which is also what the paper's Table I and Table III numbers report. src/datasets/coda.py:28-37 reflects this as the default class list. The class split has its own per-config override for the unseen-class generalization eval. See the Dataset Setup section for details.

Available experiments

`experiment=` value	Dataset	Model
`coda_sequence/clover`	CODa Re-ID (sequence split)	CLOVER (ours)
`coda_sequence/reobj`	CODa Re-ID (sequence split)	ReObj baseline
`coda_sequence/ffa`	CODa Re-ID (sequence split)	FFA baseline (no checkpoint required — see Pretrained Models)
`coda_sequence/wdisi`	CODa Re-ID (sequence split)	WDISI baseline
`coda_class/<model>`	CODa Re-ID (class split)	one of clover/reobj/ffa/wdisi
`coda_region/<model>`	CODa Re-ID (region split)	one of clover/reobj/ffa/wdisi
`scannet/<model>`	ScanNet	one of clover/reobj/ffa/wdisi

MapCLOVER

MapCLOVER summarizes CLOVER descriptors into an object map: for each object instance, the descriptors of its observations — captured from different viewpoints and under different conditions — are clustered with k-means and the cluster centers form a compact representative set. At query time, an incoming observation is matched against these summaries (max cosine similarity) rather than the full observation set, enabling scalable long-term re-identification.
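The idea can be sketched in a few lines of NumPy. This is an illustration of the summarize-then-match scheme described above, not the code under src/map/: it uses a plain Lloyd's k-means on raw descriptors, toy 4-d vectors instead of real CLOVER descriptors, and hypothetical names (`summarize`, `match`, the object ids).

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length along the last axis."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def summarize(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns up to k cluster centers for one object."""
    x = np.asarray(descriptors, dtype=np.float64)
    k = min(k, len(x))
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every descriptor to its nearest center (squared Euclidean).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def match(query, object_map, top_n=5):
    """Rank objects by the max cosine similarity between query and their centers."""
    q = l2_normalize(np.asarray(query, dtype=np.float64))
    scores = {oid: float((l2_normalize(c) @ q).max()) for oid, c in object_map.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Toy data: two objects whose observations cluster around different directions.
rng = np.random.default_rng(0)
observations = {
    "tree_1": rng.normal(loc=[1.0, 0.0, 0.0, 0.0], scale=0.05, size=(40, 4)),
    "pole_7": rng.normal(loc=[0.0, 1.0, 0.0, 0.0], scale=0.05, size=(40, 4)),
}
object_map = {oid: summarize(d, k=5) for oid, d in observations.items()}
ranked = match([0.9, 0.1, 0.0, 0.0], object_map, top_n=2)
print(ranked[0][0])  # tree_1
```

Each object is stored as 5 centers instead of 40 observations, so query cost grows with the representative-set size rather than with the number of observations per object.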

The pipeline lives under src/map/ with three entry points and Hydra configs under configs/mapclover/. All commands run inside the Docker container (see Installation). MapCLOVER is the default method (content clustering + max/closest similarity); random sampling and single-averaging are baselines.

1. Reproduce the paper's MapCLOVER results

Averages retrieval accuracy over route-based map/query sequence permutations of the region-split test set, using the region-trained checkpoint:

# MapCLOVER, representative-set sizes 5 and 10
python src/reproduce_map_results.py \
    ckpt_path=checkpoints/clover_coda_region.pth \
    mapclover.sampler.num_rep_per_obj=5
python src/reproduce_map_results.py \
    ckpt_path=checkpoints/clover_coda_region.pth \
    mapclover.sampler.num_rep_per_obj=10

# Baselines and ablation (other rows of the table)
python src/reproduce_map_results.py ckpt_path=<ckpt> mapclover/sampler=random     mapclover.sampler.num_rep_per_obj=5   # Random set, max sim
python src/reproduce_map_results.py ckpt_path=<ckpt> mapclover/sampler=single_avg  mapclover/scorer=avg                  # Average
python src/reproduce_map_results.py ckpt_path=<ckpt> mapclover/scorer=avg          mapclover.sampler.num_rep_per_obj=5   # Clustering, avg sim

Each run writes the averaged Top-1/5/10 (and rank) to retrieval_metrics.csv. The max_permutations (default 30) and use_merged_sequences (default true) options control the averaging.

Note. Exact paper Table-VIII numbers require the per-frame annotations and the 3d_bbox/global/ files in data/CODa/ to be from the same dataset version (the region split is computed from the global 3D coordinates). A version mismatch changes which instances fall in the map/query sets and shifts the numbers; the descriptor pipeline itself is bit-for-bit identical to the reference.

2. Generate an object map

Builds and serializes a map from gallery observations:

python src/generate_map.py \
    ckpt_path=checkpoints/clover_coda_region.pth \
    mapclover.sampler.num_rep_per_obj=5
# -> <output_dir>/sampled_objs.json

Change the checkpoint with ckpt_path=... (the region-trained clover_coda_region.pth is recommended; clover_coda_sequence.pth also works).
Change the representative-set size with mapclover.sampler.num_rep_per_obj=10.
Use a baseline summarizer with mapclover/sampler=random or mapclover/sampler=single_avg.
Point at your own gallery by overriding the dataset sequences in configs/mapclover/dataset/coda_map_gallery.yaml.

3. Query a map (inference — top-N matches + scores)

Matches new observations against a saved map and writes, per query, the top-N matching instance ids and similarity scores to matches.json. No ground truth required.

# Query from a CODa split
python src/query_map.py \
    ckpt_path=checkpoints/clover_coda_region.pth \
    map_file=<path>/sampled_objs.json query.top_n=5

# Bring your own images + bounding boxes (e.g. from a detector)
python src/query_map.py ckpt_path=<ckpt> map_file=<path>/sampled_objs.json \
    query.images_dir=/path/to/images query.bboxes=/path/to/bboxes.json query.top_n=5

For the image+bbox mode, bboxes is a JSON list of {"image": <path relative to images_dir>, "bbox": [x1, y1, x2, y2], "class": <optional int>}.
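A short sketch of producing such a file from detector output, assuming pixel coordinates with `x1 < x2` and `y1 < y2` (the corner ordering is this sketch's assumption). The file names and class id below are made up for illustration, and `make_entry` is a hypothetical helper.

```python
import json
from typing import Optional, Sequence


def make_entry(image: str, bbox: Sequence[float], class_id: Optional[int] = None) -> dict:
    """Build one entry; bbox is [x1, y1, x2, y2] with x1 < x2 and y1 < y2."""
    x1, y1, x2, y2 = bbox
    if not (x1 < x2 and y1 < y2):
        raise ValueError(f"degenerate bbox for {image}: {list(bbox)}")
    entry = {"image": image, "bbox": [x1, y1, x2, y2]}
    if class_id is not None:
        entry["class"] = class_id  # optional field
    return entry


entries = [
    make_entry("frame_0001.png", [120, 80, 260, 310], class_id=0),
    make_entry("frame_0002.png", [15, 40, 90, 200]),
]
payload = json.dumps(entries, indent=2)
# Write with: pathlib.Path("/path/to/bboxes.json").write_text(payload)
print(payload)
```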

Pretrained Models

Released CLOVER checkpoints are hosted on Google Drive. Download the checkpoint you need and place it under checkpoints/ (or pass ckpt_path=<full-path> to eval.py).

Each checkpoint reproduces the corresponding row of the paper's main tables — CODa sequence: Table I; CODa region: Table III; CODa class: Table IV; ScanNet: Table VII. Refer to the paper for the full per-condition breakdowns (illumination, viewpoint difficulty, etc.).

Split	Download
CODa sequence	clover_coda_sequence.pth
CODa region	clover_coda_region.pth
CODa class	uses CODa-region checkpoint; see note below
ScanNet	clover_scannet.pth

Note on the class split. The class-generalization results come from evaluating the CODa-region-trained checkpoint against held-out classes — there is no separately-trained class checkpoint. Run with python src/eval.py experiment=coda_class/clover ckpt_path=checkpoints/clover_coda_region.pth.

Baselines (ReObj, FFA, WDISI) used for comparison in the paper are documented separately in the Baseline Checkpoints section below.

Running CLOVER on Your Own Data

For end-to-end use beyond pre-cropped images — including bbox-aware inputs, YOLOv11 detection, and building a map from custom data — see docs/custom_data.md.

The script below covers the most common case: pre-cropped object images → CLOVER embeddings. To go from a detector to MapCLOVER retrieval on a new dataset, combine it with the MapCLOVER section above; generate_map.py and query_map.py support custom image directories with bounding-box inputs.

Extract embeddings from a folder of cropped images

For users who already have pre-cropped object images and want CLOVER embeddings directly (no MapCLOVER needed), scripts/extract_embeddings.py is a runnable example:

python scripts/extract_embeddings.py \
  --checkpoint checkpoints/clover_coda_sequence.pth \
  --input-dir  /path/to/cropped_images/ \
  --output     embeddings.npz \
  --device     cuda   # default cpu

Output is a .npz with filenames (N,) and embeddings (N, 1024) arrays. Pass --use-head to get the 128-d projection-head output instead of the 1024-d content vector.
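Once the file exists, the embeddings can be compared directly. The sketch below assumes only the two array names stated above (`filenames`, `embeddings`) and ranks crops by cosine similarity; the helper names are hypothetical and not part of the repo.

```python
import numpy as np


def load_embeddings(npz_path):
    """Return (filenames, L2-normalized embeddings) from the .npz file."""
    with np.load(npz_path) as data:
        filenames = data["filenames"]
        embeddings = data["embeddings"].astype(np.float64)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return filenames, embeddings / np.maximum(norms, 1e-12)


def most_similar(filenames, embeddings, index, top_n=5):
    """Rank the other crops by cosine similarity to the crop at `index`."""
    scores = embeddings @ embeddings[index]  # rows are unit-length
    order = [i for i in np.argsort(-scores) if i != index][:top_n]
    return [(str(filenames[i]), float(scores[i])) for i in order]
```

Typical use would be `names, emb = load_embeddings("embeddings.npz")` followed by `most_similar(names, emb, index=0)` to list the crops closest to the first image.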

Baseline Checkpoints

For reproducing the baseline rows in the paper's main tables, the trained weights for ReObj and WDISI are also released. They are not part of CLOVER's contribution — they exist here so reviewers and future researchers can compare directly without re-training.

Model	Split	Download
ReObj	CODa sequence	reobj_coda_sequence.pth
ReObj	CODa region	reobj_coda_region.pth
ReObj	ScanNet	reobj_scannet.pth
WDISI	CODa sequence	wdisi_coda_sequence.pth
WDISI	CODa region	wdisi_coda_region.pth
WDISI	ScanNet	wdisi_scannet.pth

FFA has no checkpoint. The FFA baseline (from "Are These the Same Apple?") has no trainable parameters — it runs entirely on a frozen DINOv2 backbone with foreground-aware mean pooling. There is nothing to download beyond the DINOv2 weights already required for CLOVER. Run any FFA experiment with python src/eval.py experiment=<split>/ffa (no ckpt_path needed).

Supplemental Material

A supplemental document covers additional results that did not fit in the main paper:

Additional ablation rows (Triplet Loss + No Encoder MLP)
Full YOLOv11-detection results on the sequence split
Qualitative dataset comparison (CODa Re-ID vs. ScanNet)
Occlusion robustness curves (mAP / Top-1 / Top-5 vs. % occluded area) and representative occluded examples in the dataset
Margin-sensitivity ablations on the region split and ScanNet

See docs/supplemental.pdf (or its source docs/supplemental.tex) for figures, tables, and discussion.

Contact

Questions, bug reports, or feedback: please open an issue on the GitHub Issues page.

Citation

If you use CLOVER, MapCLOVER, or the CODa Re-ID dataset in your research, please cite:

@article{adkins2025clover,
  author  = {Adkins, Amanda and Lee, Dongmyeong and Biswas, Joydeep},
  title   = {{CLOVER: Context-Aware Long-Term Object Viewpoint- and Environment- Invariant Representation Learning}},
  journal = {IEEE Robotics and Automation Letters},
  year    = {2025},
  volume  = {10},
  number  = {11},
  pages   = {11928--11935},
  doi     = {10.1109/LRA.2025.3613991}
}

Acknowledgments

This work was developed at UT Austin AMRL. CLOVER's backbone builds on DINOv2 and the SelaVPR encoder design. The CODa Re-ID dataset extends the UT Campus Object Dataset (CODa) with instance-level re-identification annotations.

This work is partially supported by the National Science Foundation (GRFP DGE-2137420, CAREER-2046955), ARL SARA (W911NF-24-2-0025), and Amazon Lab126. Any opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.

License

This repository is released under the MIT License — see LICENSE. The CODa Re-ID dataset is also released under the MIT License.
