Commit 8de1314

update readme
1 parent 5d88ae4 commit 8de1314

1 file changed: examples/evaluation/README.md
Lines changed: 29 additions & 24 deletions
@@ -8,21 +8,35 @@ This document serves as an overview of the evaluation process and provides a bri
 
 In this section, we will first introduce the commonly used arguments across all datasets. Then, we will provide a more detailed explanation of the specific arguments used for each individual dataset.
 
+- [1. Introduction](#1-Introduction)
+  - [EvalArgs](#EvalArgs)
+  - [ModelArgs](#ModelArgs)
+- [2. Usage](#2-Usage)
+  - [Requirements](#Requirements)
+  - [MTEB](#MTEB)
+  - [BEIR](#BEIR)
+  - [MSMARCO](#MSMARCO)
+  - [MIRACL](#MIRACL)
+  - [MLDR](#MLDR)
+  - [MKQA](#MKQA)
+  - [AIR-Bench](#Air-Bench)
+  - [Custom Dataset](#Custom-Dataset)
+
 ## Introduction
 
 ### 1. EvalArgs
 
 **Arguments for evaluation setup:**
 
 - **`eval_name`**: Name of the evaluation task (e.g., msmarco, beir, miracl).
-
+
 - **`dataset_dir`**: Path to the dataset directory. This can be:
-1. A local path to perform evaluation on your dataset (must exist). It should contain:
-   - `corpus.jsonl`
-   - `<split>_queries.jsonl`
-   - `<split>_qrels.jsonl`
-2. Path to store datasets downloaded via API. Provide `None` to use the cache directory.
-
+  1. A local path to perform evaluation on your dataset (must exist). It should contain:
+     - `corpus.jsonl`
+     - `<split>_queries.jsonl`
+     - `<split>_qrels.jsonl`
+  2. Path to store datasets downloaded via API. Provide `None` to use the cache directory.
+
 - **`force_redownload`**: Set to `True` to force redownload of the dataset. Default is `False`.
 
 - **`dataset_names`**: List of dataset names to evaluate or `None` to evaluate all available datasets. This can be the dataset name (BEIR, etc.) or language (MIRACL, etc.).
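As a side note on the `dataset_dir` argument described in the hunk above: when pointing it at a local dataset, the directory is expected to hold the three JSONL files named there, with the split name as the filename prefix. Below is a minimal sketch; the `./your_data_path` directory name and the `test` split are placeholders, and only the three filenames come from the argument description.

```shell
# Minimal sketch of a local dataset_dir layout for a single "test" split.
# ./your_data_path is a placeholder; only the filenames below are taken
# from the dataset_dir description above.
mkdir -p ./your_data_path
touch ./your_data_path/corpus.jsonl \
      ./your_data_path/test_queries.jsonl \
      ./your_data_path/test_qrels.jsonl

ls ./your_data_path
# corpus.jsonl  test_queries.jsonl  test_qrels.jsonl
```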
@@ -107,11 +121,8 @@ Here is an example for evaluation:
 
 ```shell
 pip install mteb==1.15.0
-```
-
-```shell
 python -m FlagEmbedding.evaluation.mteb \
---eval_name mteb \
+    --eval_name mteb \
     --output_dir ./data/mteb/search_results \
     --languages eng \
     --tasks NFCorpus BiorxivClusteringS2S SciDocsRR \
@@ -133,11 +144,8 @@ Here is an example for evaluation:
 pip install beir
 mkdir eval_beir
 cd eval_beir
-```
-
-```shell
 python -m FlagEmbedding.evaluation.beir \
---eval_name beir \
+    --eval_name beir \
     --dataset_dir ./beir/data \
     --dataset_names fiqa arguana cqadupstack \
     --splits test dev \
@@ -168,7 +176,7 @@ Here is an example for evaluation:
 
 ```shell
 python -m FlagEmbedding.evaluation.msmarco \
---eval_name msmarco \
+    --eval_name msmarco \
    --dataset_dir ./msmarco/data \
    --dataset_names passage \
    --splits dev dl19 dl20 \
@@ -198,7 +206,7 @@ Here is an example for evaluation:
 
 ```shell
 python -m FlagEmbedding.evaluation.miracl \
---eval_name miracl \
+    --eval_name miracl \
    --dataset_dir ./miracl/data \
    --dataset_names bn hi sw te th yo \
    --splits dev \
@@ -228,7 +236,7 @@ Here is an example for evaluation:
 
 ```shell
 python -m FlagEmbedding.evaluation.mldr \
---eval_name mldr \
+    --eval_name mldr \
    --dataset_dir ./mldr/data \
    --dataset_names hi \
    --splits test \
@@ -258,7 +266,7 @@ Here is an example for evaluation:
 
 ```shell
 python -m FlagEmbedding.evaluation.mkqa \
---eval_name mkqa \
+    --eval_name mkqa \
    --dataset_dir ./mkqa/data \
    --dataset_names en zh_cn \
    --splits test \
@@ -293,11 +301,8 @@ Here is an example for evaluation:
 
 ```shell
 pip install air-benchmark
-```
-
-```shell
 python -m FlagEmbedding.evaluation.air_bench \
---benchmark_version AIR-Bench_24.05 \
+    --benchmark_version AIR-Bench_24.05 \
    --task_types qa long-doc \
    --domains arxiv \
    --languages en \
@@ -352,7 +357,7 @@ Please put the above files (`corpus.jsonl`, `test_queries.jsonl`, `test_qrels.jso
 
 ```shell
 python -m FlagEmbedding.evaluation.custom \
---eval_name your_data_name \
+    --eval_name your_data_name \
    --dataset_dir ./your_data_path \
    --splits test \
    --corpus_embd_save_dir ./your_data_name/corpus_embd \
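For the custom-dataset case above, it may help to see roughly what a single line of each file could look like. The JSON field names below are an assumption for illustration only (they are not shown in this diff), so verify them against the repository's custom-dataset documentation.

```shell
# Rough sketch of one line per file. The field names are an assumption,
# not taken from this diff; check the FlagEmbedding docs for the exact schema.
mkdir -p ./your_data_path

cat > ./your_data_path/corpus.jsonl << 'EOF'
{"id": "doc-0", "title": "Example title", "text": "Example passage text."}
EOF

cat > ./your_data_path/test_queries.jsonl << 'EOF'
{"id": "q-0", "text": "example query text?"}
EOF

cat > ./your_data_path/test_qrels.jsonl << 'EOF'
{"qid": "q-0", "docid": "doc-0", "relevance": 1}
EOF
```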
