We have provided the inference code for two types of models: the **embedder** and the **reranker**. These can be loaded using `FlagAutoModel` and `FlagAutoReranker`, respectively. For more detailed instructions on their use, please refer to the documentation for the [embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/inference/embedder) and [reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/inference/reranker).
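
For example, here is a minimal sketch of loading and using both model types; the model names, the retrieval instruction, and the sample texts are illustrative choices, not prescribed values:

```python
from FlagEmbedding import FlagAutoModel, FlagAutoReranker

# Load an embedder (model name and instruction are illustrative).
embedder = FlagAutoModel.from_finetuned(
    "BAAI/bge-large-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages: ",
    use_fp16=True,
)

queries = ["What is BGE?"]
passages = ["BGE is a family of embedding models released by BAAI."]

# Encode queries and passages, then score by inner product of the embeddings.
q_emb = embedder.encode_queries(queries)
p_emb = embedder.encode_corpus(passages)
similarity = q_emb @ p_emb.T

# Load a reranker and score query-passage pairs directly.
reranker = FlagAutoReranker.from_finetuned("BAAI/bge-reranker-large", use_fp16=True)
scores = reranker.compute_score([[queries[0], passages[0]]])
print(similarity, scores)
```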
We support fine-tuning a variety of BGE series models, including `bge-large-en-v1.5`, `bge-m3`, `bge-en-icl`, `bge-multilingual-gemma2`, `bge-reranker-v2-m3`, `bge-reranker-v2-gemma`, and `bge-reranker-v2-minicpm-layerwise`, among others. As examples, we use the basic models `bge-large-en-v1.5` and `bge-reranker-large`. For more details, please refer to the [embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune/embedder) and [reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune/reranker) sections.
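
As a rough sketch of the expected training data, each line of the fine-tuning file is a JSON object with a `query`, a list of positive passages (`pos`), and a list of negative passages (`neg`); the file name and example texts below are made up:

```python
import json

# Hypothetical training records in the query/pos/neg JSON-lines format
# used for BGE fine-tuning (the texts here are toy examples).
records = [
    {
        "query": "what is a corporation?",
        "pos": ["A corporation is a company or group of people authorized to act as a single entity."],
        "neg": ["The free enterprise system is based on private ownership of property."],
    },
]

with open("toy_finetune_data.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The resulting JSONL file is then supplied to the fine-tuning script as its training data; see the linked embedder and reranker examples for the full command lines and arguments.
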
We support evaluations on [MTEB](https://github.com/embeddings-benchmark/mteb), [BEIR](https://github.com/beir-cellar/beir), [MSMARCO](https://microsoft.github.io/msmarco/), [MIRACL](https://github.com/project-miracl/miracl), [MLDR](https://huggingface.co/datasets/Shitao/MLDR), [MKQA](https://github.com/apple/ml-mkqa), [AIR-Bench](https://github.com/AIR-Bench/AIR-Bench), and custom datasets. Below is an example of evaluating MSMARCO passages. For more details, please refer to the [evaluation examples](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/evaluation).
`examples/evaluation/README.md`
This document serves as an overview of the evaluation process and provides a brief introduction to each dataset.

In this section, we will first introduce the commonly used arguments across all datasets. Then, we will provide a more detailed explanation of the specific arguments used for each individual dataset.

- [1. Introduction](#1-Introduction)
  - [(1) EvalArgs](#1-EvalArgs)
  - [(2) ModelArgs](#2-ModelArgs)
- [2. Usage](#2-Usage)
  - [Requirements](#Requirements)
  - [(1) MTEB](#1-MTEB)
  - [(2) BEIR](#2-BEIR)
  - [(3) MSMARCO](#3-MSMARCO)
  - [(4) MIRACL](#4-MIRACL)
  - [(5) MLDR](#5-MLDR)
  - [(6) MKQA](#6-MKQA)
  - [(7) AIR-Bench](#7-Air-Bench)
  - [(8) Custom Dataset](#8-Custom-Dataset)

## Introduction

### 1. EvalArgs

**Arguments for evaluation setup:**

- **`eval_name`**: Name of the evaluation task (e.g., msmarco, beir, miracl).

- **`dataset_dir`**: Path to the dataset directory. This can be:
  1. A local path to perform evaluation on your own dataset (must exist). It should contain:
     - `corpus.jsonl`
     - `<split>_queries.jsonl`
     - `<split>_qrels.jsonl`
  2. A path to store datasets downloaded via API. Provide `None` to use the cache directory.

- **`force_redownload`**: Set to `True` to force a redownload of the dataset. Default is `False`.

- **`dataset_names`**: List of dataset names to evaluate, or `None` to evaluate all available datasets. Entries can be dataset names (BEIR, etc.) or languages (MIRACL, etc.).

Here is an example of evaluating on MTEB:

```shell
pip install mteb==1.15.0
python -m FlagEmbedding.evaluation.mteb \
    --eval_name mteb \
    --output_dir ./data/mteb/search_results \
    --languages eng \
    --tasks NFCorpus BiorxivClusteringS2S SciDocsRR \
```

Here is an example of evaluating on BEIR:

```shell
pip install beir
mkdir eval_beir
cd eval_beir
python -m FlagEmbedding.evaluation.beir \
    --eval_name beir \
    --dataset_dir ./beir/data \
    --dataset_names fiqa arguana cqadupstack \
    --splits test dev \
```

Here is an example of evaluating on MSMARCO passages:

```shell
python -m FlagEmbedding.evaluation.msmarco \
    --eval_name msmarco \
    --dataset_dir ./msmarco/data \
    --dataset_names passage \
    --splits dev dl19 dl20 \
```

Here is an example of evaluating on MIRACL:

```shell
python -m FlagEmbedding.evaluation.miracl \
    --eval_name miracl \
    --dataset_dir ./miracl/data \
    --dataset_names bn hi sw te th yo \
    --splits dev \
```

Here is an example of evaluating on MLDR:

```shell
python -m FlagEmbedding.evaluation.mldr \
    --eval_name mldr \
    --dataset_dir ./mldr/data \
    --dataset_names hi \
    --splits test \
```

Here is an example of evaluating on MKQA:

```shell
python -m FlagEmbedding.evaluation.mkqa \
    --eval_name mkqa \
    --dataset_dir ./mkqa/data \
    --dataset_names en zh_cn \
    --splits test \
```

Here is an example of evaluating on AIR-Bench:

```shell
pip install air-benchmark
python -m FlagEmbedding.evaluation.air_bench \
    --benchmark_version AIR-Bench_24.05 \
    --task_types qa long-doc \
    --domains arxiv \
    --languages en \
```

For a custom dataset, place the three files (`corpus.jsonl`, `test_queries.jsonl`, `test_qrels.jsonl`) in the directory you pass as `dataset_dir`.
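
As an illustration only (the exact field names here are assumptions; see the custom-dataset section of the full README for the authoritative schema), the three files might be produced like this:

```python
import json

# Hypothetical corpus, query, and qrels records; the field names are assumptions.
corpus = [{"id": "doc-0", "title": "BGE", "text": "BGE is a family of embedding models."}]
queries = [{"id": "q-0", "text": "What is BGE?"}]
qrels = [{"qid": "q-0", "docid": "doc-0", "relevance": 1}]

def write_jsonl(path, rows):
    # Write one JSON object per line.
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

write_jsonl("corpus.jsonl", corpus)
write_jsonl("test_queries.jsonl", queries)
write_jsonl("test_qrels.jsonl", qrels)
```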