
Commit c6ba4b2

Merge pull request #1175 from 545999961/master
update readme index
2 parents: 456899a + fc78c70

4 files changed: 69 additions, 36 deletions

examples/README.md (21 additions, 12 deletions)

@@ -1,30 +1,42 @@
-# 1. Introduction
+# Examples
+
+- [1. Introduction](#1-Introduction)
+- [2. Installation](#2-Installation)
+- [3. Inference](#3-Inference)
+- [4. Finetune](#4-Finetune)
+- [5. Evaluation](#5-Evaluation)
+
+## 1. Introduction

 In this example, we show how to **inference**, **finetune** and **evaluate** the baai-general-embedding.

-# 2. Installation
+## 2. Installation

 * **with pip**
+
 ```shell
 pip install -U FlagEmbedding
 ```

 * **from source**
+
 ```shell
 git clone https://github.com/FlagOpen/FlagEmbedding.git
 cd FlagEmbedding
 pip install .
 ```
+
 For development, install as editable:
+
 ```shell
 pip install -e .
 ```

-# 3. Inference
+## 3. Inference

 We have provided the inference code for two types of models: the **embedder** and the **reranker**. These can be loaded using `FlagAutoModel` and `FlagAutoReranker`, respectively. For more detailed instructions on their use, please refer to the documentation for the [embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/inference/embedder) and [reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/inference/reranker).

-## 1. Embedder
+### 1. Embedder

 ```python
 from FlagEmbedding import FlagAutoModel
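
The embedder snippet is cut off at its import by the hunk boundary above. For context, here is a minimal usage sketch; the `BAAI/bge-large-en-v1.5` checkpoint and the instruction string are assumptions, not necessarily the README's exact code:

```python
from FlagEmbedding import FlagAutoModel

# Assumed checkpoint and retrieval instruction; adjust to your own model.
model = FlagAutoModel.from_finetuned(
    "BAAI/bge-large-en-v1.5",
    query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
    use_fp16=True,
)

queries = ["What is BGE?", "How to finetune an embedder?"]
passages = ["BGE is a family of general-purpose embedding models.",
            "Finetuning uses query/pos/neg records in JSONL files."]

q_embeddings = model.encode_queries(queries)
p_embeddings = model.encode_corpus(passages)

# With normalized embeddings, the dot product behaves like cosine similarity.
scores = q_embeddings @ p_embeddings.T
print(scores)
```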
@@ -49,7 +61,7 @@ scores = q_embeddings @ p_embeddings.T
 print(scores)
 ```

-## 2. Reranker
+### 2. Reranker

 ```python
 from FlagEmbedding import FlagAutoReranker
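
The reranker snippet is likewise cut off at its import. A minimal sketch, assuming the `BAAI/bge-reranker-large` checkpoint:

```python
from FlagEmbedding import FlagAutoReranker

# Assumed checkpoint; any supported reranker path works here.
reranker = FlagAutoReranker.from_finetuned("BAAI/bge-reranker-large", use_fp16=True)

pairs = [
    ["What is BGE?", "BGE is a family of general-purpose embedding models."],
    ["What is BGE?", "The weather in Beijing is sunny today."],
]

# Higher scores indicate a more relevant query-passage pair.
scores = reranker.compute_score(pairs)
print(scores)
```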
@@ -65,7 +77,7 @@ scores = model.compute_score(pairs)
 print(scores)
 ```

-# 4. Finetune
+## 4. Finetune

 We support fine-tuning a variety of BGE series models, including `bge-large-en-v1.5`, `bge-m3`, `bge-en-icl`, `bge-multilingual-gemma2`, `bge-reranker-v2-m3`, `bge-reranker-v2-gemma`, and `bge-reranker-v2-minicpm-layerwise`, among others. As examples, we use the basic models `bge-large-en-v1.5` and `bge-reranker-large`. For more details, please refer to the [embedder](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune/embedder) and [reranker](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune/reranker) sections.

@@ -74,7 +86,7 @@ pip install deepspeed
 pip install flash-attn --no-build-isolation
 ```

-## 1. Embedder
+### 1. Embedder

 ```shell
 torchrun --nproc_per_node 2 \
@@ -109,7 +121,7 @@ torchrun --nproc_per_node 2 \
 --kd_loss_type kl_div
 ```

-## 2. Reranker
+### 2. Reranker

 ```shell
 torchrun --nproc_per_node 2 \
@@ -139,16 +151,13 @@ torchrun --nproc_per_node 2 \
 --save_steps 1000
 ```

-# 5. Evaluation
+## 5. Evaluation

 We support evaluations on [MTEB](https://github.com/embeddings-benchmark/mteb), [BEIR](https://github.com/beir-cellar/beir), [MSMARCO](https://microsoft.github.io/msmarco/), [MIRACL](https://github.com/project-miracl/miracl), [MLDR](https://huggingface.co/datasets/Shitao/MLDR), [MKQA](https://github.com/apple/ml-mkqa), [AIR-Bench](https://github.com/AIR-Bench/AIR-Bench), and custom datasets. Below is an example of evaluating MSMARCO passages. For more details, please refer to the [evaluation examples](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/evaluation).

 ```shell
 pip install pytrec_eval
 pip install https://github.com/kyamagu/faiss-wheels/releases/download/v1.7.3/faiss_gpu-1.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
-```
-
-```shell
 python -m FlagEmbedding.evaluation.msmarco \
 --eval_name msmarco \
 --dataset_dir ./data/msmarco \
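
The `pytrec_eval` package installed above is what computes the reported retrieval metrics. A self-contained toy sketch of that computation, using made-up qrels and run data rather than anything FlagEmbedding-specific:

```python
import pytrec_eval

# Toy relevance judgments (qrels) and a toy retrieval run.
qrels = {"q1": {"d1": 1, "d2": 0}}
run = {"q1": {"d1": 0.9, "d2": 0.4}}

# Evaluate mean average precision and nDCG for each query.
evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"map", "ndcg"})
print(evaluator.evaluate(run))
```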

examples/evaluation/README.md (29 additions, 24 deletions)

@@ -8,21 +8,35 @@ This document serves as an overview of the evaluation process and provides a bri

 In this section, we will first introduce the commonly used arguments across all datasets. Then, we will provide a more detailed explanation of the specific arguments used for each individual dataset.

+- [1. Introduction](#1-Introduction)
+  - [(1) EvalArgs](#1-EvalArgs)
+  - [(2) ModelArgs](#2-ModelArgs)
+- [2. Usage](#2-Usage)
+  - [Requirements](#Requirements)
+  - [(1) MTEB](#1-MTEB)
+  - [(2) BEIR](#2-BEIR)
+  - [(3) MSMARCO](#3-MSMARCO)
+  - [(4) MIRACL](#4-MIRACL)
+  - [(5) MLDR](#5-MLDR)
+  - [(6) MKQA](#6-MKQA)
+  - [(7) AIR-Bench](#7-Air-Bench)
+  - [(8) Custom Dataset](#8-Custom-Dataset)
+
 ## Introduction

 ### 1. EvalArgs

 **Arguments for evaluation setup:**

 - **`eval_name`**: Name of the evaluation task (e.g., msmarco, beir, miracl).
-
+
 - **`dataset_dir`**: Path to the dataset directory. This can be:
-1. A local path to perform evaluation on your dataset (must exist). It should contain:
-- `corpus.jsonl`
-- `<split>_queries.jsonl`
-- `<split>_qrels.jsonl`
-2. Path to store datasets downloaded via API. Provide `None` to use the cache directory.
-
+  1. A local path to perform evaluation on your dataset (must exist). It should contain:
+     - `corpus.jsonl`
+     - `<split>_queries.jsonl`
+     - `<split>_qrels.jsonl`
+  2. Path to store datasets downloaded via API. Provide `None` to use the cache directory.
+
 - **`force_redownload`**: Set to `True` to force redownload of the dataset. Default is `False`.

 - **`dataset_names`**: List of dataset names to evaluate or `None` to evaluate all available datasets. This can be the dataset name (BEIR, etc.) or language (MIRACL, etc.).
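
Since `corpus.jsonl`, `<split>_queries.jsonl`, and `<split>_qrels.jsonl` recur throughout this README, a toy sketch of building such files follows; the field names are illustrative assumptions and should be checked against the Custom Dataset section before use:

```python
import json

# Hypothetical schemas for the three evaluation files; verify the field
# names against the Custom Dataset section of this README.
corpus = [{"id": "doc-0", "title": "", "text": "BGE is a family of embedding models."}]
queries = [{"id": "q-0", "text": "what is bge"}]
qrels = [{"qid": "q-0", "docid": "doc-0", "relevance": 1}]

for name, rows in [("corpus.jsonl", corpus),
                   ("test_queries.jsonl", queries),
                   ("test_qrels.jsonl", qrels)]:
    with open(name, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```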
@@ -107,11 +121,8 @@ Here is an example for evaluation:

 ```shell
 pip install mteb==1.15.0
-```
-
-```shell
 python -m FlagEmbedding.evaluation.mteb \
---eval_name mteb \
+--eval_name mteb \
 --output_dir ./data/mteb/search_results \
 --languages eng \
 --tasks NFCorpus BiorxivClusteringS2S SciDocsRR \
@@ -133,11 +144,8 @@ Here is an example for evaluation:
 pip install beir
 mkdir eval_beir
 cd eavl_beir
-```
-
-```shell
 python -m FlagEmbedding.evaluation.beir \
---eval_name beir \
+--eval_name beir \
 --dataset_dir ./beir/data \
 --dataset_names fiqa arguana cqadupstack \
 --splits test dev \
@@ -168,7 +176,7 @@ Here is an example for evaluation:

 ```shell
 python -m FlagEmbedding.evaluation.msmarco \
---eval_name msmarco \
+--eval_name msmarco \
 --dataset_dir ./msmarco/data \
 --dataset_names passage \
 --splits dev dl19 dl20 \
@@ -198,7 +206,7 @@ Here is an example for evaluation:

 ```shell
 python -m FlagEmbedding.evaluation.miracl \
---eval_name miracl \
+--eval_name miracl \
 --dataset_dir ./miracl/data \
 --dataset_names bn hi sw te th yo \
 --splits dev \
@@ -228,7 +236,7 @@ Here is an example for evaluation:

 ```shell
 python -m FlagEmbedding.evaluation.mldr \
---eval_name mldr \
+--eval_name mldr \
 --dataset_dir ./mldr/data \
 --dataset_names hi \
 --splits test \
@@ -258,7 +266,7 @@ Here is an example for evaluation:

 ```shell
 python -m FlagEmbedding.evaluation.mkqa \
---eval_name mkqa \
+--eval_name mkqa \
 --dataset_dir ./mkqa/data \
 --dataset_names en zh_cn \
 --splits test \
@@ -293,11 +301,8 @@ Here is an example for evaluation:

 ```shell
 pip install air-benchmark
-```
-
-```shell
 python -m FlagEmbedding.evaluation.air_bench \
---benchmark_version AIR-Bench_24.05 \
+--benchmark_version AIR-Bench_24.05 \
 --task_types qa long-doc \
 --domains arxiv \
 --languages en \
@@ -352,7 +357,7 @@ Please put the above file (`corpus.jsonl`, `test_queries.jsonl`, `test_qrels.jso

 ```shell
 python -m FlagEmbedding.evaluation.custom \
---eval_name your_data_name \
+--eval_name your_data_name \
 --dataset_dir ./your_data_path \
 --splits test \
 --corpus_embd_save_dir ./your_data_name/corpus_embd \

examples/finetune/embedder/README.md (10 additions, 0 deletions)

@@ -2,6 +2,16 @@

 In this example, we show how to finetune the embedder with your data.

+- [1. Installation](#1-Installation)
+- [2. Data format](#2-Data-format)
+  - [Hard Negatives](#Hard-Negatives)
+  - [Teacher Scores](#Teacher-Scores)
+- [3. Train](#3-Train)
+  - [(1) standard model](#1-standard-model)
+  - [(2) bge-m3](#2-bge-m3)
+  - [(3) bge-multilingual-gemma2](#3-bge-multilingual-gemma2)
+  - [(4) bge-en-icl](#4-bge-en-icl)
+
 ## 1. Installation

 - **with pip**
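
For the "Data format" entry in the index above: finetuning consumes JSONL records that pair a query with positive and hard-negative passages. A toy sketch follows; the `query`/`pos`/`neg` field names follow the finetune docs, while the values are invented:

```python
import json

# One training example: a query, its positive passages, and hard negatives.
record = {
    "query": "what is a bge embedding model",
    "pos": ["BGE is a family of general-purpose text embedding models."],
    "neg": ["The weather in Beijing is sunny today."],
}

with open("finetune_data.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```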

examples/finetune/reranker/README.md (9 additions, 0 deletions)

@@ -2,6 +2,15 @@

 In this example, we show how to finetune the reranker with your data.

+- [1. Installation](#1-Installation)
+- [2. Data format](#2-Data-format)
+  - [Hard Negatives](#Hard-Negatives)
+  - [Teacher Scores](#Teacher-Scores)
+- [3. Train](#3-Train)
+  - [(1) standard model](#1-standard-model)
+  - [(2) bge-reranker-v2-gemma](#2-bge-reranker-v2-gemma)
+  - [(3) bge-reranker-v2-layerwise-minicpm](#3-bge-reranker-v2-layerwise-minicpm)
+
 ## 1. Installation

 - **with pip**
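
The "Teacher Scores" entry above refers to distillation targets attached to the same training records. Below is a hypothetical sketch that uses an off-the-shelf reranker as the teacher; the checkpoint and the `pos_scores`/`neg_scores` field names are assumptions to verify against the README body:

```python
from FlagEmbedding import FlagAutoReranker

# Assumed teacher checkpoint; any strong reranker can play this role.
teacher = FlagAutoReranker.from_finetuned("BAAI/bge-reranker-large", use_fp16=True)

query = "what is a bge embedding model"
pos = ["BGE is a family of general-purpose text embedding models."]
neg = ["The weather in Beijing is sunny today."]

record = {
    "query": query,
    "pos": pos,
    "neg": neg,
    # Teacher scores for knowledge distillation (e.g., --kd_loss_type kl_div).
    "pos_scores": [float(teacher.compute_score([query, p])) for p in pos],
    "neg_scores": [float(teacher.compute_score([query, n])) for n in neg],
}
print(record)
```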
