Commit b87f966

update results of BM25

1 parent 23407d5

3 files changed: 7 additions & 6 deletions

FlagEmbedding/BGE_M3/BGE_M3.pdf (2.79 KB)

Binary file not shown.

FlagEmbedding/BGE_M3/README.md

Lines changed: 7 additions & 6 deletions
@@ -200,12 +200,7 @@ print(model.compute_score(sentence_pairs,
 ## Evaluation
 
 We compare BGE-M3 with some popular methods, including BM25, openAI embedding, etc.
-We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
-To make the BM25 and BGE-M3 more comparable, in the experiment,
-BM25 used the same tokenizer as BGE-M3 (i.e., the tokenizer of XLM-Roberta).
-Using the same vocabulary can also ensure that both approaches have the same retrieval latency.
-Results of BM25 using other tokenizer can be found in [here](https://github.com/carlos-lassance/bm25_mldr)
-(Thanks to carlos-lassance for providing the results).
+
 
 - Multilingual (Miracl dataset)
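The hunk above concerns the repo's BM25 baseline, whose reported numbers come from Pyserini's Lucene implementation run over XLM-Roberta tokens. As a rough illustration of what that baseline computes, here is a minimal pure-Python sketch of the classic Okapi BM25 scoring formula; note that Lucene's BM25 variant differs in details, and the function name and inputs here are illustrative, not part of the repo.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency of each query term across the collection.
    df = {t: sum(1 for d in docs_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue  # term absent from this document
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b).
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

In the repo's setup the token lists would come from the XLM-Roberta tokenizer (the same one BGE-M3 uses), which is what makes the latency comparison fair: both methods pay the same tokenization cost over the same vocabulary.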

@@ -228,6 +223,12 @@ Results of BM25 using other tokenizer can be found in [here](https://github.com/
 - NarritiveQA:
 ![avatar](./imgs/nqa.jpg)
 
+- BM25
+
+We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
+
+![avatar](./imgs/bm25.jpg)
+
 ## Training
 - Self-knowledge Distillation: combining multiple outputs from different
 retrieval modes as reward signal to enhance the performance of single mode(especially for sparse retrieval and multi-vec(colbert) retrival)
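The Training bullet in the diff context describes self-knowledge distillation: the combined outputs of the dense, sparse, and multi-vector (ColBERT) retrieval modes serve as a reward signal for each single mode. A hedged sketch of the combination step, assuming a simple weighted sum of per-candidate scores softmax-normalized into a teacher distribution; the weights, function names, and the choice of a softmax teacher are assumptions for illustration, not the repo's exact recipe.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def teacher_distribution(dense, sparse, colbert, weights=(1.0, 0.3, 1.0)):
    """Combine three retrieval modes' scores into one soft target distribution.

    Each argument is a list of scores for the same candidate passages; the
    weighted sum acts as the ensemble 'teacher' that each single mode would
    then be trained to match (e.g. with a KL-divergence loss).
    """
    wd, ws, wc = weights
    combined = [wd * d + ws * s + wc * c for d, s, c in zip(dense, sparse, colbert)]
    return softmax(combined)
```

The point of the ensemble teacher is that the weaker modes (sparse, multi-vector) learn from candidates the stronger dense mode ranks highly, and vice versa, which matches the bullet's claim that sparse and ColBERT retrieval benefit most.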

FlagEmbedding/BGE_M3/imgs/bm25.jpg (67.3 KB)

0 commit comments
