Commit b87f966

update results of BM25

1 parent 23407d5

3 files changed: 7 additions & 6 deletions

FlagEmbedding/BGE_M3/BGE_M3.pdf (2.79 KB)

Binary file not shown.

FlagEmbedding/BGE_M3/README.md

Lines changed: 7 additions & 6 deletions
@@ -200,12 +200,7 @@ print(model.compute_score(sentence_pairs,
 ## Evaluation
 
 We compare BGE-M3 with some popular methods, including BM25, openAI embedding, etc.
-We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
-To make the BM25 and BGE-M3 more comparable, in the experiment,
-BM25 used the same tokenizer as BGE-M3 (i.e., the tokenizer of XLM-Roberta).
-Using the same vocabulary can also ensure that both approaches have the same retrieval latency.
-Results of BM25 using other tokenizer can be found in [here](https://github.com/carlos-lassance/bm25_mldr)
-(Thanks to carlos-lassance for providing the results).
+
 
 - Multilingual (Miracl dataset)
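The hunk above concerns the repo's BM25 baseline, whose reported numbers come from Pyserini's Lucene implementation run over XLM-Roberta tokens. As a rough illustration of what that baseline computes, here is a minimal pure-Python sketch of the classic Okapi BM25 scoring formula; note that Lucene's BM25 variant differs in details, and the function name and inputs here are illustrative, not part of the repo.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency of each query term across the collection.
    df = {t: sum(1 for d in docs_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue  # term absent from this document
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b).
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

In the repo's setup the token lists would come from the XLM-Roberta tokenizer (the same one BGE-M3 uses), which is what makes the latency comparison fair: both methods pay the same tokenization cost over the same vocabulary.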

@@ -228,6 +223,12 @@ Results of BM25 using other tokenizer can be found in [here](https://github.com/
 - NarritiveQA:
 ![avatar](./imgs/nqa.jpg)
 
+- BM25
+
+We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
+
+![avatar](./imgs/bm25.jpg)
+
 ## Training
 - Self-knowledge Distillation: combining multiple outputs from different
 retrieval modes as reward signal to enhance the performance of single mode(especially for sparse retrieval and multi-vec(colbert) retrival)
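The Training bullet in the diff context describes self-knowledge distillation: the combined outputs of the dense, sparse, and multi-vector (ColBERT) retrieval modes serve as a reward signal for each single mode. A hedged sketch of the combination step, assuming a simple weighted sum of per-candidate scores softmax-normalized into a teacher distribution; the weights, function names, and the choice of a softmax teacher are assumptions for illustration, not the repo's exact recipe.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def teacher_distribution(dense, sparse, colbert, weights=(1.0, 0.3, 1.0)):
    """Combine three retrieval modes' scores into one soft target distribution.

    Each argument is a list of scores for the same candidate passages; the
    weighted sum acts as the ensemble 'teacher' that each single mode would
    then be trained to match (e.g. with a KL-divergence loss).
    """
    wd, ws, wc = weights
    combined = [wd * d + ws * s + wc * c for d, s, c in zip(dense, sparse, colbert)]
    return softmax(combined)
```

The point of the ensemble teacher is that the weaker modes (sparse, multi-vector) learn from candidates the stronger dense mode ranks highly, and vice versa, which matches the bullet's claim that sparse and ColBERT retrieval benefit most.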

FlagEmbedding/BGE_M3/imgs/bm25.jpg (67.3 KB)

0 commit comments
