
Commit 97bc10c

Merge branch 'new-flagembedding-v1' of https://github.com/hanhainebula/FlagEmbedding into new-flagembedding-v1

2 parents: b8e1cb0 + ef51aeb

8 files changed: 52 additions & 30 deletions

FlagEmbedding/abc/inference/AbsEmbedder.py (0 additions & 2 deletions)

```diff
@@ -167,8 +167,6 @@ def encode(
         instruction_format: Optional[str] = None,
         **kwargs: Any
     ):
-        if instruction is None: instruction = self.instruction
-        if instruction_format is None: instruction_format = self.instruction_format
        if batch_size is None: batch_size = self.batch_size
        if max_length is None: max_length = self.passage_max_length
        if convert_to_numpy is None: convert_to_numpy = self.convert_to_numpy
```
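The two deleted lines move instruction defaulting out of `encode`; the surviving lines all follow the same fall-back-to-instance-defaults idiom. A minimal runnable sketch of that pattern (simplified class with hypothetical default values, not the real `AbsEmbedder`):

```python
from typing import Optional

class Embedder:
    """Sketch of the fall-back-to-instance-defaults pattern used by
    AbsEmbedder.encode (names simplified; default values are assumptions)."""

    def __init__(self, batch_size: int = 256, passage_max_length: int = 512,
                 convert_to_numpy: bool = True):
        self.batch_size = batch_size
        self.passage_max_length = passage_max_length
        self.convert_to_numpy = convert_to_numpy

    def encode(self, sentences, batch_size: Optional[int] = None,
               max_length: Optional[int] = None,
               convert_to_numpy: Optional[bool] = None):
        # Per-call arguments override the instance defaults set in __init__.
        if batch_size is None: batch_size = self.batch_size
        if max_length is None: max_length = self.passage_max_length
        if convert_to_numpy is None: convert_to_numpy = self.convert_to_numpy
        return batch_size, max_length, convert_to_numpy

embedder = Embedder(batch_size=128)
embedder.encode(["hi"])                # falls back to instance defaults
embedder.encode(["hi"], batch_size=8)  # per-call override wins
```

The advantage of `None` sentinels over hard-coded keyword defaults is that the instance-level configuration stays the single source of truth.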

FlagEmbedding/inference/reranker/model_mapping.py (18 additions & 1 deletion)

```diff
@@ -54,5 +54,22 @@ class RerankerConfig:
         "bge-reranker-v2.5-gemma2-lightweight",
         RerankerConfig(LightWeightFlagLLMReranker)
     ),
-    # TODO: Add more models, such as Jina, e5, etc.
+    # others
+    (
+        "jinaai/jina-reranker-v2-base-multilingual",
+        RerankerConfig(FlagReranker)
+    ),
+    (
+        "Alibaba-NLP/gte-multilingual-reranker-base",
+        RerankerConfig(FlagReranker)
+    ),
+    (
+        "maidalun1020/bce-reranker-base_v1",
+        RerankerConfig(FlagReranker)
+    ),
+    (
+        "jinaai/jina-reranker-v1-turbo-en",
+        RerankerConfig(FlagReranker)
+    ),
+    # TODO: Add more models.
 ])
```
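These entries pair model names with `RerankerConfig` objects so the right reranker class can be instantiated from a model name alone. A minimal sketch of how such a lookup might work, with stub classes, a simplified `RerankerConfig`, and an assumed matching rule (substring match with a `FlagReranker` fallback):

```python
from collections import OrderedDict

# Stub stand-ins for the real reranker classes referenced in model_mapping.py.
class FlagReranker: ...
class LightWeightFlagLLMReranker: ...

class RerankerConfig:
    """Simplified: the real RerankerConfig carries more fields than this."""
    def __init__(self, reranker_class):
        self.reranker_class = reranker_class

# Name -> config mapping in the spirit of the diff above (variable name assumed).
RERANKER_MAPPING = OrderedDict([
    ("bge-reranker-v2.5-gemma2-lightweight", RerankerConfig(LightWeightFlagLLMReranker)),
    ("jinaai/jina-reranker-v2-base-multilingual", RerankerConfig(FlagReranker)),
    ("Alibaba-NLP/gte-multilingual-reranker-base", RerankerConfig(FlagReranker)),
])

def resolve_reranker_class(model_name_or_path: str):
    # Substring match so local paths containing the model name also resolve;
    # unknown models fall back to the plain cross-encoder class.
    for name, config in RERANKER_MAPPING.items():
        if name in model_name_or_path:
            return config.reranker_class
    return FlagReranker
```

An `OrderedDict` keeps iteration order deterministic, so more specific patterns can be listed before more general ones.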

README.md (15 additions & 12 deletions)

```diff
@@ -1,4 +1,6 @@
-<h1 align="center">FlagEmbedding</h1>
+![bge_logo](./imgs/bge_logo.jpg)
+
+<h1 align="center">⚡️BGE: One-Stop Retrieval Toolkit For Search and RAG</h1>
 <p align="center">
     <a href="https://huggingface.co/collections/BAAI/bge-66797a74476eb1f085c7446d">
         <img alt="Build" src="https://img.shields.io/badge/BGE_series-🤗-yellow">
@@ -12,7 +14,7 @@
     <a href="https://huggingface.co/C-MTEB">
         <img alt="Build" src="https://img.shields.io/badge/C_MTEB-🤗-yellow">
     </a>
-    <a href="https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding">
+    <a href="https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/baai_general_embedding">
         <img alt="Build" src="https://img.shields.io/badge/FlagEmbedding-1.1-red">
     </a>
 </p>
@@ -30,25 +32,26 @@
     <p>
 </h4>
 
+[English](README.md) | [中文](https://github.com/hanhainebula/FlagEmbedding/blob/new-flagembedding-v1/README_zh.md)
 
 
-[English](README.md) | [中文](https://github.com/hanhainebula/FlagEmbedding/blob/new-flagembedding-v1/README_zh.md)
 
-FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently:
+BGE (BAAI General Embedding) focuses on retrieval-augmented LLMs, consisting of the following projects currently:
+
+![projects](./imgs/projects.png)
 
 - **Inference**: [Embedder](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/inference/embedder), [Reranker](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/inference/reranker)
 - **Finetune**: [Embedder](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/finetune/embedder), [Reranker](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/finetune/reranker)
-- **Evaluation**: [MTEB](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#1-mteb), [BEIR](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#2-beir), [MSMARCO](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#3-msmarco), [MIRACL](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#4-miracl), [MLDR](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#5-mldr), [MKQA](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#6-mkqa), [AIR-Bench](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#7-air-bench), [Custom Dataset](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#8-custom-dataset)
-- **[Dataset](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/dataset)**: [MLDR](https://huggingface.co/datasets/Shitao/MLDR), [bge-m3-data](https://huggingface.co/datasets/Shitao/bge-m3-data), [public-data](https://huggingface.co/datasets/cfli/bge-e5data), [full-data](https://huggingface.co/datasets/cfli/bge-full-data), [reranker-data](Shitao/bge-reranker-data)
+- **[Evaluation](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation)**
+- **[Dataset](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/dataset)**
 - **[Tutorials](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/Tutorials)**
-- **research**:
-  - **Long-Context LLM**: [Activation Beacon](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/Long_LLM/activation_beacon), [LongLLM QLoRA](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/Long_LLM/longllm_qlora)
-  - **Fine-tuning of LM**: [LM-Cocktail](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/LM_Cocktail)
-  - **Embedding Model**: [Visualized-BGE](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/visual_bge), [BGE-M3](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/BGE_M3), [LLM Embedder](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/llm_embedder), [BGE Embedding](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/baai_general_embedding)
-  - **Reranker Model**: [llm rerankers](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/llm_reranker), [BGE Reranker](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/reranker)
-  - **Benchmark**: [C-MTEB](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/C_MTEB), [AIR-Bench](https://github.com/AIR-Bench/AIR-Bench), [MLVU](https://github.com/JUNJIE99/MLVU)
+- **[research](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research)**
 
 ## News
+
+- 29/10/2024: :earth_asia: We created a WeChat group for BGE. Scan the [QR code](./imgs/BGE_WeChat_Group.png) to join the group chat! Join us to get first-hand news about our updates and new releases, or to share any questions or ideas!
+  - <img src="./imgs/BGE_WeChat_Group.png" alt="bge_wechat_group" class="center" width="200">
+
 - 22/10/2024: :fire: We release another interesting model: [OmniGen](https://github.com/VectorSpaceLab/OmniGen), which is a unified image generation model supporting various tasks. OmniGen can accomplish complex image generation tasks without the need for additional plugins like ControlNet, IP-Adapter, or auxiliary models such as pose detection and face detection.
 - 9/10/2024: Introducing **MemoRAG**, a step forward towards RAG 2.0 on top of memory-inspired knowledge discovery (repo: https://github.com/qhjqhj00/MemoRAG, paper: https://arxiv.org/pdf/2409.05591v1) :fire:
 - 9/2/2024: Started maintaining the [tutorials](./Tutorials/). The contents within will be actively updated and enriched, stay tuned! :books:
```

README_zh.md (14 additions & 12 deletions)

```diff
@@ -1,4 +1,6 @@
-<h1 align="center">FlagEmbedding</h1>
+![bge_logo](./imgs/bge_logo.jpg)
+
+<h1 align="center">⚡️BGE: One-Stop Retrieval Toolkit For Search and RAG</h1>
 <p align="center">
     <a href="https://huggingface.co/collections/BAAI/bge-66797a74476eb1f085c7446d">
         <img alt="Build" src="https://img.shields.io/badge/BGE_series-🤗-yellow">
@@ -12,11 +14,12 @@
     <a href="https://huggingface.co/C-MTEB">
         <img alt="Build" src="https://img.shields.io/badge/C_MTEB-🤗-yellow">
     </a>
-    <a href="https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding">
+    <a href="https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/baai_general_embedding">
         <img alt="Build" src="https://img.shields.io/badge/FlagEmbedding-1.1-red">
     </a>
 </p>
 
+
 <h4 align="center">
     <p>
         <a href=#更新>News</a> |
@@ -30,25 +33,24 @@
         <a href="#license">License</a>
     <p>
 </h4>
-
 [English](README.md) | [中文](README_zh.md)
 
+BGE (BAAI General Embedding) focuses on the field of retrieval-augmented LLMs and currently includes the following projects:
 
-FlagEmbedding focuses on the field of retrieval-augmented LLMs and currently includes the following projects:
+![projects](./imgs/projects.png)
 
 - **Inference**: [Embedder](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/inference/embedder), [Reranker](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/inference/reranker)
 - **Finetune**: [Embedder](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/finetune/embedder), [Reranker](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/finetune/reranker)
-- **Evaluation**: [MTEB](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#1-mteb), [BEIR](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#2-beir), [MSMARCO](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#3-msmarco), [MIRACL](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#4-miracl), [MLDR](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#5-mldr), [MKQA](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#6-mkqa), [AIR-Bench](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#7-air-bench), [Custom Dataset](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation#8-custom-dataset)
-- **[Dataset](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/dataset)**: [MLDR](https://huggingface.co/datasets/Shitao/MLDR), [bge-m3-data](https://huggingface.co/datasets/Shitao/bge-m3-data), [public-data](https://huggingface.co/datasets/cfli/bge-e5data), [full-data](https://huggingface.co/datasets/cfli/bge-full-data), [reranker-data](Shitao/bge-reranker-data)
+- **[Evaluation](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/examples/evaluation)**
+- **[Dataset](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/dataset)**
 - **[Tutorials](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/Tutorials)**
-- **Research**:
-  - **Long-Context LLM**: [Activation Beacon](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/Long_LLM/activation_beacon), [LongLLM QLoRA](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/Long_LLM/longllm_qlora)
-  - **Fine-tuning of LM**: [LM-Cocktail](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/LM_Cocktail)
-  - **Embedding Model**: [Visualized-BGE](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/visual_bge), [BGE-M3](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/BGE_M3), [LLM Embedder](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/llm_embedder), [BGE Embedding](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/baai_general_embedding)
-  - **Reranker Model**: [llm rerankers](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/llm_reranker), [BGE Reranker](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/reranker)
-  - **Benchmark**: [C-MTEB](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research/C_MTEB), [AIR-Bench](https://github.com/AIR-Bench/AIR-Bench), [MLVU](https://github.com/JUNJIE99/MLVU)
+- **[Research](https://github.com/hanhainebula/FlagEmbedding/tree/new-flagembedding-v1/research)**
 
 ## News
+
+- 29/10/2024: :earth_asia: We created a [BGE WeChat group](./BGE_WeChat_Group.png); scan the QR code to join!
+  - <img src="./imgs/BGE_WeChat_Group.png" alt="bge_wechat_group" class="center" width="200">
+
 - 9/2/2024: Started maintaining the [tutorials](./Tutorials/); their contents will be continuously enriched, stay tuned! :books:
 - 7/26/2024: Released [bge-en-icl](https://huggingface.co/BAAI/bge-en-icl), a text retrieval model with in-context learning capability: by providing task-relevant query-answer examples, it can encode semantically richer queries and further strengthen the semantic representation power of the embeddings. :fire:
 - 7/26/2024: Released [bge-multilingual-gemma2](https://huggingface.co/BAAI/bge-multilingual-gemma2), a multilingual embedding model based on gemma-2-9b that supports multiple languages and diverse downstream tasks, achieving the best results to date on the multilingual retrieval datasets MIRACL, MTEB-fr, and MTEB-pl. :fire:
```

examples/finetune/embedder/README.md (5 additions & 3 deletions)

````diff
@@ -75,9 +75,11 @@ cd FlagEmbedding/scripts
 python add_reranker_score.py \
 --input_file toy_finetune_data_minedHN.jsonl \
 --output_file toy_finetune_data_score.jsonl \
---range_for_sampling 2-200 \
---negative_number 15 \
---use_gpu_for_searching
+--reranker_name_or_path BAAI/bge-reranker-v2-m3 \
+--devices cuda:0 cuda:1 \
+--cache_dir ./cache/model \
+--reranker_query_max_length 512 \
+--reranker_max_length 1024
 ```
 
 - **`input_file`**: path to save JSON data with mined hard negatives for finetuning
````
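The new flags hand the scoring off to a cross-encoder reranker (here BAAI/bge-reranker-v2-m3). Conceptually, the script reads the mined-hard-negative JSONL and attaches a reranker score to each query-passage pair. A minimal sketch of that step, with an assumed per-line schema (`query`, `pos`, `neg`) and a caller-supplied `score_fn` standing in for the real reranker:

```python
import json

def add_reranker_scores(input_file: str, output_file: str, score_fn) -> None:
    """Attach a teacher score to every positive and negative passage.

    Assumed input schema: one JSON object per line with a "query" string
    and "pos"/"neg" lists of passage strings (the toy_finetune_data layout).
    """
    with open(input_file, encoding="utf-8") as fin, \
         open(output_file, "w", encoding="utf-8") as fout:
        for line in fin:
            item = json.loads(line)
            query = item["query"]
            # Scores are stored alongside the passages so finetuning can use
            # them later, e.g. as distillation targets for the embedder.
            item["pos_scores"] = [score_fn(query, p) for p in item["pos"]]
            item["neg_scores"] = [score_fn(query, n) for n in item["neg"]]
            fout.write(json.dumps(item, ensure_ascii=False) + "\n")
```

In the real script, `score_fn` would be the loaded reranker's scoring call across the configured `--devices`; the sketch only shows the data flow.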

Binary files added:

- imgs/BGE_WeChat_Group.png (60.9 KB)
- imgs/bge_logo.jpg (3.64 MB)
- imgs/projects.png (113 KB)
