Skip to content

Commit 24f9aa9

Browse files
authored
Merge pull request #1000 from hanhainebula/master
release new models
2 parents ab7e48b + 4bbb986 commit 24f9aa9

3 files changed

Lines changed: 8 additions & 6 deletions

File tree

FlagEmbedding/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,3 @@
1-
from .flag_models import FlagModel, LLMEmbedder, FlagICLModel
1+
from .flag_models import FlagModel, LLMEmbedder, FlagICLModel, FlagLLMModel
22
from .bge_m3 import BGEM3FlagModel
33
from .flag_reranker import FlagReranker, FlagLLMReranker, LayerWiseFlagLLMReranker

README.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -37,8 +37,9 @@ FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following p
3737
- **Benchmark**: [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB), [AIR-Bench](https://github.com/AIR-Bench/AIR-Bench), [MLVU](https://github.com/JUNJIE99/MLVU)
3838

3939
## News
40-
- 7/26/2024: Release a new embedding model [bge-en-icl](BAAI/bge-en-icl), an embedding model that incorporates in-context learning capabilities, which, by providing task-relevant query-response examples, can encode semantically richer queries, further enhancing the semantic representation ability of the embeddings.
41-
- 7/26/2024: Release a new lightweight reranker [bge-reranker-v2.5-gemma2-lightweight](BAAI/bge-reranker-v2.5-gemma2-lightweight), a lightweight reranker based on gemma2-9B, which supports token compression and layerwise lightweight operations, can still ensure good performance while saving a significant amount of resources.
40+
- 7/26/2024: Release a new embedding model [bge-en-icl](https://huggingface.co/BAAI/bge-en-icl), an embedding model that incorporates in-context learning capabilities, which, by providing task-relevant query-response examples, can encode semantically richer queries, further enhancing the semantic representation ability of the embeddings. :fire:
41+
- 7/26/2024: Release a new embedding model [bge-multilingual-gemma2](https://huggingface.co/BAAI/bge-multilingual-gemma2), a multilingual embedding model based on gemma-2-9b, which supports multiple languages and diverse downstream tasks, achieving new SOTA on multilingual benchmarks (MIRACL, MTEB-fr, and MTEB-pl). :fire:
42+
- 7/26/2024: Release a new lightweight reranker [bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight), a lightweight reranker based on gemma-2-9b, which supports token compression and layerwise lightweight operations, can still ensure good performance while saving a significant amount of resources. :fire:
4243
- 6/7/2024: Release a new benchmark [MLVU](https://github.com/JUNJIE99/MLVU), the first comprehensive benchmark specifically designed for long video understanding. MLVU features an extensive range of video durations, a diverse collection of video sources, and a set of evaluation tasks uniquely tailored for long-form video understanding. :fire:
4344
- 5/21/2024: Release a new benchmark [AIR-Bench](https://github.com/AIR-Bench/AIR-Bench) together with Jina AI, Zilliz, HuggingFace, and other partners. AIR-Bench focuses on a fair out-of-distribution evaluation for Neural IR & RAG. It generates the synthetic data for benchmarking w.r.t. diverse domains and languages. It is dynamic and will be updated on regular basis. [Leaderboard](https://huggingface.co/spaces/AIR-Bench/leaderboard) :fire:
4445
- 4/30/2024: Release [Llama-3-8B-Instruct-80K-QLoRA](https://huggingface.co/namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA), extending the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA training on a few synthesized long-context data. The model achieves remarkable performance on various long-context benchmarks. [Code](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/longllm_qlora) :fire:

README_zh.md

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -36,9 +36,10 @@ FlagEmbedding专注于检索增强llm领域,目前包括以下项目:
3636
- **Benchmark**: [C-MTEB](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB), [AIR-Bench](https://github.com/AIR-Bench/AIR-Bench), [MLVU](https://github.com/JUNJIE99/MLVU)
3737

3838
## 更新
39-
- 7/26/2024:发布bge-en-icl](https://chat.xiaoai.plus/BAAI/bge-en-icl)。这是一个结合了上下文学习能力的文本检索模型,通过提供与任务相关的查询-回答示例,可以编码语义更丰富的查询,进一步增强嵌入的语义表征能力。🔥
40-
- 7/26/2024:发布新的轻量级重排器[bge-reranker-v2.5-gemma2-lightweight](https://chat.xiaoai.plus/BAAI/bge-reranker-v2.5-gemma2-lightweight)。这是一个基于gemma2-9B的轻量级重排器,支持令牌压缩和分层轻量操作,在节省大量资源的同时,仍能确保良好的性能。🔥
41-
- 6/7/2024: 发布首个专为长视频理解设计的全面评测基准[MLVU](https://github.com/JUNJIE99/MLVU)。MLVU拥有丰富的视频时长范围,多样化的视频来源,以及多个专为长视频理解设计的评估任务。🔥
39+
- 7/26/2024:发布[bge-en-icl](https://huggingface.co/BAAI/bge-en-icl)。这是一个结合了上下文学习能力的文本检索模型,通过提供与任务相关的查询-回答示例,可以编码语义更丰富的查询,进一步增强嵌入的语义表征能力。:fire:
40+
- 7/26/2024: 发布[bge-multilingual-gemma2](https://huggingface.co/BAAI/bge-multilingual-gemma2)。这是一个基于gemma-2-9b的多语言文本向量模型,同时支持多种语言和多样的下游任务,在多语言检索数据集 MIRACL, MTEB-fr, MTEB-pl 上取得了迄今最好的实验结果。:fire:
41+
- 7/26/2024:发布新的轻量级重排器[bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight)。这是一个基于gemma-2-9b的轻量级重排器,支持令牌压缩和分层轻量操作,在节省大量资源的同时,仍能确保良好的性能。:fire:
42+
- 6/7/2024: 发布首个专为长视频理解设计的全面评测基准[MLVU](https://github.com/JUNJIE99/MLVU)。MLVU拥有丰富的视频时长范围,多样化的视频来源,以及多个专为长视频理解设计的评估任务。:fire:
4243
- 5/21/2024:联合 Jina AI、Zilliz、HuggingFace 等机构发布评测基准 [AIR-Bench](https://github.com/AIR-Bench/AIR-Bench),针对检索任务和 RAG 场景设计。AIR-Bench 首次提出在检索任务中使用 LLMs 自动化生产评估数据,避免模型过拟合测试数据。AIR-Bench 不需要人工参与标注数据,因而可以更灵活覆盖更多垂直领域和不同语种。同时 AIR-Bench 会定期进行更新从而满足社区不断变化的评测需求。[Leaderboard](https://huggingface.co/spaces/AIR-Bench/leaderboard) :fire:
4344
- 4/30/2024: 发布[Llama-3-8B-Instruct-80K-QLoRA](https://huggingface.co/namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA), 其通过在少量合成的长文本数据上的QLoRA训练,有效地将Llama-3-8B-Instruct的上下文长度从8K扩展到80K。详见[代码](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/longllm_qlora) :fire:
4445
- 3/18/2024: 发布新的[rerankers](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_reranker), 拥有更好的性能同时支持多语言和长文本。 :fire:

0 commit comments

Comments
 (0)