Commit e22aaad
update README and README_zh
1 parent c5d73cd
2 files changed: 63 additions & 12 deletions

README.md

Lines changed: 5 additions & 6 deletions
@@ -46,8 +46,6 @@ FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following p
 - 7/26/2024: Release a new embedding model [bge-multilingual-gemma2](https://huggingface.co/BAAI/bge-multilingual-gemma2), a multilingual embedding model based on gemma-2-9b, which supports multiple languages and diverse downstream tasks, achieving new SOTA on multilingual benchmarks (MIRACL, MTEB-fr, and MTEB-pl). :fire:
 - 7/26/2024: Release a new lightweight reranker [bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight), a lightweight reranker based on gemma-2-9b, which supports token compression and layerwise lightweight operations, can still ensure good performance while saving a significant amount of resources. :fire:
 
-
-
 <details>
 <summary>More</summary>
 <!-- ### More -->
@@ -79,12 +77,13 @@ It is the first embedding model which supports all three retrieval methods, achi
 </details>
 
 ## Installation
-- Using pip:
+### Using pip:
 ```
 pip install -U FlagEmbedding
 ```
-- Install from sources:
-Clone the repository
+### Install from sources:
+
+Clone the repository and install
 ```
 git clone https://github.com/FlagOpen/FlagEmbedding.git
 cd FlagEmbedding
@@ -111,7 +110,7 @@ sentences_2 = ["I love BGE", "I love text retrieval"]
 embeddings_1 = model.encode(sentences_1)
 embeddings_2 = model.encode(sentences_2)
 ```
-Once we get the embeddings, we can compute similarity.
+Once we get the embeddings, we can compute similarity by inner product:
 ```
 similarity = embeddings_1 @ embeddings_2.T
 print(similarity)
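The changed line above says similarity is computed by inner product. This behaves like cosine similarity because BGE embeddings are unit-normalized; the following minimal NumPy sketch illustrates it with made-up 3-dimensional vectors standing in for real `model.encode(...)` output (real bge-base-en-v1.5 embeddings are 768-dimensional):

```python
import numpy as np

def normalize(rows):
    """L2-normalize each row so inner products equal cosine similarity."""
    rows = np.asarray(rows, dtype=float)
    return rows / np.linalg.norm(rows, axis=1, keepdims=True)

# Toy stand-ins for embeddings of two sentence lists (illustrative only).
embeddings_1 = normalize([[1.0, 2.0, 3.0], [2.0, 1.0, 0.0]])
embeddings_2 = normalize([[1.0, 2.0, 3.0], [0.0, 1.0, 2.0]])

# Entry (i, j) compares sentence i of the first list with sentence j of
# the second; the (0, 0) entry is exactly 1.0 because those two toy
# vectors are identical before normalization.
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
```

With normalized rows, every entry of `similarity` falls in [-1, 1], which is why the README can print the raw matrix as a similarity score.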

README_zh.md

Lines changed: 58 additions & 6 deletions
@@ -1,24 +1,30 @@
 <h1 align="center">FlagEmbedding</h1>
 <p align="center">
-<a href="https://www.python.org/">
-<img alt="Build" src="https://img.shields.io/badge/Made with-Python-purple">
+<a href="https://huggingface.co/collections/BAAI/bge-66797a74476eb1f085c7446d">
+<img alt="Build" src="https://img.shields.io/badge/BGE_series-🤗-yellow">
+</a>
+<a href="https://github.com/FlagOpen/FlagEmbedding">
+<img alt="Build" src="https://img.shields.io/badge/Contribution-Welcome-blue">
 </a>
 <a href="https://github.com/FlagOpen/FlagEmbedding/blob/master/LICENSE">
 <img alt="License" src="https://img.shields.io/badge/LICENSE-MIT-green">
 </a>
 <a href="https://huggingface.co/C-MTEB">
-<img alt="License" src="https://img.shields.io/badge/C_MTEB-🤗-yellow">
+<img alt="Build" src="https://img.shields.io/badge/C_MTEB-🤗-yellow">
 </a>
-<a href="https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding">
-<img alt="License" src="https://img.shields.io/badge/universal embedding-1.1-red">
+<a href="https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/baai_general_embedding">
+<img alt="Build" src="https://img.shields.io/badge/FlagEmbedding-1.1-red">
 </a>
 </p>
 
 <h4 align="center">
 <p>
 <a href=#更新>Updates</a> |
+<a href=#安装>Installation</a> |
+<a href=#快速开始>Quick Start</a> |
 <a href="#项目">Projects</a> |
 <a href="#模型列表">Model List</a> |
+<a href=#贡献者>Contributors</a> |
 <a href="#citation">Citation</a> |
 <a href="#license">License</a>
 <p>
@@ -39,6 +45,10 @@ FlagEmbedding focuses on retrieval-augmented LLMs, currently including the following projects:
 - 7/26/2024: Release [bge-en-icl](https://huggingface.co/BAAI/bge-en-icl), a text retrieval model with in-context learning capability: given task-relevant query-answer examples, it can encode semantically richer queries, further strengthening the semantic representation power of the embeddings. :fire:
 - 7/26/2024: Release [bge-multilingual-gemma2](https://huggingface.co/BAAI/bge-multilingual-gemma2), a multilingual embedding model based on gemma-2-9b that supports multiple languages and diverse downstream tasks, achieving the best results to date on the multilingual retrieval benchmarks MIRACL, MTEB-fr, and MTEB-pl. :fire:
 - 7/26/2024: Release the new lightweight reranker [bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight), a lightweight reranker based on gemma-2-9b that supports token compression and layerwise lightweight operations, maintaining good performance while saving a significant amount of resources. :fire:
+
+<details>
+<summary>More</summary>
+
 - 6/7/2024: Release [MLVU](https://github.com/JUNJIE99/MLVU), the first comprehensive evaluation benchmark designed for long-video understanding, featuring a wide range of video durations, diverse video sources, and multiple evaluation tasks tailored to long-video understanding. :fire:
 - 5/21/2024: Jointly with Jina AI, Zilliz, HuggingFace, and other organizations, release the evaluation benchmark [AIR-Bench](https://github.com/AIR-Bench/AIR-Bench) for retrieval tasks and RAG scenarios. AIR-Bench is the first to use LLMs to automatically generate evaluation data for retrieval tasks, avoiding model overfitting to test data; it requires no human annotation, so it can flexibly cover more vertical domains and languages, and it is updated regularly to meet the community's evolving evaluation needs. [Leaderboard](https://huggingface.co/spaces/AIR-Bench/leaderboard) :fire:
 - 4/30/2024: Release [Llama-3-8B-Instruct-80K-QLoRA](https://huggingface.co/namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA), which effectively extends the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA training on a small amount of synthetic long-context data. See the [code](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/longllm_qlora). :fire:
@@ -59,8 +69,48 @@ FlagEmbedding focuses on retrieval-augmented LLMs, currently including the following projects:
 - 08/02/2023: :tada: :tada: Release the Chinese-English embedding model BGE (short for BAAI General Embedding), **achieving the best performance on the MTEB and C-MTEB leaderboards**
 - 08/01/2023: Release the large-scale Chinese text embedding [benchmark](https://github.com/FlagOpen/FlagEmbedding/blob/master/C_MTEB) (**C-MTEB**), which includes 31 test tasks.
 
+</details>
 
 
+## Installation
+### Using pip:
+```
+pip install -U FlagEmbedding
+```
+### Install from sources:
+
+Clone and install FlagEmbedding:
+```
+git clone https://github.com/FlagOpen/FlagEmbedding.git
+cd FlagEmbedding
+pip install .
+```
+Install in editable mode:
+```
+pip install -e .
+```
+
+## Quick Start
+First, load a BGE embedding model:
+```
+from FlagEmbedding import FlagModel
+
+model = FlagModel('BAAI/bge-base-en-v1.5',
+query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
+use_fp16=True)
+```
+Feed the sentences to the model to get their embeddings:
+```
+sentences_1 = ["I love NLP", "I love machine learning"]
+sentences_2 = ["I love BGE", "I love text retrieval"]
+embeddings_1 = model.encode(sentences_1)
+embeddings_2 = model.encode(sentences_2)
+```
+Once we have the embeddings, we can compute similarity by inner product:
+```
+similarity = embeddings_1 @ embeddings_2.T
+print(similarity)
+```
 
 
 ## Projects
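The `query_instruction_for_retrieval` argument in the quick-start diff above deserves a note: as we understand the FlagEmbedding usage pattern, that instruction is prepended to queries before encoding (via the model's query-encoding path), while candidate passages are encoded without it. The helper below is a made-up illustration of that prepending convention, not the library's API:

```python
# Hypothetical sketch: prepend a retrieval instruction to queries only.
# The helper name `prepend_instruction` is illustrative, not part of
# FlagEmbedding; passages would be encoded without the instruction.
INSTRUCTION = "Represent this sentence for searching relevant passages:"

def prepend_instruction(queries, instruction=INSTRUCTION):
    """Return queries with the retrieval instruction prepended."""
    return [f"{instruction} {q}" for q in queries]

print(prepend_instruction(["I love NLP"])[0])
```

Encoding the instruction together with the query is what lets an asymmetric retrieval model map short queries and long passages into a comparable embedding space.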
@@ -170,7 +220,9 @@ BGE Embedding is a general embedding model. We use [retromae](https://githu
 
 
 
-## Contributors:
+## 贡献者:
+
+Many thanks to everyone in the FlagEmbedding community for their contributions; new members are welcome to join!
 
 <a href="https://github.com/FlagOpen/FlagEmbedding/graphs/contributors">
 <img src="https://contrib.rocks/image?repo=FlagOpen/FlagEmbedding" />
