Commit e45838b ("reform README", 1 parent 06a2113)

1 file changed: 46 additions & 6 deletions

README.md
@@ -17,6 +17,8 @@
 <h4 align="center">
 <p>
 <a href=#news>News</a> |
+<a href=#installation>Installation</a> |
+<a href=#quick-start>Quick Start</a> |
 <a href="#projects">Projects</a> |
 <a href=#model-list>Model List</a> |
 <a href="#contributor">Contributor</a> |
@@ -40,6 +42,13 @@ FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following p
 - 7/26/2024: Release a new embedding model [bge-en-icl](https://huggingface.co/BAAI/bge-en-icl), an embedding model that incorporates in-context learning capabilities. By providing task-relevant query-response examples, it can encode semantically richer queries, further enhancing the semantic representation ability of the embeddings. :fire:
 - 7/26/2024: Release a new embedding model [bge-multilingual-gemma2](https://huggingface.co/BAAI/bge-multilingual-gemma2), a multilingual embedding model based on gemma-2-9b, which supports multiple languages and diverse downstream tasks, achieving new SOTA on multilingual benchmarks (MIRACL, MTEB-fr, and MTEB-pl). :fire:
 - 7/26/2024: Release a new lightweight reranker [bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight), a lightweight reranker based on gemma-2-9b, which supports token compression and layerwise lightweight operations, and maintains good performance while saving a significant amount of resources. :fire:
+
+
+
+<details>
+<summary>More</summary>
+<!-- ### More -->
+
 - 6/7/2024: Release a new benchmark [MLVU](https://github.com/JUNJIE99/MLVU), the first comprehensive benchmark specifically designed for long video understanding. MLVU features an extensive range of video durations, a diverse collection of video sources, and a set of evaluation tasks uniquely tailored for long-form video understanding. :fire:
 - 5/21/2024: Release a new benchmark [AIR-Bench](https://github.com/AIR-Bench/AIR-Bench) together with Jina AI, Zilliz, HuggingFace, and other partners. AIR-Bench focuses on a fair out-of-distribution evaluation for Neural IR & RAG. It generates synthetic data for benchmarking across diverse domains and languages, and is dynamic and updated on a regular basis. [Leaderboard](https://huggingface.co/spaces/AIR-Bench/leaderboard) :fire:
 - 4/30/2024: Release [Llama-3-8B-Instruct-80K-QLoRA](https://huggingface.co/namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA), extending the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA training on a small amount of synthesized long-context data. The model achieves remarkable performance on various long-context benchmarks. [Code](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/longllm_qlora) :fire:
@@ -57,12 +66,6 @@ It is the first embedding model which supports all three retrieval methods, achi
5766
- 09/12/2023: New models:
5867
- **New reranker model**: release cross-encoder models `BAAI/bge-reranker-base` and `BAAI/bge-reranker-large`, which are more powerful than embedding model. We recommend to use/fine-tune them to re-rank top-k documents returned by embedding models.
5968
- **update embedding model**: release `bge-*-v1.5` embedding model to alleviate the issue of the similarity distribution, and enhance its retrieval ability without instruction.
60-
61-
62-
<details>
63-
<summary>More</summary>
64-
<!-- ### More -->
65-
6669
- 09/07/2023: Update [fine-tune code](https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/baai_general_embedding/README.md): Add script to mine hard negatives and support adding instruction during fine-tuning.
6770
- 08/09/2023: BGE Models are integrated into **Langchain**, you can use it like [this](#using-langchain); C-MTEB **leaderboard** is [available](https://huggingface.co/spaces/mteb/leaderboard).
6871
- 08/05/2023: Release base-scale and small-scale models, **best performance among the models of the same size 🤗**
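The re-ranking workflow recommended above (a cross-encoder re-scores the top-k candidates returned by an embedding retriever) can be sketched as follows. This is a minimal illustration, not FlagEmbedding's API: the passages and scores are hypothetical stand-ins for real `bge-reranker-*` outputs.

```python
# Sketch of re-ranking top-k candidates by cross-encoder score.
# The scores below are hypothetical; in practice they would come from a
# cross-encoder such as BAAI/bge-reranker-base scoring (query, passage) pairs.
def rerank(passages, scores, top_k=2):
    """Return the top_k passages ordered by descending reranker score."""
    ranked = sorted(zip(scores, passages), key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in ranked[:top_k]]

candidates = ["passage A", "passage B", "passage C"]
scores = [0.1, 2.3, -0.7]  # hypothetical cross-encoder relevance scores
print(rerank(candidates, scores))  # ['passage B', 'passage A']
```

Because the cross-encoder sees query and passage together, its scores are typically more accurate than embedding similarity, which is why it is applied only to the short candidate list rather than the whole corpus.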
@@ -72,7 +75,44 @@ It is the first embedding model which supports all three retrieval methods, achi
 
 </details>
 
+## Installation
+- Using pip:
+```shell
+pip install -U FlagEmbedding
+```
+- Install from source:
+Clone the repository:
+```shell
+git clone https://github.com/FlagOpen/FlagEmbedding.git
+cd FlagEmbedding
+pip install .
+```
+For development in editable mode:
+```shell
+pip install -e .
+```
 
+## Quick Start
+First, load one of the BGE embedding models:
+```python
+from FlagEmbedding import FlagModel
+
+model = FlagModel('BAAI/bge-base-en-v1.5',
+                  query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
+                  use_fp16=True)
+```
+Then, feed some sentences to the model to get their embeddings:
+```python
+sentences_1 = ["I love NLP", "I love machine learning"]
+sentences_2 = ["I love BGE", "I love text retrieval"]
+embeddings_1 = model.encode(sentences_1)
+embeddings_2 = model.encode(sentences_2)
+```
+Once we have the embeddings, we can compute their similarity by inner product:
+```python
+similarity = embeddings_1 @ embeddings_2.T
+print(similarity)
+```
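Since the encoder returns L2-normalized vectors, the matrix product in the Quick Start above yields a matrix of cosine similarities. A minimal numpy sketch of that step, using hypothetical 4-dimensional unit vectors in place of real BGE embeddings (which have hundreds of dimensions):

```python
import numpy as np

# Hypothetical stand-ins for model.encode() output: each row is a unit vector,
# so the matrix product below is a (2, 2) matrix of cosine similarities.
embeddings_1 = np.array([[0.6, 0.8, 0.0, 0.0],
                         [0.0, 0.6, 0.8, 0.0]])
embeddings_2 = np.array([[0.6, 0.8, 0.0, 0.0],
                         [0.0, 0.0, 0.0, 1.0]])

similarity = embeddings_1 @ embeddings_2.T  # entry [i, j] compares row i with row j
print(similarity)
```

Here `similarity[0, 0]` is 1.0 (identical vectors) while `similarity[0, 1]` is 0.0 (orthogonal vectors), illustrating how higher entries indicate semantically closer sentence pairs.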
 
 ## Projects
 