
Commit 1374b98

update docs
1 parent 6d9fa4e commit 1374b98

20 files changed: 435 additions & 52 deletions


.github/workflows/documentation.yml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ jobs:
       - uses: actions/setup-python@v5
       - name: Install dependencies
         run: |
-          pip install . sphinx sphinx_rtd_theme myst_parser myst-nb furo
+          pip install . sphinx myst_parser myst-nb sphinx-design pydata-sphinx-theme
       - name: Sphinx build
         run: |
           sphinx-build docs/source docs/build
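
The CI change swaps the furo theme for pydata-sphinx-theme and adds sphinx-design. The matching conf.py edit is not shown in this excerpt; below is a minimal sketch of what a pydata-sphinx-theme setup typically looks like, where the extension list and file paths are assumptions rather than the actual file contents:

```python
# docs/source/conf.py -- hypothetical sketch, not the actual file from this commit.
extensions = [
    "myst_nb",         # notebook support, installed above as myst-nb
    "sphinx_design",   # grids/cards/tabs, installed above as sphinx-design
]

html_theme = "pydata_sphinx_theme"    # replaces the previous furo theme
html_static_path = ["_static"]
html_css_files = ["css/custom.css"]   # would load the sidebar tweaks added in this commit
```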

README.md

Lines changed: 1 addition & 1 deletion
@@ -159,7 +159,7 @@ Currently we are updating the [tutorials](./Tutorials/), we aim to create a comp
 The following contents are releasing in the upcoming weeks:
 
 - Evaluation
-- RAG
+- BGE-EN-ICL
 
 <details>
 <summary>The whole tutorial roadmap</summary>

docs/requirements.txt

Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,5 @@
 sphinx
 myst-nb
-furo
+sphinx-design
+pydata-sphinx-theme
+# furo

docs/source/API/abc.rst

Lines changed: 1 addition & 0 deletions
@@ -3,4 +3,5 @@ Abstract Class
 
 .. toctree::
    abc/inference
+   abc/evaluation
    abc/finetune

docs/source/API/index.rst

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+API
+===
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+
+   abc
+   inference
+   evaluation
+   finetune

docs/source/FAQ/index.rst

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+FAQ
+===
docs/source/Introduction/concept.rst

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+Concept
+=======
+
+Embedder
+--------
+
+An embedder, or embedding model, is a model designed to convert data, usually text, code, or images, into sparse or dense numerical vectors (embeddings) in a high-dimensional vector space.
+These embeddings capture the semantic meaning or key features of the input, enabling efficient comparison and analysis.
+
+A famous demonstration is the example from `word2vec <https://arxiv.org/abs/1301.3781>`_, which shows how word embeddings capture semantic relationships through vector arithmetic:
+
+.. image:: ../_static/img/word2vec.png
+   :width: 500
+   :align: center
+
+Nowadays, embedders are capable of mapping sentences and even passages into vector space.
+They are widely used in real-world tasks such as retrieval and clustering.
+In the era of LLMs, embedding models play a pivotal role in RAG, enabling LLMs to access and integrate relevant context from vast external datasets.
+
+Reranker
+--------
+
+A reranker, or Cross-Encoder, is a model that refines the ranking of candidate pairs (e.g., query-document pairs) by jointly encoding and scoring them.
+
+Typically, we use an embedder as a Bi-Encoder: it first computes the embeddings of two input sentences, then computes their similarity using a metric such as cosine similarity or Euclidean distance.
+A reranker, in contrast, takes the two sentences at the same time and directly computes a score representing their similarity.
+
+The following figure shows their difference:
+
+.. figure:: https://raw.githubusercontent.com/UKPLab/sentence-transformers/master/docs/img/Bi_vs_Cross-Encoder.png
+   :width: 500
+   :align: center
+
+   Bi-Encoder & Cross-Encoder (from Sentence Transformers)
+
+Although a Cross-Encoder usually performs better than a Bi-Encoder, it is extremely time-consuming to run a Cross-Encoder over a large amount of data.
+Thus, a widely accepted approach is to use a Bi-Encoder for initial retrieval (e.g., selecting the top 100 candidates from 100,000 sentences) and then refine the ranking of the selected candidates with a Cross-Encoder for more accurate results.
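
The retrieve-then-rerank pattern this page ends on is short enough to sketch in code. A minimal example assuming FlagEmbedding's FlagModel and FlagReranker interfaces with the BAAI/bge-base-en-v1.5 and BAAI/bge-reranker-base checkpoints; the corpus, query, and top-k value are illustrative:

```python
# Sketch of Bi-Encoder retrieval + Cross-Encoder reranking, assuming
# FlagEmbedding's FlagModel / FlagReranker interfaces; names are illustrative.
import numpy as np
from FlagEmbedding import FlagModel, FlagReranker

corpus = [
    "BGE stands for BAAI General Embedding.",
    "Paris is the capital of France.",
    "Rerankers jointly score query-document pairs.",
]
query = "What does BGE mean?"

# Stage 1: Bi-Encoder -- embed query and corpus separately, rank by similarity.
embedder = FlagModel("BAAI/bge-base-en-v1.5")
q_emb = embedder.encode([query])          # shape (1, dim)
c_emb = embedder.encode(corpus)           # shape (len(corpus), dim)
sims = (q_emb @ c_emb.T)[0]               # inner-product similarity
candidates = np.argsort(sims)[::-1][:2]   # cheap initial top-k selection

# Stage 2: Cross-Encoder -- jointly encode and score each (query, doc) pair.
reranker = FlagReranker("BAAI/bge-reranker-base")
pairs = [[query, corpus[i]] for i in candidates]
scores = reranker.compute_score(pairs)    # one joint relevance score per pair

for i, s in sorted(zip(candidates, scores), key=lambda t: -t[1]):
    print(f"{s:+.3f}  {corpus[i]}")
```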

docs/source/Introduction/index.rst

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+Introduction
+============
+
+BGE builds a one-stop retrieval toolkit for search and RAG. We provide inference, evaluation, and fine-tuning for embedding models and rerankers.
+
+.. figure:: ../_static/img/RAG_pipeline.png
+   :width: 700
+   :align: center
+
+   BGE embedder and reranker in a RAG pipeline.
+
+Quickly get started with:
+
+.. toctree::
+   :maxdepth: 1
+
+   installation
+   concept
+   quick_start

docs/source/Introduction/installation.rst

Lines changed: 6 additions & 1 deletion
@@ -40,4 +40,9 @@ For development in editable mode:
 # If you do not want to finetune the models, you can install the package without the finetune dependency:
 pip install -e .
 # If you want to finetune the models, you can install the package with the finetune dependency:
-pip install -e .[finetune]
+pip install -e .[finetune]
+
+PyTorch-CUDA
+------------
+
+If you want to use CUDA GPUs during inference and finetuning, please install the appropriate version of `PyTorch <https://pytorch.org/get-started/locally/>`_ with CUDA support.
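
A quick way to verify the CUDA install before running anything heavier, using only standard torch calls:

```python
# Sanity check that PyTorch was built with CUDA and can see a GPU.
import torch

print(torch.__version__)                  # CUDA builds typically carry a "+cu..." suffix
print(torch.cuda.is_available())          # True if a usable CUDA device is present
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first visible GPU
```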

docs/source/_static/css/custom.css

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+.bd-sidebar-primary {
+    width: 22%;
+    line-height: 1.4;
+}
+
+.col-lg-3 {
+    flex: 0 0 auto;
+    width: 22%;
+}
