Commit 2aa9204

update tutorials
1 parent c9cfa7c commit 2aa9204

4 files changed

Lines changed: 85 additions & 15 deletions

File renamed without changes.
File renamed without changes.

Tutorials/4_Evaluation/4.4.2_BEIR.ipynb

@@ -42,7 +42,40 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## 1. Use BEIR"
+"## 1. Evaluate using BEIR"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"BEIR contains 18 datasets which can be downloaded from the [link](https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/), while 4 of them are private datasets that need appropriate licences. If you want access to those 4 datasets, take a look at their [wiki](https://github.com/beir-cellar/beir/wiki/Datasets-available) for more information. "
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"| Dataset Name | Type | Queries | Documents | Avg. Docs/Q | Public | \n",
+"| ---------| :-----------: | ---------| --------- | ------| :------------:| \n",
+"| ``msmarco`` | `Train` `Dev` `Test` | 6,980 | 8.84M | 1.1 | Yes | \n",
+"| ``trec-covid``| `Test` | 50| 171K| 493.5 | Yes | \n",
+"| ``nfcorpus`` | `Train` `Dev` `Test` | 323 | 3.6K | 38.2 | Yes |\n",
+"| ``bioasq``| `Train` `Test` | 500 | 14.91M | 8.05 | No | \n",
+"| ``nq``| `Train` `Test` | 3,452 | 2.68M | 1.2 | Yes | \n",
+"| ``hotpotqa``| `Train` `Dev` `Test` | 7,405 | 5.23M | 2.0 | Yes |\n",
+"| ``fiqa`` | `Train` `Dev` `Test` | 648 | 57K | 2.6 | Yes | \n",
+"| ``signal1m`` | `Test` | 97 | 2.86M | 19.6 | No |\n",
+"| ``trec-news`` | `Test` | 57 | 595K | 19.6 | No |\n",
+"| ``arguana`` | `Test` | 1,406 | 8.67K | 1.0 | Yes |\n",
+"| ``webis-touche2020``| `Test` | 49 | 382K | 49.2 | Yes |\n",
+"| ``cqadupstack``| `Test` | 13,145 | 457K | 1.4 | Yes |\n",
+"| ``quora``| `Dev` `Test` | 10,000 | 523K | 1.6 | Yes | \n",
+"| ``dbpedia-entity``| `Dev` `Test` | 400 | 4.63M | 38.2 | Yes | \n",
+"| ``scidocs``| `Test` | 1,000 | 25K | 4.9 | Yes | \n",
+"| ``fever``| `Train` `Dev` `Test` | 6,666 | 5.42M | 1.2| Yes | \n",
+"| ``climate-fever``| `Test` | 1,535 | 5.42M | 3.0 | Yes |\n",
+"| ``scifact``| `Train` `Test` | 300 | 5K | 1.1 | Yes |"
 ]
 },
 {
@@ -52,6 +85,13 @@
 "### 1.1 Load Dataset"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"First prepare the logging setup."
+]
+},
 {
 "cell_type": "code",
 "execution_count": 12,
@@ -66,6 +106,13 @@
 " handlers=[LoggingHandler()])"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Here we use the `arguana` dataset for a quick demonstration."
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -140,6 +187,13 @@
 "### 1.2 Evaluation"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Then we load `bge-base-en-v1.5` from Hugging Face and evaluate its performance on `arguana`."
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -248,7 +302,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Evaluate using FlagEmbedding"
+"## 2. Evaluate using FlagEmbedding"
 ]
 },
 {
@@ -267,7 +321,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 1,
+"execution_count": 3,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -290,7 +344,8 @@
 " --eval_metrics ndcg_at_10 recall_at_100 \n",
 " --ignore_identical_ids True \n",
 " --embedder_name_or_path BAAI/bge-base-en-v1.5 \n",
-" --devices cuda:7\n",
+" --embedder_batch_size 1024\n",
+" --devices cuda:4\n",
 "\"\"\".replace('\\n','')\n",
 "\n",
 "sys.argv = arguments.split()"
@@ -305,9 +360,24 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 4,
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"name": "stderr",
+"output_type": "stream",
+"text": [
+"Split 'dev' not found in the dataset. Removing it from the list.\n",
+"ignore_identical_ids is set to True. This means that the search results will not contain identical ids. Note: Dataset such as MIRACL should NOT set this to True.\n",
+"pre tokenize: 100%|██████████| 9/9 [00:00<00:00, 16.19it/s]\n",
+"You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
+"Inference Embeddings: 100%|██████████| 9/9 [00:11<00:00, 1.27s/it]\n",
+"pre tokenize: 100%|██████████| 2/2 [00:00<00:00, 19.54it/s]\n",
+"Inference Embeddings: 100%|██████████| 2/2 [00:02<00:00, 1.29s/it]\n",
+"Searching: 100%|██████████| 44/44 [00:00<00:00, 208.73it/s]\n"
+]
+}
+],
 "source": [
 "from transformers import HfArgumentParser\n",
 "\n",
@@ -343,7 +413,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 5,
 "metadata": {},
 "outputs": [
 {
@@ -352,16 +422,16 @@
 "text": [
 "{\n",
 " \"arguana-test\": {\n",
-" \"ndcg_at_10\": 0.6361,\n",
-" \"ndcg_at_100\": 0.66057,\n",
-" \"map_at_10\": 0.55766,\n",
-" \"map_at_100\": 0.56337,\n",
-" \"recall_at_10\": 0.88407,\n",
+" \"ndcg_at_10\": 0.63668,\n",
+" \"ndcg_at_100\": 0.66075,\n",
+" \"map_at_10\": 0.55801,\n",
+" \"map_at_100\": 0.56358,\n",
+" \"recall_at_10\": 0.88549,\n",
 " \"recall_at_100\": 0.99147,\n",
-" \"precision_at_10\": 0.08841,\n",
+" \"precision_at_10\": 0.08855,\n",
 " \"precision_at_100\": 0.00991,\n",
-" \"mrr_at_10\": 0.55766,\n",
-" \"mrr_at_100\": 0.56337\n",
+" \"mrr_at_10\": 0.55809,\n",
+" \"mrr_at_100\": 0.56366\n",
 " }\n",
 "}\n"
 ]
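
Note on the argument cell changed in this commit: the new `--embedder_batch_size 1024` line has no trailing space before its newline, yet the cell still strips newlines with `.replace('\n','')` and splits the result into `sys.argv`. The sketch below illustrates why that is probably harmless: the indentation at the start of the next line keeps the tokens apart. It is a minimal illustration under stated assumptions, not the notebook's code. It uses the standard-library `argparse` as a stand-in for `HfArgumentParser`, the helper `str_to_bool` is hypothetical, and the flag types are guesses based on the values shown in the diff.

```python
import argparse

# Hypothetical stand-in for the notebook's argument string: one flag per line.
# The indentation at the start of each line keeps tokens separated once the
# newlines are removed, even when a line has no trailing space.
arguments = """
    --eval_metrics ndcg_at_10 recall_at_100
    --ignore_identical_ids True
    --embedder_name_or_path BAAI/bge-base-en-v1.5
    --embedder_batch_size 1024
    --devices cuda:4
""".replace("\n", "")

# The notebook assigns a list like this to sys.argv; here it is parsed directly.
argv = arguments.split()


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: interpret 'True'/'False' style strings."""
    return value.lower() in {"true", "1", "yes"}


# Assumption: plain argparse is used only to illustrate the parsing step.
# The flag types below are inferred from the values in the diff.
parser = argparse.ArgumentParser()
parser.add_argument("--eval_metrics", nargs="+")
parser.add_argument("--ignore_identical_ids", type=str_to_bool)
parser.add_argument("--embedder_name_or_path")
parser.add_argument("--embedder_batch_size", type=int)
parser.add_argument("--devices", nargs="+")

args = parser.parse_args(argv)
print(args)
```

If a flag line started at column zero and the previous line had no trailing space, the two tokens would fuse after the newline is removed, so keeping either the indentation or the trailing space matters.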
