|
42 | 42 | "cell_type": "markdown", |
43 | 43 | "metadata": {}, |
44 | 44 | "source": [ |
45 | | - "## 1. Use BEIR" |
| 45 | + "## 1. Evaluate using BEIR" |
| 46 | + ] |
| 47 | + }, |
| 48 | + { |
| 49 | + "cell_type": "markdown", |
| 50 | + "metadata": {}, |
| 51 | + "source": [ |
| 52 | + "BEIR contains 18 datasets, which can be downloaded from this [link](https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/); 4 of them are private datasets that require appropriate licenses. To access those 4 datasets, take a look at their [wiki](https://github.com/beir-cellar/beir/wiki/Datasets-available) for more information."
| 53 | + ] |
| 54 | + }, |
| 55 | + { |
| 56 | + "cell_type": "markdown", |
| 57 | + "metadata": {}, |
| 58 | + "source": [ |
| 59 | + "| Dataset Name | Type | Queries | Documents | Avg. Docs/Q | Public | \n", |
| 60 | + "| ---------| :-----------: | ---------| --------- | ------| :------------:| \n", |
| 61 | + "| ``msmarco`` | `Train` `Dev` `Test` | 6,980 | 8.84M | 1.1 | Yes | \n", |
| 62 | + "| ``trec-covid``| `Test` | 50| 171K| 493.5 | Yes | \n", |
| 63 | + "| ``nfcorpus`` | `Train` `Dev` `Test` | 323 | 3.6K | 38.2 | Yes |\n", |
| 64 | + "| ``bioasq``| `Train` `Test` | 500 | 14.91M | 8.05 | No | \n", |
| 65 | + "| ``nq``| `Train` `Test` | 3,452 | 2.68M | 1.2 | Yes | \n", |
| 66 | + "| ``hotpotqa``| `Train` `Dev` `Test` | 7,405 | 5.23M | 2.0 | Yes |\n", |
| 67 | + "| ``fiqa`` | `Train` `Dev` `Test` | 648 | 57K | 2.6 | Yes | \n", |
| 68 | + "| ``signal1m`` | `Test` | 97 | 2.86M | 19.6 | No |\n", |
| 69 | + "| ``trec-news`` | `Test` | 57 | 595K | 19.6 | No |\n", |
| 70 | + "| ``arguana`` | `Test` | 1,406 | 8.67K | 1.0 | Yes |\n", |
| 71 | + "| ``webis-touche2020``| `Test` | 49 | 382K | 49.2 | Yes |\n", |
| 72 | + "| ``cqadupstack``| `Test` | 13,145 | 457K | 1.4 | Yes |\n", |
| 73 | + "| ``quora``| `Dev` `Test` | 10,000 | 523K | 1.6 | Yes | \n", |
| 74 | + "| ``dbpedia-entity``| `Dev` `Test` | 400 | 4.63M | 38.2 | Yes | \n", |
| 75 | + "| ``scidocs``| `Test` | 1,000 | 25K | 4.9 | Yes | \n", |
| 76 | + "| ``fever``| `Train` `Dev` `Test` | 6,666 | 5.42M | 1.2| Yes | \n", |
| 77 | + "| ``climate-fever``| `Test` | 1,535 | 5.42M | 3.0 | Yes |\n", |
| 78 | + "| ``scifact``| `Train` `Test` | 300 | 5K | 1.1 | Yes |" |
46 | 79 | ] |
47 | 80 | }, |
48 | 81 | { |
|
52 | 85 | "### 1.1 Load Dataset" |
53 | 86 | ] |
54 | 87 | }, |
| 88 | + { |
| 89 | + "cell_type": "markdown", |
| 90 | + "metadata": {}, |
| 91 | + "source": [ |
| 92 | + "First, set up logging."
| 93 | + ] |
| 94 | + }, |
55 | 95 | { |
56 | 96 | "cell_type": "code", |
57 | 97 | "execution_count": 12, |
|
66 | 106 | " handlers=[LoggingHandler()])" |
67 | 107 | ] |
68 | 108 | }, |
| 109 | + { |
| 110 | + "cell_type": "markdown", |
| 111 | + "metadata": {}, |
| 112 | + "source": [ |
| 113 | + "Here we choose the relatively small `arguana` dataset for a quick demonstration."
| 114 | + ] |
| 115 | + }, |
69 | 116 | { |
70 | 117 | "cell_type": "code", |
71 | 118 | "execution_count": null, |
|
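The elided cell above downloads and loads the dataset. As a minimal sketch of that step using the official `beir` package (assumed installed via `pip install beir`; the helper names here are illustrative, not from the notebook):

```python
# Sketch: fetch a BEIR dataset from the public mirror and load it.
# `beir_dataset_url` and `load_beir_dataset` are hypothetical helpers.

def beir_dataset_url(dataset: str) -> str:
    """Build the public download URL for a BEIR dataset zip."""
    base = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets"
    return f"{base}/{dataset}.zip"

def load_beir_dataset(dataset: str, out_dir: str = "datasets", split: str = "test"):
    """Download (if not cached) and load corpus, queries, and qrels."""
    # Imports kept local so the URL helper works without `beir` installed.
    from beir import util
    from beir.datasets.data_loader import GenericDataLoader

    data_path = util.download_and_unzip(beir_dataset_url(dataset), out_dir)
    return GenericDataLoader(data_folder=data_path).load(split=split)
```

Calling `load_beir_dataset("arguana")` would return the `(corpus, queries, qrels)` triple that BEIR's retrieval evaluators consume.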
140 | 187 | "### 1.2 Evaluation" |
141 | 188 | ] |
142 | 189 | }, |
| 190 | + { |
| 191 | + "cell_type": "markdown", |
| 192 | + "metadata": {}, |
| 193 | + "source": [ |
| 194 | + "Then we load `bge-base-en-v1.5` from Hugging Face and evaluate its performance on `arguana`."
| 195 | + ] |
| 196 | + }, |
143 | 197 | { |
144 | 198 | "cell_type": "code", |
145 | 199 | "execution_count": null, |
|
248 | 302 | "cell_type": "markdown", |
249 | 303 | "metadata": {}, |
250 | 304 | "source": [ |
251 | | - "## Evaluate using FlagEmbedding" |
| 305 | + "## 2. Evaluate using FlagEmbedding" |
252 | 306 | ] |
253 | 307 | }, |
254 | 308 | { |
|
267 | 321 | }, |
268 | 322 | { |
269 | 323 | "cell_type": "code", |
270 | | - "execution_count": 1, |
| 324 | + "execution_count": 3, |
271 | 325 | "metadata": {}, |
272 | 326 | "outputs": [], |
273 | 327 | "source": [ |
|
290 | 344 | " --eval_metrics ndcg_at_10 recall_at_100 \n", |
291 | 345 | " --ignore_identical_ids True \n", |
292 | 346 | " --embedder_name_or_path BAAI/bge-base-en-v1.5 \n", |
293 | | - " --devices cuda:7\n", |
| 347 | + " --embedder_batch_size 1024\n", |
| 348 | + " --devices cuda:4\n", |
294 | 349 | "\"\"\".replace('\\n','')\n", |
295 | 350 | "\n", |
296 | 351 | "sys.argv = arguments.split()" |
|
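The cell above simulates a command line inside the notebook by building an argument string and assigning its tokens to `sys.argv`. A minimal sketch of the same trick using the standard library's `argparse` (illustrative flags only, not the full FlagEmbedding argument set):

```python
import argparse
import sys

# Build a multi-line "command line", strip the newlines, and split on
# whitespace -- exactly the pattern used in the notebook cell above.
arguments = """prog
    --eval_name beir
    --eval_metrics ndcg_at_10 recall_at_100
    --embedder_batch_size 1024
""".replace('\n', '')

sys.argv = arguments.split()

parser = argparse.ArgumentParser()
parser.add_argument("--eval_name")
parser.add_argument("--eval_metrics", nargs="+")   # accepts multiple metrics
parser.add_argument("--embedder_batch_size", type=int)

# parse_args() reads sys.argv[1:] by default, so the parser sees the
# simulated command line as if the script had been invoked from a shell.
args = parser.parse_args()
```

After this, `args.eval_metrics` is `["ndcg_at_10", "recall_at_100"]` and `args.embedder_batch_size` is `1024`, which is why `HfArgumentParser` in the next cell can pick the arguments up unchanged.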
305 | 360 | }, |
306 | 361 | { |
307 | 362 | "cell_type": "code", |
308 | | - "execution_count": null, |
| 363 | + "execution_count": 4, |
309 | 364 | "metadata": {}, |
310 | | - "outputs": [], |
| 365 | + "outputs": [ |
| 366 | + { |
| 367 | + "name": "stderr", |
| 368 | + "output_type": "stream", |
| 369 | + "text": [ |
| 370 | + "Split 'dev' not found in the dataset. Removing it from the list.\n", |
| 371 | + "ignore_identical_ids is set to True. This means that the search results will not contain identical ids. Note: Dataset such as MIRACL should NOT set this to True.\n", |
| 372 | + "pre tokenize: 100%|██████████| 9/9 [00:00<00:00, 16.19it/s]\n", |
| 373 | + "You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n", |
| 374 | + "Inference Embeddings: 100%|██████████| 9/9 [00:11<00:00, 1.27s/it]\n", |
| 375 | + "pre tokenize: 100%|██████████| 2/2 [00:00<00:00, 19.54it/s]\n", |
| 376 | + "Inference Embeddings: 100%|██████████| 2/2 [00:02<00:00, 1.29s/it]\n", |
| 377 | + "Searching: 100%|██████████| 44/44 [00:00<00:00, 208.73it/s]\n" |
| 378 | + ] |
| 379 | + } |
| 380 | + ], |
311 | 381 | "source": [ |
312 | 382 | "from transformers import HfArgumentParser\n", |
313 | 383 | "\n", |
|
343 | 413 | }, |
344 | 414 | { |
345 | 415 | "cell_type": "code", |
346 | | - "execution_count": null, |
| 416 | + "execution_count": 5, |
347 | 417 | "metadata": {}, |
348 | 418 | "outputs": [ |
349 | 419 | { |
|
352 | 422 | "text": [ |
353 | 423 | "{\n", |
354 | 424 | " \"arguana-test\": {\n", |
355 | | - " \"ndcg_at_10\": 0.6361,\n", |
356 | | - " \"ndcg_at_100\": 0.66057,\n", |
357 | | - " \"map_at_10\": 0.55766,\n", |
358 | | - " \"map_at_100\": 0.56337,\n", |
359 | | - " \"recall_at_10\": 0.88407,\n", |
| 425 | + " \"ndcg_at_10\": 0.63668,\n", |
| 426 | + " \"ndcg_at_100\": 0.66075,\n", |
| 427 | + " \"map_at_10\": 0.55801,\n", |
| 428 | + " \"map_at_100\": 0.56358,\n", |
| 429 | + " \"recall_at_10\": 0.88549,\n", |
360 | 430 | " \"recall_at_100\": 0.99147,\n", |
361 | | - " \"precision_at_10\": 0.08841,\n", |
| 431 | + " \"precision_at_10\": 0.08855,\n", |
362 | 432 | " \"precision_at_100\": 0.00991,\n", |
363 | | - " \"mrr_at_10\": 0.55766,\n", |
364 | | - " \"mrr_at_100\": 0.56337\n", |
| 433 | + " \"mrr_at_10\": 0.55809,\n", |
| 434 | + " \"mrr_at_100\": 0.56366\n", |
365 | 435 | " }\n", |
366 | 436 | "}\n" |
367 | 437 | ] |
|
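The `ndcg_at_10` and related figures above follow the standard TREC-style definitions. As a rough pure-Python sketch of nDCG@k on toy binary relevance labels (illustrative only, not the evaluator the toolkit actually uses):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain over a ranked list of relevance gains."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_rels, all_rels, k=10):
    """nDCG@k: DCG of the top-k ranking divided by DCG of the ideal ranking."""
    ideal = sorted(all_rels, reverse=True)[:k]
    denom = dcg(ideal)
    return dcg(ranked_rels[:k]) / denom if denom > 0 else 0.0

# Toy query: two relevant documents, retrieved at ranks 1 and 3.
ranked = [1, 0, 1, 0]   # relevance of retrieved docs, in rank order
judged = [1, 1]         # all relevant docs for the query

print(round(ndcg_at_k(ranked, judged, k=10), 4))  # → 0.9197
```

Placing the second relevant document at rank 2 instead of rank 3 would give a perfect score of 1.0, which is the sense in which nDCG rewards ranking relevant documents early.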