Commit 5546709

chore(doc): add extension huggingface document and update README
1 parent 9fe5396 commit 5546709

2 files changed

Lines changed: 95 additions & 4 deletions

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+# Hugging Face
+
+## Installation
+
+1. Install the package:
+
+**If you are developing with the binary, the package is already bundled in the binary. You can skip this step.**
+
+```bash
+npm i @vulcan-sql/extension-huggingface
+```
+
+2. Update your `vulcan.yaml` file to enable the extension:
+
+```yaml
+extensions:
+  ...
+  // highlight-next-line
+  hf: '@vulcan-sql/extension-huggingface'
+
+// highlight-next-line
+hf:
+  // highlight-next-line
+  # Required: Hugging Face access token, see: https://huggingface.co/docs/hub/security-tokens
+  // highlight-next-line
+  accessToken: 'your-huggingface-access-token'
+```
+
+## Using Hugging Face
+
+VulcanSQL supports Hugging Face tasks through [VulcanSQL Filters](https://vulcansql.com/docs/develop/advance#filters) statements.
+
+:::caution
+Hugging Face has a [rate limit](https://huggingface.co/docs/api-inference/faq#rate-limits), so it does not allow sending large datasets to the Hugging Face library for processing.
+
+Also, using a different Hugging Face model may yield different results or even result in failure.
+:::
+
+### Table Question Answering
+
+[Table Question Answering](https://huggingface.co/docs/api-inference/detailed_parameters#table-question-answering-task) is one of the Natural Language Processing tasks supported by Hugging Face.
+
+Use the `huggingface_table_question_answering` filter to run this task.
+
+Sample 1:
+
+```sql
+{% set data = [
+  {
+    "repository": "vulcan-sql",
+    "topic": ["analytics", "data-lake", "data-warehouse", "api-builder"],
+    "description": "Create and share Data APIs fast! Data API framework for DuckDB, ClickHouse, Snowflake, BigQuery, PostgreSQL"
+  },
+  {
+    "repository": "accio",
+    "topic": ["data-analytics", "data-lake", "data-warehouse", "business-intelligence"],
+    "description": "Query Your Data Warehouse Like Exploring One Big View."
+  },
+  {
+    "repository": "hell-word",
+    "topic": [],
+    "description": "Sample repository for testing"
+  }
+] %}
+
+-- The source data for "huggingface_table_question_answering" needs to be an array of objects.
+SELECT {{ data | huggingface_table_question_answering(query="How many repositories related to data-lake topic?") }}
+```
+
+Sample 2:
+
+```sql
+{% req products %}
+SELECT * FROM products
+{% endreq %}
+
+SELECT {{ products.value() | huggingface_table_question_answering(query="How many products related to 3C type?", model="microsoft/tapex-base-finetuned-wtq", wait_for_model=true, use_cache=true) }}
+```
+
+### Arguments
+
+See [Table Question Answering](https://huggingface.co/docs/api-inference/detailed_parameters#table-question-answering-task) for further information.
+
+| Name           | Required | Default                         | Description |
+| -------------- | -------- | ------------------------------- | ----------- |
+| query          | Y        |                                 | The plain-text question that you want to ask about the table. |
+| model          | N        | google/tapas-base-finetuned-wtq | The model id of a pretrained model hosted inside a model repo on huggingface.co. See: https://huggingface.co/models?pipeline_tag=table-question-answering |
+| use_cache      | N        | true                            | The Inference API has a cache layer that speeds up requests it has already seen. |
+| wait_for_model | N        | false                           | If the model is not ready, wait for it instead of receiving a 503 response. This limits the number of requests required to get your inference done. |
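
For readers who want to see how these arguments relate to an Inference API request, here is a rough Python sketch. It is an illustration under stated assumptions, not the extension's actual implementation. It reshapes an array of objects into the column-oriented `table` of strings that the Table Question Answering task documents, and passes `use_cache` and `wait_for_model` as request `options`. The helper names `rows_to_table` and `build_payload` are hypothetical, and JSON-encoding nested values such as the `topic` list is an assumption.

```python
from __future__ import annotations

import json
from typing import Any


def rows_to_table(rows: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Reshape an array of objects into a column-oriented table of strings."""
    # Collect column names in first-seen order so rows with missing keys still work.
    columns: list[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    def to_cell(value: Any) -> str:
        # Assumption: None becomes "", strings pass through,
        # and everything else (numbers, lists, dicts) is JSON-encoded.
        if value is None:
            return ""
        if isinstance(value, str):
            return value
        return json.dumps(value)

    return {col: [to_cell(row.get(col)) for row in rows] for col in columns}


def build_payload(
    rows: list[dict[str, Any]],
    query: str,
    use_cache: bool = True,
    wait_for_model: bool = False,
) -> dict[str, Any]:
    """Build a request body using the defaults listed in the arguments table."""
    return {
        "inputs": {"query": query, "table": rows_to_table(rows)},
        "options": {"use_cache": use_cache, "wait_for_model": wait_for_model},
    }


data = [
    {"repository": "vulcan-sql", "topic": ["analytics", "data-lake"]},
    {"repository": "accio", "topic": ["data-analytics", "data-lake"]},
    {"repository": "hell-word", "topic": []},
]
payload = build_payload(data, "How many repositories related to data-lake topic?")
print(json.dumps(payload, indent=2))
```

In this sketch the `model` argument does not appear in the body, on the assumption that the model id selects the endpoint the payload is posted to and that the access token from `vulcan.yaml` is sent as a bearer token.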

packages/extension-huggingface/README.md

Lines changed: 6 additions & 4 deletions
@@ -25,7 +25,7 @@ Supporting Hugging Face Inference API task for VulcanSQL, provided by [Canner](h
 
 VulcanSQL supports Hugging Face tasks through [VulcanSQL Filters](https://vulcansql.com/docs/develop/advance#filters) statements.
 
-**⚠️ Caution**: Hugging Face has a [rate limit](https://huggingface.co/docs/api-inference/faq#rate-limits), so it does not allow sending large datasets to the Hugging Face library for processing.
+**⚠️ Caution**: Hugging Face has a [rate limit](https://huggingface.co/docs/api-inference/faq#rate-limits), so it does not allow sending large datasets to the Hugging Face library for processing. Also, using a different Hugging Face model may yield different results or even result in failure.
 
 ### Table Question Answering
 
@@ -54,7 +54,7 @@ Sample 1:
   }
 ] %}
 
--- The source data from "huggingface_table_question_answering" need array of object type.
+-- The source data for "huggingface_table_question_answering" needs to be an array of objects.
 SELECT {{ data | huggingface_table_question_answering(query="How many repositories related to data-lake topic?") }}
 ```
 

@@ -65,6 +65,8 @@ Sample 2:
 SELECT * FROM products
 {% endreq %}
 
--- The "model" argument is optional, if not provide it, default is 'google/tapas-base-finetuned-wtq'
-SELECT {{ products.value() | huggingface_table_question_answering(query="How many products related to 3C type?", model="microsoft/tapex-base-finetuned-wtq") }}
+-- The "model" keyword argument is optional. If not provided, the default value is 'google/tapas-base-finetuned-wtq'.
+-- The "wait_for_model" keyword argument is optional. If not provided, the default value is false.
+-- The "use_cache" keyword argument is optional. If not provided, the default value is true.
+SELECT {{ products.value() | huggingface_table_question_answering(query="How many products related to 3C type?", model="microsoft/tapex-base-finetuned-wtq", wait_for_model=true, use_cache=true) }}
 ```
