# Hugging Face

## Installation

1. Install the package:

   **If you are developing with binary, the package is already bundled in the binary. You can skip this step.**

   ```bash
   npm i @vulcan-sql/extension-huggingface
   ```

2. Update your `vulcan.yaml` file to enable the extension:

   ```yaml
   extensions:
     ...
     // highlight-next-line
     hf: '@vulcan-sql/extension-huggingface'

   // highlight-next-line
   hf:
     // highlight-next-line
     # Required: Hugging Face access token, see: https://huggingface.co/docs/hub/security-tokens
     // highlight-next-line
     accessToken: 'your-huggingface-access-token'
   ```
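
If you installed the package via npm, you can optionally confirm it resolves in your project before moving on. This is a standard npm check, nothing VulcanSQL-specific:

```bash
# Lists the installed version of the extension in the current project
npm ls @vulcan-sql/extension-huggingface
```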

## Using Hugging Face

VulcanSQL supports Hugging Face tasks through [VulcanSQL Filters](https://vulcansql.com/docs/develop/advance#filters).

:::caution
Hugging Face has a [rate limit](https://huggingface.co/docs/api-inference/faq#rate-limits), so you cannot send large datasets to the Hugging Face Inference API for processing (one way to keep the payload small is sketched after Sample 2 below).

Also, note that using a different Hugging Face model may yield different results or even fail.
:::

### Table Question Answering

[Table Question Answering](https://huggingface.co/docs/api-inference/detailed_parameters#table-question-answering-task) is one of the Natural Language Processing tasks supported by Hugging Face.

Use the `huggingface_table_question_answering` filter.

Sample 1:

```sql
{% set data = [
  {
    "repository": "vulcan-sql",
    "topic": ["analytics", "data-lake", "data-warehouse", "api-builder"],
    "description": "Create and share Data APIs fast! Data API framework for DuckDB, ClickHouse, Snowflake, BigQuery, PostgreSQL"
  },
  {
    "repository": "accio",
    "topic": ["data-analytics", "data-lake", "data-warehouse", "business-intelligence"],
    "description": "Query Your Data Warehouse Like Exploring One Big View."
  },
  {
    "repository": "hello-world",
    "topic": [],
    "description": "Sample repository for testing"
  }
] %}

-- The source data for "huggingface_table_question_answering" needs to be an array of objects.
SELECT {{ data | huggingface_table_question_answering(query="How many repositories related to data-lake topic?") }}
```

Sample 2:

```sql
{% req products %}
  SELECT * FROM products
{% endreq %}

SELECT {{ products.value() | huggingface_table_question_answering(query="How many products related to 3C type?", model="microsoft/tapex-base-finetuned-wtq", wait_for_model=true, use_cache=true) }}
```
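
As noted in the caution above, the Inference API rate limit means you should keep the table you pass to the filter small. A minimal sketch, assuming a `products` table as in Sample 2; the column names and the `LIMIT 100` threshold are illustrative, not documented limits:

```sql
{% req products %}
  -- Select only the columns the question needs and cap the row count
  -- before sending the result to the Hugging Face Inference API.
  SELECT product_name, product_type FROM products LIMIT 100
{% endreq %}

SELECT {{ products.value() | huggingface_table_question_answering(query="How many products related to 3C type?") }}
```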

### Arguments

Please check [Table Question Answering](https://huggingface.co/docs/api-inference/detailed_parameters#table-question-answering-task) for further information.

| Name           | Required | Default                         | Description                                                                                                                                       |
| -------------- | -------- | ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| query          | Y        |                                 | The query, in plain text, that you want to ask the table.                                                                                          |
| model          | N        | google/tapas-base-finetuned-wtq | The model ID of a pretrained model hosted in a model repo on huggingface.co. See: https://huggingface.co/models?pipeline_tag=table-question-answering |
| use_cache      | N        | true                            | There is a cache layer on the Inference API to speed up requests that have already been seen.                                                      |
| wait_for_model | N        | false                           | If the model is not ready, wait for it instead of receiving a 503 error. This limits the number of requests required to get your inference done.   |
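
For reference, a call that spells out every argument with its default value would look like the sketch below (it reuses the `products` result set from Sample 2; the query text is only an example):

```sql
SELECT {{ products.value() | huggingface_table_question_answering(query="How many products related to 3C type?", model="google/tapas-base-finetuned-wtq", use_cache=true, wait_for_model=false) }}
```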