This repository was archived by the owner on Mar 29, 2023. It is now read-only.

Commit 506aa99

Initial version (tests not running)
0 parents  commit 506aa99

27 files changed (+5516 -0 lines changed)

.github/workflows/main.yml

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
name: Main

on:
  push:
    branches: master
  pull_request:
    branches: master

jobs:
  lint:
    name: Lint
    runs-on: ubuntu-latest
    steps:

    - name: checkout
      uses: actions/checkout@v1

    - name: setup python
      uses: actions/setup-python@v2
      with:
        python-version: "3.7"

    - name: install dependencies
      run: |
        python -m pip install -r requirements.txt
        python -m pip install -e .
    - name: lint
      run: flake8 .

    - name: mypy
      run: mypy --ignore-missing-imports .
      if: always()

    - name: pydocstyle
      run: pydocstyle .
      if: always()

    - name: isort
      run: isort --check-only .
      if: always()

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
*.pyc
__pycache__

README.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
# Ibis BigQuery backend

ci/schema.sql

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
CREATE OR REPLACE TABLE `testing.functional_alltypes_parted`
(
    index INT64,
    Unnamed_0 INT64,
    id INT64,
    bool_col BOOL,
    tinyint_col INT64,
    smallint_col INT64,
    int_col INT64,
    bigint_col INT64,
    float_col FLOAT64,
    double_col FLOAT64,
    date_string_col STRING,
    string_col STRING,
    timestamp_col TIMESTAMP,
    year INT64,
    month INT64
)
PARTITION BY DATE(_PARTITIONTIME)
OPTIONS (
    require_partition_filter=false
);

CREATE OR REPLACE TABLE `testing.functional_alltypes`
(
    index INT64,
    Unnamed_0 INT64,
    id INT64,
    bool_col BOOL,
    tinyint_col INT64,
    smallint_col INT64,
    int_col INT64,
    bigint_col INT64,
    float_col FLOAT64,
    double_col FLOAT64,
    date_string_col STRING,
    string_col STRING,
    timestamp_col TIMESTAMP,
    year INT64,
    month INT64
);

docs/bigquery.rst

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
.. currentmodule:: ibis.bigquery.api

.. _backends.bigquery:

BigQuery
========

To use the BigQuery client, you will need a Google Cloud Platform account.
Use the `BigQuery sandbox <https://cloud.google.com/bigquery/docs/sandbox>`__
to try the service for free.

.. _install.bigquery:

`BigQuery <https://cloud.google.com/bigquery/>`_ Quickstart
-----------------------------------------------------------

Install dependencies for Ibis's BigQuery dialect:

::

   pip install ibis-framework[bigquery]

Create a client by passing in the project ID and dataset ID you want to work
with:


.. code-block:: python

   >>> con = ibis.bigquery.connect(project_id='ibis-gbq', dataset_id='testing')

By default, Ibis assumes that the BigQuery project that's billed for queries is
also the project where the data lives.

However, you can also query data that does **not** live in the billing
project.

.. note::

   When you run queries against data from other projects, **the billing project
   will still be billed for any and all queries**.

If you want to query data that lives in a different project than the billing
project, you can use the :meth:`ibis.bigquery.client.BigQueryClient.database`
method of :class:`ibis.bigquery.client.BigQueryClient` objects:

.. code-block:: python

   >>> db = con.database('other-data-project.other-dataset')
   >>> t = db.my_awesome_table
   >>> t.sweet_column.sum().execute()  # runs against the billing project
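
The dotted name passed to ``database`` above carries two pieces of information: the project that holds the data and the dataset inside it. The sketch below illustrates that naming convention only. ``split_dataset_reference`` is a hypothetical helper, not part of Ibis, and Ibis's own parsing may differ. An undotted name is assumed to live in the billing project.

```python
def split_dataset_reference(name, billing_project):
    """Split 'project.dataset' or 'dataset' into (data_project, dataset).

    Hypothetical helper for illustration; not an Ibis API.
    """
    # rpartition returns ('', '', name) when there is no dot in the name.
    project, separator, dataset = name.rpartition(".")
    if not separator:
        # No project prefix: the data is assumed to live in the billing project.
        return billing_project, name
    return project, dataset


data_project, dataset = split_dataset_reference(
    "other-data-project.other-dataset", billing_project="ibis-gbq"
)
print(data_project, dataset)
```

Whatever the data project turns out to be, the billing project stays the one given when the client was created.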

.. _api.bigquery:

API
---

.. currentmodule:: ibis.backends.bigquery

The BigQuery client is accessible through the ``ibis.bigquery`` namespace.
See :ref:`backends.bigquery` for a tutorial on using this backend.

Use the ``ibis.bigquery.connect`` function to create a BigQuery
client. If no ``credentials`` are provided, the
:func:`pydata_google_auth.default` function fetches default credentials.

.. autosummary::
   :toctree: ../generated/

   Backend.connect
   BigQueryClient.database
   BigQueryClient.list_databases
   BigQueryClient.list_tables
   BigQueryClient.table

The BigQuery client object
--------------------------

To use Ibis with BigQuery, you must first connect to BigQuery using the
:func:`ibis.bigquery.connect` function, optionally supplying Google API
credentials:

.. code-block:: python

   import ibis

   client = ibis.bigquery.connect(
       project_id=YOUR_PROJECT_ID,
       dataset_id='bigquery-public-data.stackoverflow'
   )

.. _udf.bigquery:

User-defined functions (UDFs)
-----------------------------

.. note::

   BigQuery only supports element-wise UDFs at this time.

BigQuery supports UDFs through JavaScript. Ibis provides support for this by
turning Python code into JavaScript.

The interface is very similar to the pandas UDF API:

.. code-block:: python

   import ibis.expr.datatypes as dt
   from ibis.bigquery import udf

   @udf([dt.double], dt.double)
   def my_bigquery_add_one(x):
       return x + 1.0

Ibis will parse the source of the function and turn the resulting Python AST
into JavaScript source code (technically, ECMAScript 2015). Most of the Python
language is supported, including classes, functions, and generators.
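
To make the source-to-AST-to-JavaScript idea concrete, here is a toy translator built on the standard library ``ast`` module. It is a minimal sketch and not Ibis's compiler: it takes the function source as a string, handles only a single ``return`` of names, numeric constants, and ``+ - * /``, and the names ``expression_to_js`` and ``python_source_to_js`` are made up for this example. It assumes Python 3.8 or later.

```python
import ast
import textwrap

# Toy subset only: the real Ibis compiler supports far more syntax.
_BINARY_OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def expression_to_js(node):
    """Translate a small subset of Python expression nodes to JavaScript."""
    if isinstance(node, ast.Name):
        return node.id
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return repr(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPERATORS:
        left = expression_to_js(node.left)
        right = expression_to_js(node.right)
        return f"({left} {_BINARY_OPERATORS[type(node.op)]} {right})"
    raise NotImplementedError(f"unsupported syntax: {type(node).__name__}")


def python_source_to_js(source):
    """Parse the source of one Python function and emit a JS function."""
    function_def = ast.parse(textwrap.dedent(source)).body[0]
    if not isinstance(function_def, ast.FunctionDef):
        raise NotImplementedError("expected a single function definition")
    body = function_def.body
    if len(body) != 1 or not isinstance(body[0], ast.Return):
        raise NotImplementedError("expected a single return statement")
    arguments = ", ".join(arg.arg for arg in function_def.args.args)
    expression = expression_to_js(body[0].value)
    return f"function {function_def.name}({arguments}) {{ return {expression}; }}"


source = """
def my_bigquery_add_one(x):
    return x + 1.0
"""
print(python_source_to_js(source))
# prints: function my_bigquery_add_one(x) { return (x + 1.0); }
```

Walking the AST rather than rewriting text keeps operator precedence explicit, which is why every binary operation is emitted inside parentheses.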

When you want to use this function, you call it like any other Python
function, except that it must be called on an Ibis expression:

.. code-block:: python

   t = ibis.table([('a', 'double')])
   expr = my_bigquery_add_one(t.a)
   print(ibis.bigquery.compile(expr))

.. _bigquery-privacy:

Privacy
-------

This package is subject to the `NumFOCUS privacy policy
<https://numfocus.org/privacy-policy>`_. Your use of Google APIs with this
module is subject to each API's respective `terms of service
<https://developers.google.com/terms/>`_.

Google account and user data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Accessing user data
~~~~~~~~~~~~~~~~~~~

The :func:`~ibis.bigquery.api.connect` function provides access to data
stored in Google BigQuery and other sources such as Google Sheets or Cloud
Storage, via the federated query feature. Your machine communicates directly
with the Google APIs.

Storing user data
~~~~~~~~~~~~~~~~~

By default, your credentials are stored in a local file, such as
``~/.config/pydata/ibis.json``. All user data is stored on
your local machine. **Use caution when using this library on a shared
machine**.

Sharing user data
~~~~~~~~~~~~~~~~~

The BigQuery client only communicates with Google APIs. No user data is
shared with PyData, NumFOCUS, or any other servers.

Policies for application authors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Do not use the default client ID when using Ibis from an application,
library, or tool. Per the `Google User Data Policy
<https://developers.google.com/terms/api-services-user-data-policy>`_, your
application must accurately represent itself when authenticating to Google
API services.

Extending the BigQuery backend
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Create a Google Cloud project.
* Set the ``GOOGLE_BIGQUERY_PROJECT_ID`` environment variable.
* Populate test data: ``python ci/datamgr.py bigquery``
* Run the test suite: ``pytest ibis/bigquery/tests``
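
Before populating data or running the suite, it can help to fail early when the environment variable from the steps above is missing. The following is a minimal sketch: ``bigquery_project_id`` is a hypothetical helper, not something shipped in this repository, and only the variable name ``GOOGLE_BIGQUERY_PROJECT_ID`` comes from the list above.

```python
import os


def bigquery_project_id(environ=None):
    """Return the configured project ID, or raise a clear error.

    Hypothetical helper for illustration; not part of the test suite.
    """
    if environ is None:
        environ = os.environ
    project_id = environ.get("GOOGLE_BIGQUERY_PROJECT_ID", "").strip()
    if not project_id:
        raise RuntimeError(
            "Set GOOGLE_BIGQUERY_PROJECT_ID before populating test data "
            "or running the tests."
        )
    return project_id
```

Passing a plain dictionary as ``environ`` makes the check easy to exercise without touching the real environment.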

environment.yml

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
name: ibis-bigquery-dev
channels:
- conda-forge
dependencies:

# core
- ibis-framework # TODO: require Ibis 2.0 when it's released
- google-cloud-bigquery-core >=1.12.0,<1.24.0dev
- pydata-google-auth

# dev
- pytest
- pytest-cov
- pytest-mock
- mock
- flake8
- flake8-comprehensions
- mypy
- isort
- pydocstyle
- setuptools-scm
