Commit ce38872

Merge pull request #831 from tswast/new-dataset-project
update dataset to new Google-hosted location
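The change is a mechanical rename of the table ID in every query and console link. Below is a minimal sketch of applying the same rename to your own saved queries. It assumes the queries are stored as plain strings that use the backtick-quoted form shown in the guide. `migrate_table_id` is a hypothetical helper, not part of any library.

```python
import re

# Table IDs taken from this commit's diff.
OLD_TABLE = "the-psf.pypi.file_downloads"
NEW_TABLE = "bigquery-public-data.pypi.file_downloads"

# Matches the fully backtick-quoted reference used in the guide's queries.
_OLD_REF = re.compile("`" + re.escape(OLD_TABLE) + "`")


def migrate_table_id(sql: str) -> str:
    """Hypothetical helper: point a saved query at the new table ID.

    Only the fully backtick-quoted reference is rewritten; other spellings
    (for example `the-psf`.pypi.file_downloads) are left unchanged.
    """
    return _OLD_REF.sub("`" + NEW_TABLE + "`", sql)


if __name__ == "__main__":
    saved_query = """
    SELECT COUNT(*) AS num_downloads
    FROM `the-psf.pypi.file_downloads`
    WHERE file.project = 'pytest'
    """
    print(migrate_table_id(saved_query))
```

The helper is a plain regex substitution. Running it twice yields the same result, because the new ID never contains the old one.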
2 parents: 1ae4b30 + 845456d

1 file changed

Lines changed: 16 additions & 16 deletions

source/guides/analyzing-pypi-package-downloads.rst

@@ -68,10 +68,10 @@ the `BigQuery quickstart guide
 Data schema
 -----------
 
-Linehaul writes an entry in a ``the-psf.pypi.file_downloads`` table for each
+Linehaul writes an entry in a ``bigquery-public-data.pypi.file_downloads`` table for each
 download. The table contains information about what file was downloaded and how
 it was downloaded. Some useful columns from the `table schema
-<https://console.cloud.google.com/bigquery?pli=1&p=the-psf&d=pypi&t=file_downloads&page=table>`__
+<https://console.cloud.google.com/bigquery?pli=1&p=bigquery-public-data&d=pypi&t=file_downloads&page=table>`__
 include:
 
 +------------------------+-----------------+-----------------------------+
@@ -108,7 +108,7 @@ The following query counts the total number of downloads for the project
 
 #standardSQL
 SELECT COUNT(*) AS num_downloads
-FROM `the-psf.pypi.file_downloads`
+FROM `bigquery-public-data.pypi.file_downloads`
 WHERE file.project = 'pytest'
 -- Only query the last 30 days of history
 AND DATE(timestamp)
@@ -118,7 +118,7 @@ The following query counts the total number of downloads for the project
 +---------------+
 | num_downloads |
 +===============+
-| 20531925 |
+| 26190085 |
 +---------------+
 
 To only count downloads from pip, filter on the ``details.installer.name``
@@ -128,7 +128,7 @@ column.
 
 #standardSQL
 SELECT COUNT(*) AS num_downloads
-FROM `the-psf.pypi.file_downloads`
+FROM `bigquery-public-data.pypi.file_downloads`
 WHERE file.project = 'pytest'
 AND details.installer.name = 'pip'
 -- Only query the last 30 days of history
@@ -139,7 +139,7 @@ column.
 +---------------+
 | num_downloads |
 +===============+
-| 19391645 |
+| 24334215 |
 +---------------+
 
 Package downloads over time
@@ -154,7 +154,7 @@ filtering by this column reduces corresponding costs.
 SELECT
 COUNT(*) AS num_downloads,
 DATE_TRUNC(DATE(timestamp), MONTH) AS `month`
-FROM `the-psf.pypi.file_downloads`
+FROM `bigquery-public-data.pypi.file_downloads`
 WHERE
 file.project = 'pytest'
 -- Only query the last 6 months of history
@@ -192,7 +192,7 @@ query processes over 500 GB of data.
 SELECT
 REGEXP_EXTRACT(details.python, r"[0-9]+\.[0-9]+") AS python_version,
 COUNT(*) AS num_downloads,
-FROM `the-psf.pypi.file_downloads`
+FROM `bigquery-public-data.pypi.file_downloads`
 WHERE
 -- Only query the last 6 months of history
 DATE(timestamp)
@@ -204,17 +204,17 @@ query processes over 500 GB of data.
 +--------+---------------+
 | python | num_downloads |
 +========+===============+
-| 3.7 | 12990683561 |
+| 3.7 | 18051328726 |
 +--------+---------------+
-| 3.6 | 9035598511 |
+| 3.6 | 9635067203 |
 +--------+---------------+
-| 2.7 | 8467785320 |
+| 3.8 | 7781904681 |
 +--------+---------------+
-| 3.8 | 4581627740 |
+| 2.7 | 6381252241 |
 +--------+---------------+
-| 3.5 | 2412533601 |
+| null | 2026630299 |
 +--------+---------------+
-| null | 1641456718 |
+| 3.5 | 1894153540 |
 +--------+---------------+
 
 Caveats
@@ -251,7 +251,7 @@ the official Python client library for BigQuery.
 
 query_job = client.query("""
 SELECT COUNT(*) AS num_downloads
-FROM `the-psf.pypi.file_downloads`
+FROM `bigquery-public-data.pypi.file_downloads`
 WHERE file.project = 'pytest'
 -- Only query the last 30 days of history
 AND DATE(timestamp)
@@ -303,7 +303,7 @@ References
 .. [#] `PyPI Download Counts deprecation email <https://mail.python.org/pipermail/distutils-sig/2013-May/020855.html>`__
 .. [#] `PyPI BigQuery dataset announcement email <https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html>`__
 
-.. _public PyPI download statistics dataset: https://console.cloud.google.com/bigquery?p=the-psf&d=pypi&page=dataset
+.. _public PyPI download statistics dataset: https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=pypi&page=dataset
 .. _bandersnatch: /key_projects/#bandersnatch
 .. _Google BigQuery: https://cloud.google.com/bigquery
 .. _BigQuery web UI: https://console.cloud.google.com/bigquery
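After this change, the guide's Python client example queries `bigquery-public-data.pypi.file_downloads`. Below is a minimal sketch of the same pytest download count. It assumes the `google-cloud-bigquery` package is installed and application default credentials are configured.

The function names `build_download_count_query` and `count_downloads` are hypothetical. The 30-day `DATE_SUB` filter is also an assumption: the guide's own date filter is cut off in this diff. The package name is passed as a query parameter rather than inlined into the SQL.

```python
TABLE_ID = "bigquery-public-data.pypi.file_downloads"


def build_download_count_query(days: int = 30) -> str:
    """Hypothetical helper: Standard SQL counting downloads of @project.

    The date filter below is an assumption; it restricts the scan to recent
    history to reduce the bytes processed.
    """
    # int() guards against injecting arbitrary text into the INTERVAL clause.
    return f"""
        SELECT COUNT(*) AS num_downloads
        FROM `{TABLE_ID}`
        WHERE file.project = @project
          AND DATE(timestamp)
              BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL {int(days)} DAY)
              AND CURRENT_DATE()
    """


def count_downloads(project: str, days: int = 30) -> int:
    """Hypothetical helper: run the query with the official client library."""
    # Imported lazily so the query builder works without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("project", "STRING", project),
        ]
    )
    query_job = client.query(build_download_count_query(days), job_config=job_config)
    rows = list(query_job.result())  # waits for the job to finish
    return rows[0].num_downloads
```

For example, `count_downloads("pytest")` returns the count as a single integer. Using a query parameter keeps the SQL text fixed, so the same query works for any package name.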
