Commit 707c108

sync fork
2 parents a69329a + 8291c93 commit 707c108

27 files changed

Lines changed: 970 additions & 374 deletions

.github/CODEOWNERS

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+source/guides/github-actions-ci-cd-sample/* @webknjaz
+source/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows.rst @webknjaz

noxfile.py

Lines changed: 3 additions & 1 deletion
@@ -17,10 +17,12 @@ def build(session, autobuild=False):
 
     if autobuild:
         command = "sphinx-autobuild"
+        extra_args = "-H", "0.0.0.0"
     else:
         command = "sphinx-build"
+        extra_args = ()
 
-    session.run(command, "-W", "-b", "html", "source", "build")
+    session.run(command, *extra_args, "-W", "-b", "html", "source", "build")
 
 
 @nox.session(py="3")
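
Reader's note (not part of the commit): the noxfile change amounts to picking extra command-line arguments per command and splicing them in before the shared Sphinx flags. A minimal stand-alone sketch, assuming a hypothetical helper named ``build_args`` that returns the argument list ``session.run`` would receive; the real noxfile calls ``session.run`` directly.

```python
def build_args(autobuild=False):
    """Return the arguments the patched build session passes to session.run.

    Hypothetical helper for illustration only; it does not exist in noxfile.py.
    """
    if autobuild:
        command = "sphinx-autobuild"
        # -H 0.0.0.0 asks sphinx-autobuild to listen on all interfaces.
        extra_args = "-H", "0.0.0.0"
    else:
        command = "sphinx-build"
        # Plain builds need no extra arguments.
        extra_args = ()
    # The shared flags are unchanged by the commit: warnings as errors, HTML builder.
    return [command, *extra_args, "-W", "-b", "html", "source", "build"]
```

Using an empty tuple for the plain build keeps a single ``session.run`` call site, which appears to be the point of the change.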

source/contribute.rst

Lines changed: 2 additions & 4 deletions
@@ -126,10 +126,8 @@ contributions to be accepted into the project.
 Purpose
 -------
 
-The purpose of the |PyPUG| is
-
-to be the authoritative resource on how to package, publish, and install
-Python projects using current tools.
+The purpose of the |PyPUG| is to be the authoritative resource on how to
+package, publish, and install Python projects using current tools.
 
 
 Scope

source/discussions/deploying-python-applications.rst

Lines changed: 4 additions & 5 deletions
@@ -67,12 +67,11 @@ the Python-interpreter and declare the dependencies of the program. The tool
 downloads the specified Python-interpreter for Windows and packages it with all
 the dependencies in a single Windows-executable installer.
 
-The installer installs or updates the Python-interpreter on the users system,
-which can be used independently of the packaged program. The program itself,
-can be started from a shortcut, that the installer places in the start-menu.
-Uninstalling the program leaves the Python installation of the user intact.
+The installed program can be started from a shortcut that the installer adds to
+the start-menu. It uses a Python interpreter installed within its application
+directory, independent of any other Python installation on the computer.
 
-A big advantage of pynsist is that the Windows packages can be built on Linux.
+A big advantage of Pynsist is that the Windows packages can be built on Linux.
 There are several examples for different kinds of programs (console, GUI) in
 the `documentation <https://pynsist.readthedocs.io>`__. The tool is released
 under the MIT-licence.

source/glossary.rst

Lines changed: 11 additions & 8 deletions
@@ -122,17 +122,20 @@ Glossary
 
 Pure Module
 
-A :term:`module` written in Python and contained in a single .py file (and
-possibly associated .pyc and/or .pyo files).
+A :term:`module` written in Python and contained in a single `.py` file (and
+possibly associated `.pyc` and/or `.pyo` files).
 
 
 Python Packaging Authority (PyPA)
 
-PyPA is a working group that maintains many of the relevant projects in
-Python packaging. They maintain a site at https://www.pypa.io, host projects
-on `github <https://github.com/pypa>`_ and `bitbucket
-<https://bitbucket.org/pypa>`_, and discuss issues on the `pypa-dev
-mailing list <https://groups.google.com/forum/#!forum/pypa-dev>`_.
+PyPA is a working group that maintains many of the relevant
+projects in Python packaging. They maintain a site at
+https://www.pypa.io, host projects on `GitHub
+<https://github.com/pypa>`_ and `Bitbucket
+<https://bitbucket.org/pypa>`_, and discuss issues on the
+`distutils-sig mailing list
+<https://mail.python.org/mailman3/lists/distutils-sig.python.org/>`_
+and `the Python Discourse forum <https://discuss.python.org/c/packaging>`__.
 
 
 Python Package Index (PyPI)
@@ -193,7 +196,7 @@ Glossary
 Source Archive
 
 An archive containing the raw source code for a :term:`Release`, prior
-to creation of an :term:`Source Distribution <Source Distribution (or
+to creation of a :term:`Source Distribution <Source Distribution (or
 "sdist")>` or :term:`Built Distribution`.
 
 

source/guides/analyzing-pypi-package-downloads.rst

Lines changed: 128 additions & 82 deletions
@@ -2,10 +2,10 @@
 Analyzing PyPI package downloads
 ================================
 
-This section covers how to use the `PyPI package dataset`_ to learn more
-about downloads of a package (or packages) hosted on PyPI. For example, you can
-use it to discover the distribution of Python versions used to download a
-package.
+This section covers how to use the public PyPI download statistics dataset
+to learn more about downloads of a package (or packages) hosted on PyPI. For
+example, you can use it to discover the distribution of Python versions used to
+download a package.
 
 .. contents:: Contents
    :local:
@@ -14,71 +14,45 @@ package.
 Background
 ==========
 
-PyPI does not display download statistics because they are difficult to
-collect and display accurately. Reasons for this are included in the
-`announcement email
-<https://mail.python.org/pipermail/distutils-sig/2013-May/020855.html>`__:
-
-There are numerous reasons for [download counts] removal/deprecation some
-of which are:
-
-- Technically hard to make work with the new CDN
-
-- The CDN is being donated to the PSF, and the donated tier does
-not offer any form of log access
-- The work around for not having log access would greatly reduce
-the utility of the CDN
-- Highly inaccurate
-- A number of things prevent the download counts from being
-accurate, some of which include:
-
-- pip download cache
-- Internal or unofficial mirrors
-- Packages not hosted on PyPI (for comparisons sake)
-- Mirrors or unofficial grab scripts causing inflated counts
-(Last I looked 25% of the downloads were from a known
-mirroring script).
-- Not particularly useful
-
-- Just because a project has been downloaded a lot doesn't mean
-it's good
-- Similarly just because a project hasn't been downloaded a lot
-doesn't mean it's bad
-
-In short because it's value is low for various reasons, and the tradeoffs
-required to make it work are high It has been not an effective use of
-resources.
-
-As an alternative, the `Linehaul project
-<https://github.com/pypa/linehaul>`__ streams download logs to `Google
-BigQuery`_ [#]_. Linehaul writes an entry in a
-``the-psf.pypi.downloadsYYYYMMDD`` table for each download. The table
-contains information about what file was downloaded and how it was
-downloaded. Some useful columns from the `table schema
-<https://bigquery.cloud.google.com/table/the-psf:pypi.downloads20161022?tab=schema>`__
-include:
+PyPI does not display download statistics for a number of reasons: [#]_
 
-+------------------------+-----------------+-----------------------+
-| Column                 | Description     | Examples              |
-+========================+=================+=======================+
-| file.project           | Project name    | ``pipenv``, ``nose``  |
-+------------------------+-----------------+-----------------------+
-| file.version           | Package version | ``0.1.6``, ``1.4.2``  |
-+------------------------+-----------------+-----------------------+
-| details.installer.name | Installer       | pip, `bandersnatch`_  |
-+------------------------+-----------------+-----------------------+
-| details.python         | Python version  | ``2.7.12``, ``3.6.4`` |
-+------------------------+-----------------+-----------------------+
+- **Inefficient to make work with a Content Distribution Network (CDN):**
+Download statistics change constantly. Including them in project pages, which
+are heavily cached, would require invalidating the cache more often, and
+reduce the overall effectiveness of the cache.
 
-.. [#] `PyPI BigQuery dataset announcement email <https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html>`__
+- **Highly inaccurate:** A number of things prevent the download counts from
+being accurate, some of which include:
 
-Setting up
-==========
+- ``pip``'s download cache (lowers download counts)
+- Internal or unofficial mirrors (can both raise or lower download counts)
+- Packages not hosted on PyPI (for comparisons sake)
+- Unofficial scripts or attempts at download count inflation (raises download
+counts)
+- Known historical data quality issues (lowers download counts)
+
+- **Not particularly useful:** Just because a project has been downloaded a lot
+doesn't mean it's good; Similarly just because a project hasn't been
+downloaded a lot doesn't mean it's bad!
 
-In order to use `Google BigQuery`_ to query the `PyPI package dataset`_,
-you'll need a Google account and to enable the BigQuery API on a Google
-Cloud Platform project. You can run the up to 1TB of queries per month `using
-the BigQuery free tier without a credit card
+In short, because it's value is low for various reasons, and the tradeoffs
+required to make it work are high, it has been not an effective use of
+limited resources.
+
+Public dataset
+==============
+
+As an alternative, the `Linehaul project <https://github.com/pypa/linehaul>`__
+streams download logs from PyPI to `Google BigQuery`_ [#]_, where they are
+stored as a public dataset.
+
+Getting set up
+--------------
+
+In order to use `Google BigQuery`_ to query the `public PyPI download
+statistics dataset`_, you'll need a Google account and to enable the BigQuery
+API on a Google Cloud Platform project. You can run the up to 1TB of queries
+per month `using the BigQuery free tier without a credit card
 <https://cloud.google.com/blog/big-data/2017/01/how-to-run-a-terabyte-of-google-bigquery-queries-each-month-without-a-credit-card>`__
 
 - Navigate to the `BigQuery web UI`_.
@@ -90,8 +64,31 @@ For more detailed instructions on how to get started with BigQuery, check out
 the `BigQuery quickstart guide
 <https://cloud.google.com/bigquery/docs/quickstarts/quickstart-web-ui>`__.
 
+
+Data schema
+-----------
+
+Linehaul writes an entry in a ``the-psf.pypi.downloadsYYYYMMDD`` table for each
+download. The table contains information about what file was downloaded and how
+it was downloaded. Some useful columns from the `table schema
+<https://console.cloud.google.com/bigquery?pli=1&p=the-psf&d=pypi&t=downloads&page=table>`__
+include:
+
++------------------------+-----------------+-----------------------+
+| Column                 | Description     | Examples              |
++========================+=================+=======================+
+| file.project           | Project name    | ``pipenv``, ``nose``  |
++------------------------+-----------------+-----------------------+
+| file.version           | Package version | ``0.1.6``, ``1.4.2``  |
++------------------------+-----------------+-----------------------+
+| details.installer.name | Installer       | pip, `bandersnatch`_  |
++------------------------+-----------------+-----------------------+
+| details.python         | Python version  | ``2.7.12``, ``3.6.4`` |
++------------------------+-----------------+-----------------------+
+
+
 Useful queries
-==============
+--------------
 
 Run queries in the `BigQuery web UI`_ by clicking the "Compose query" button.
 
@@ -102,7 +99,7 @@ recent history by using `wildcard tables
 select all tables and then filter by ``_TABLE_SUFFIX``.
 
 Counting package downloads
---------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The following query counts the total number of downloads for the project
 "pytest".
@@ -148,7 +145,7 @@ column.
 +---------------+
 
 Package downloads over time
----------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 To group by monthly downloads, use the ``_TABLE_SUFFIX`` pseudo-column. Also
 use the pseudo-column to limit the tables queried and the corresponding
@@ -188,7 +185,7 @@ costs.
 +---------------+--------+
 
 More queries
-------------
+~~~~~~~~~~~~
 
 - `Data driven decisions using PyPI download statistics
 <https://langui.sh/2016/12/09/data-driven-decisions/>`__
@@ -198,19 +195,68 @@ More queries
 - `Non-Windows downloads, grouped by platform
 <https://bigquery.cloud.google.com/savedquery/51422494423:ff1976af63614ad4a1258d8821dd7785>`__
 
+Caveats
+=======
+
+In addition to the caveats listed in the background above, Linehaul suffered
+from a bug which caused it to significantly under-report download statistics
+prior to July 26, 2018. Downloads before this date are proportionally accurate
+(e.g. the percentage of Python 2 vs. Python 3 downloads) but total numbers are
+lower than actual by an order of magnitude.
+
+
 Additional tools
 ================
 
-You can also access the `PyPI package dataset`_ programmatically via the
-BigQuery API.
+Besides using the BigQuery console, there are some additional tools which may
+be useful when analyzing download statistics.
+
+``google-cloud-bigquery``
+-------------------------
 
-pypinfo
--------
+You can also access the public PyPI download statistics dataset
+programmatically via the BigQuery API and the `google-cloud-bigquery`_ project,
+the official Python client library for BigQuery.
+
+.. code-block:: python
+
+    from google.cloud import bigquery
+
+    # Note: depending on where this code is being run, you may require
+    # additional authentication. See:
+    # https://cloud.google.com/bigquery/docs/authentication/
+    client = bigquery.Client()
+
+    query_job = client.query("""
+    SELECT COUNT(*) AS num_downloads
+    FROM `the-psf.pypi.downloads*`
+    WHERE file.project = 'pytest'
+      -- Only query the last 30 days of history
+      AND _TABLE_SUFFIX
+        BETWEEN FORMAT_DATE(
+          '%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
+        AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())""")
+
+    results = query_job.result()  # Waits for job to complete.
+    for row in results:
+        print("{} downloads".format(row.num_downloads))
+
+
+``pypinfo``
+-----------
 
 `pypinfo`_ is a command-line tool which provides access to the dataset and
 can generate several useful queries. For example, you can query the total
 number of download for a package with the command ``pypinfo package_name``.
 
+Install `pypinfo`_ using pip.
+
+::
+
+    pip install pypinfo
+
+Usage:
+
 ::
 
     $ pypinfo requests
@@ -223,20 +269,20 @@ number of download for a package with the command ``pypinfo package_name``.
     | -------------- |
     | 9,316,415 |
 
-Install `pypinfo`_ using pip.
 
-::
+``pandas-gbq``
+--------------
+
+The `pandas-gbq`_ project allows for accessing query results via `Pandas`_.
 
-    pip install pypinfo
 
-Other libraries
----------------
+References
+==========
 
-- `google-cloud-bigquery`_ is the official client library to access the
-BigQuery API.
-- `pandas-gbq`_ allows for accessing query results via `Pandas`_.
+.. [#] `PyPI Download Counts deprecation email <https://mail.python.org/pipermail/distutils-sig/2013-May/020855.html>`__
+.. [#] `PyPI BigQuery dataset announcement email <https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html>`__
 
-.. _PyPI package dataset: https://bigquery.cloud.google.com/dataset/the-psf:pypi
+.. _public PyPI download statistics dataset: https://console.cloud.google.com/bigquery?p=the-psf&d=pypi&page=dataset
 .. _bandersnatch: /key_projects/#bandersnatch
 .. _Google BigQuery: https://cloud.google.com/bigquery
 .. _BigQuery web UI: https://console.cloud.google.com/bigquery
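
Reader's note (not part of the commit): the guide limits cost by filtering the date-sharded ``downloadsYYYYMMDD`` tables on ``_TABLE_SUFFIX``. The bounds can also be computed client-side instead of with ``FORMAT_DATE`` in SQL. A minimal sketch using only the standard library; the helper names ``table_suffix_range`` and ``suffix_filter`` are hypothetical and appear in neither the guide nor any BigQuery client.

```python
from datetime import date, timedelta


def table_suffix_range(today, days=30):
    """Return (start, end) suffixes in YYYYMMDD form covering the last `days` days.

    Hypothetical helper: mirrors the FORMAT_DATE('%Y%m%d', ...) bounds in the guide's query.
    """
    start = today - timedelta(days=days)
    return start.strftime("%Y%m%d"), today.strftime("%Y%m%d")


def suffix_filter(today, days=30):
    """Build the _TABLE_SUFFIX predicate; both bounds are digit-only strings."""
    start, end = table_suffix_range(today, days)
    return "_TABLE_SUFFIX BETWEEN '{}' AND '{}'".format(start, end)


# Example: the 30 days ending on the date of the Linehaul fix named in the Caveats section.
print(suffix_filter(date(2018, 7, 26)))
```

Only the date bounds are interpolated here; a project name taken from user input would be better passed as a query parameter than formatted into the SQL string.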
