diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
Lines changed: 2 additions & 0 deletions
diff --git a/noxfile.py b/noxfile.py
Lines changed: 3 additions & 1 deletion
diff --git a/source/contribute.rst b/source/contribute.rst
Lines changed: 2 additions & 4 deletions
diff --git a/source/discussions/deploying-python-applications.rst b/source/discussions/deploying-python-applications.rst
Lines changed: 4 additions & 5 deletions
diff --git a/source/glossary.rst b/source/glossary.rst
Lines changed: 11 additions & 8 deletions
diff --git a/source/guides/analyzing-pypi-package-downloads.rst b/source/guides/analyzing-pypi-package-downloads.rst
Lines changed: 128 additions & 82 deletions
@@ -0,0 +1,2 @@
+source/guides/github-actions-ci-cd-sample/* @webknjaz
+source/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows.rst @webknjaz
@@ -17,10 +17,12 @@ def build(session, autobuild=False):
 
     if autobuild:
         command = "sphinx-autobuild"
+        extra_args = "-H", "0.0.0.0"
     else:
         command = "sphinx-build"
+        extra_args = ()
 
-    session.run(command, "-W", "-b", "html", "source", "build")
+    session.run(command, *extra_args, "-W", "-b", "html", "source", "build")
 
 
 @nox.session(py="3")
 
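The `extra_args` change in `noxfile.py` can be checked in isolation. The following is a minimal sketch, not part of the noxfile: the helper `build_sphinx_command` is hypothetical and only mirrors how the argument tuple is chosen and then unpacked into the `session.run()` call, returning the argument list instead of running it. `-H` is assumed here to be sphinx-autobuild's host option, as the diff uses it.

```python
def build_sphinx_command(autobuild=False):
    """Return the argv the ``build`` session would pass to ``session.run``.

    Hypothetical helper mirroring the noxfile logic; it does not call nox.
    """
    if autobuild:
        command = "sphinx-autobuild"
        # Assumed host option: make the preview server listen on all
        # network interfaces instead of only localhost.
        extra_args = "-H", "0.0.0.0"
    else:
        command = "sphinx-build"
        # An empty tuple unpacks to nothing, so plain builds are unchanged.
        extra_args = ()
    return [command, *extra_args, "-W", "-b", "html", "source", "build"]


print(build_sphinx_command(autobuild=True))
print(build_sphinx_command())
```

Using an empty tuple in the `else` branch keeps a single `session.run()` call for both cases rather than duplicating it per branch.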
@@ -126,10 +126,8 @@ contributions to be accepted into the project.
 Purpose
 -------
 
-The purpose of the |PyPUG| is
-
-    to be the authoritative resource on how to package, publish, and install
-    Python projects using current tools.
+The purpose of the |PyPUG| is to be the authoritative resource on how to
+package, publish, and install Python projects using current tools.
 
 
 Scope
 
@@ -67,12 +67,11 @@ the Python-interpreter and declare the dependencies of the program. The tool
 downloads the specified Python-interpreter for Windows and packages it with all
 the dependencies in a single Windows-executable installer.
 
-The installer installs or updates the Python-interpreter on the users system,
-which can be used independently of the packaged program. The program itself,
-can be started from a shortcut, that the installer places in the start-menu.
-Uninstalling the program leaves the Python installation of the user intact.
+The installed program can be started from a shortcut that the installer adds to
+the Start menu. It uses a Python interpreter installed within its application
+directory, independent of any other Python installation on the computer.
 
-A big advantage of pynsist is that the Windows packages can be built on Linux.
+A big advantage of Pynsist is that the Windows packages can be built on Linux.
 There are several examples for different kinds of programs (console, GUI) in
 the `documentation <https://pynsist.readthedocs.io>`__. The tool is released
 under the MIT-licence.
 
@@ -122,17 +122,20 @@ Glossary
 
     Pure Module
 
-        A :term:`module` written in Python and contained in a single .py file (and
-        possibly associated .pyc and/or .pyo files).
+        A :term:`module` written in Python and contained in a single ``.py`` file (and
+        possibly associated ``.pyc`` and/or ``.pyo`` files).
 
 
     Python Packaging Authority (PyPA)
 
-        PyPA is a working group that maintains many of the relevant projects in
-        Python packaging. They maintain a site at https://www.pypa.io, host projects
-        on `github <https://github.com/pypa>`_ and `bitbucket
-        <https://bitbucket.org/pypa>`_, and discuss issues on the `pypa-dev
-        mailing list <https://groups.google.com/forum/#!forum/pypa-dev>`_.
+        PyPA is a working group that maintains many of the relevant
+        projects in Python packaging. They maintain a site at
+        https://www.pypa.io, host projects on `GitHub
+        <https://github.com/pypa>`_ and `Bitbucket
+        <https://bitbucket.org/pypa>`_, and discuss issues on the
+        `distutils-sig mailing list
+        <https://mail.python.org/mailman3/lists/distutils-sig.python.org/>`_
+        and `the Python Discourse forum <https://discuss.python.org/c/packaging>`__.
 
 
     Python Package Index (PyPI)
@@ -193,7 +196,7 @@ Glossary
     Source Archive
 
         An archive containing the raw source code for a :term:`Release`, prior
-        to creation of an :term:`Source Distribution <Source Distribution (or
+        to creation of a :term:`Source Distribution <Source Distribution (or
         "sdist")>` or :term:`Built Distribution`.
 
 
 
@@ -2,10 +2,10 @@
 Analyzing PyPI package downloads
 ================================
 
-This section covers how to use the `PyPI package dataset`_ to learn more
-about downloads of a package (or packages) hosted on PyPI. For example, you can
-use it to discover the distribution of Python versions used to download a
-package.
+This section covers how to use the public PyPI download statistics dataset
+to learn more about downloads of a package (or packages) hosted on PyPI. For
+example, you can use it to discover the distribution of Python versions used to
+download a package.
 
 .. contents:: Contents
    :local:
@@ -14,71 +14,45 @@ package.
 Background
 ==========
 
-PyPI does not display download statistics because they are difficult to
-collect and display accurately. Reasons for this are included in the
-`announcement email
-<https://mail.python.org/pipermail/distutils-sig/2013-May/020855.html>`__:
-
-    There are numerous reasons for [download counts] removal/deprecation some
-    of which are:
-
-        - Technically hard to make work with the new CDN
-
-            - The CDN is being donated to the PSF, and the donated tier does
-              not offer any form of log access
-            - The work around for not having log access would greatly reduce
-              the utility of the CDN
-        - Highly inaccurate
-            - A number of things prevent the download counts from being
-              accurate, some of which include:
-
-                - pip download cache
-                - Internal or unofficial mirrors
-                - Packages not hosted on PyPI (for comparisons sake)
-                - Mirrors or unofficial grab scripts causing inflated counts
-                  (Last I looked 25% of the downloads were from a known
-                  mirroring script).
-        - Not particularly useful
-
-            - Just because a project has been downloaded a lot doesn't mean
-              it's good
-            - Similarly just because a project hasn't been downloaded a lot
-              doesn't mean it's bad
-
-    In short because it's value is low for various reasons, and the tradeoffs
-    required to make it work are high It has been not an effective use of
-    resources.
-
-As an alternative, the `Linehaul project
-<https://github.com/pypa/linehaul>`__ streams download logs to `Google
-BigQuery`_ [#]_. Linehaul writes an entry in a
-``the-psf.pypi.downloadsYYYYMMDD`` table for each download. The table
-contains information about what file was downloaded and how it was
-downloaded. Some useful columns from the `table schema
-<https://bigquery.cloud.google.com/table/the-psf:pypi.downloads20161022?tab=schema>`__
-include:
+PyPI does not display download statistics for a number of reasons: [#]_
 
-+------------------------+-----------------+-----------------------+
-| Column                 | Description     | Examples              |
-+========================+=================+=======================+
-| file.project           | Project name    | ``pipenv``, ``nose``  |
-+------------------------+-----------------+-----------------------+
-| file.version           | Package version | ``0.1.6``, ``1.4.2``  |
-+------------------------+-----------------+-----------------------+
-| details.installer.name | Installer       | pip, `bandersnatch`_  |
-+------------------------+-----------------+-----------------------+
-| details.python         | Python version  | ``2.7.12``, ``3.6.4`` |
-+------------------------+-----------------+-----------------------+
+- **Inefficient to make work with a Content Distribution Network (CDN):**
+  Download statistics change constantly. Including them in project pages, which
+  are heavily cached, would require invalidating the cache more often, and
+  reduce the overall effectiveness of the cache.
 
-.. [#] `PyPI BigQuery dataset announcement email <https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html>`__
+- **Highly inaccurate:** A number of things prevent the download counts from
+  being accurate, some of which include:
 
-Setting up
-==========
+  - ``pip``'s download cache (lowers download counts)
+  - Internal or unofficial mirrors (can either raise or lower download counts)
+  - Packages not hosted on PyPI (for comparison's sake)
+  - Unofficial scripts or attempts at download count inflation (raises download
+    counts)
+  - Known historical data quality issues (lowers download counts)
+
+- **Not particularly useful:** Just because a project has been downloaded a lot
+  doesn't mean it's good; similarly, just because a project hasn't been
+  downloaded a lot doesn't mean it's bad!
 
-In order to use `Google BigQuery`_ to query the `PyPI package dataset`_,
-you'll need a Google account and to enable the BigQuery API on a Google
-Cloud Platform project. You can run the up to 1TB of queries per month `using
-the BigQuery free tier without a credit card
+In short, because its value is low for various reasons and the tradeoffs
+required to make it work are high, it has not been an effective use of
+limited resources.
+
+Public dataset
+==============
+
+As an alternative, the `Linehaul project <https://github.com/pypa/linehaul>`__
+streams download logs from PyPI to `Google BigQuery`_ [#]_, where they are
+stored as a public dataset.
+
+Getting set up
+--------------
+
+To use `Google BigQuery`_ to query the `public PyPI download statistics
+dataset`_, you'll need a Google account and a Google Cloud Platform project
+with the BigQuery API enabled. You can run up to 1 TB of queries
+per month `using the BigQuery free tier without a credit card
 <https://cloud.google.com/blog/big-data/2017/01/how-to-run-a-terabyte-of-google-bigquery-queries-each-month-without-a-credit-card>`__
 
 - Navigate to the `BigQuery web UI`_.
@@ -90,8 +64,31 @@ For more detailed instructions on how to get started with BigQuery, check out
 the `BigQuery quickstart guide
 <https://cloud.google.com/bigquery/docs/quickstarts/quickstart-web-ui>`__.
 
+
+Data schema
+-----------
+
+Linehaul writes an entry in a ``the-psf.pypi.downloadsYYYYMMDD`` table for each
+download. The table contains information about what file was downloaded and how
+it was downloaded. Some useful columns from the `table schema
+<https://console.cloud.google.com/bigquery?pli=1&p=the-psf&d=pypi&t=downloads&page=table>`__
+include:
+
++------------------------+-----------------+-----------------------+
+| Column                 | Description     | Examples              |
++========================+=================+=======================+
+| file.project           | Project name    | ``pipenv``, ``nose``  |
++------------------------+-----------------+-----------------------+
+| file.version           | Package version | ``0.1.6``, ``1.4.2``  |
++------------------------+-----------------+-----------------------+
+| details.installer.name | Installer       | pip, `bandersnatch`_  |
++------------------------+-----------------+-----------------------+
+| details.python         | Python version  | ``2.7.12``, ``3.6.4`` |
++------------------------+-----------------+-----------------------+
+
+
 Useful queries
-==============
+--------------
 
 Run queries in the `BigQuery web UI`_ by clicking the "Compose query" button.
 
@@ -102,7 +99,7 @@ recent history by using `wildcard tables
 select all tables and then filter by ``_TABLE_SUFFIX``.
 
 Counting package downloads
---------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The following query counts the total number of downloads for the project
 "pytest".
@@ -148,7 +145,7 @@ column.
 +---------------+
 
 Package downloads over time
----------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 To group by monthly downloads, use the ``_TABLE_SUFFIX`` pseudo-column. Also
 use the pseudo-column to limit the tables queried and the corresponding
@@ -188,7 +185,7 @@ costs.
 +---------------+--------+
 
 More queries
-------------
+~~~~~~~~~~~~
 
 - `Data driven decisions using PyPI download statistics
   <https://langui.sh/2016/12/09/data-driven-decisions/>`__
@@ -198,19 +195,68 @@ More queries
 - `Non-Windows downloads, grouped by platform
   <https://bigquery.cloud.google.com/savedquery/51422494423:ff1976af63614ad4a1258d8821dd7785>`__
 
+Caveats
+=======
+
+In addition to the caveats listed in the background above, Linehaul suffered
+from a bug which caused it to significantly under-report download statistics
+prior to July 26, 2018. Downloads before this date are proportionally accurate
+(e.g. the percentage of Python 2 vs. Python 3 downloads), but total numbers are
+lower than the actual counts by an order of magnitude.
+
+
 Additional tools
 ================
 
-You can also access the `PyPI package dataset`_ programmatically via the
-BigQuery API.
+Besides using the BigQuery console, there are some additional tools which may
+be useful when analyzing download statistics.
+
+``google-cloud-bigquery``
+-------------------------
 
-pypinfo
--------
+You can also access the public PyPI download statistics dataset
+programmatically via the BigQuery API and the `google-cloud-bigquery`_ project,
+the official Python client library for BigQuery.
+
+.. code-block:: python
+
+    from google.cloud import bigquery
+
+    # Note: depending on where this code is being run, you may require
+    # additional authentication. See:
+    # https://cloud.google.com/bigquery/docs/authentication/
+    client = bigquery.Client()
+
+    query_job = client.query("""
+    SELECT COUNT(*) AS num_downloads
+    FROM `the-psf.pypi.downloads*`
+    WHERE file.project = 'pytest'
+    -- Only query the last 30 days of history
+    AND _TABLE_SUFFIX
+        BETWEEN FORMAT_DATE(
+            '%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
+        AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())""")
+
+    results = query_job.result()  # Waits for job to complete.
+    for row in results:
+        print("{} downloads".format(row.num_downloads))
+
+
+``pypinfo``
+-----------
 
 `pypinfo`_ is a command-line tool which provides access to the dataset and
 can generate several useful queries. For example, you can query the total
 number of download for a package with the command ``pypinfo package_name``.
 
+Install `pypinfo`_ using pip.
+
+::
+
+    pip install pypinfo
+
+Usage:
+
 ::
 
     $ pypinfo requests
@@ -223,20 +269,20 @@ number of download for a package with the command ``pypinfo package_name``.
     | -------------- |
     |      9,316,415 |
 
-Install `pypinfo`_ using pip.
 
-::
+``pandas-gbq``
+--------------
+
+The `pandas-gbq`_ project can load query results directly into `Pandas`_ DataFrames.
 
-    pip install pypinfo
 
-Other libraries
----------------
+References
+==========
 
-- `google-cloud-bigquery`_ is the official client library to access the
-  BigQuery API.
-- `pandas-gbq`_ allows for accessing query results via `Pandas`_.
+.. [#] `PyPI Download Counts deprecation email <https://mail.python.org/pipermail/distutils-sig/2013-May/020855.html>`__
+.. [#] `PyPI BigQuery dataset announcement email <https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html>`__
 
-.. _PyPI package dataset: https://bigquery.cloud.google.com/dataset/the-psf:pypi
+.. _public PyPI download statistics dataset: https://console.cloud.google.com/bigquery?p=the-psf&d=pypi&page=dataset
 .. _bandersnatch: /key_projects/#bandersnatch
 .. _Google BigQuery: https://cloud.google.com/bigquery
 .. _BigQuery web UI: https://console.cloud.google.com/bigquery
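The guide's queries filter on `_TABLE_SUFFIX`, which is the `YYYYMMDD` part of the `the-psf.pypi.downloadsYYYYMMDD` table names. As a minimal sketch (the helper `table_suffix_range` and its `clamp_to_fix` flag are hypothetical; the 2018-07-26 cutoff comes from the Caveats section of the guide), the same bounds that the `FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))` expression produces can be computed in Python, optionally moving the start date forward so that under-reported days are excluded from totals.

```python
import datetime

# Per the guide's Caveats section, totals before this date are under-reported.
LINEHAUL_FIX_DATE = datetime.date(2018, 7, 26)


def table_suffix_range(days, today=None, clamp_to_fix=False):
    """Return (start, end) table suffixes as 'YYYYMMDD' strings.

    Hypothetical helper: start is `days` days before `today`, matching
    DATE_SUB(CURRENT_DATE(), INTERVAL n DAY) in the BigQuery example.
    """
    if today is None:
        # BigQuery's CURRENT_DATE() defaults to UTC, so use the UTC date here.
        today = datetime.datetime.now(datetime.timezone.utc).date()
    start = today - datetime.timedelta(days=days)
    if clamp_to_fix and start < LINEHAUL_FIX_DATE:
        # Skip tables affected by the pre-fix under-reporting bug.
        start = LINEHAUL_FIX_DATE
    return start.strftime("%Y%m%d"), today.strftime("%Y%m%d")


print(table_suffix_range(30, today=datetime.date(2018, 8, 10)))
print(table_suffix_range(30, today=datetime.date(2018, 8, 10), clamp_to_fix=True))
```

Because the suffixes are zero-padded `YYYYMMDD` strings, lexical comparison in a `BETWEEN` clause orders them the same way as the dates they encode.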