Commit 00f39e5

Reorganize into why/how sections, and add emphasis
Hopefully, this will make it easier for people who aren't interested in reading all the rationale to find the important details -- and for people who read it once to quickly find the point they need later.
1 parent 9f9af53 commit 00f39e5

1 file changed

Lines changed: 117 additions & 95 deletions

source/discussions/downstream-packaging.rst

@@ -29,97 +29,108 @@ such as Gentoo Linux.
2929

3030
Provide complete source distributions
3131
-------------------------------------
32+
Why?
33+
~~~~
3234
The vast majority of downstream packagers prefer to build packages from source,
3335
rather than use the upstream-provided binary packages. This is also true
3436
of pure Python packages that provide universal wheels. The reasons for using
3537
source distributions may include:
3638

37-
- being able to audit the source code of all packages
39+
- being able to **audit the source code** of all packages
3840

39-
- being able to run the test suite and build documentation
41+
- being able to **run the test suite and build documentation**
4042

41-
- being able to easily apply patches, including backporting commits from your
42-
repository and sending patches back to you
43+
- being able to **easily apply patches**, including backporting commits
44+
from your repository and sending patches back to you
4345

44-
- being able to build against a specific platform that is not covered
46+
- being able to **build on a specific platform** that is not covered
4547
by upstream builds
4648

47-
- being able to build against specific versions of system libraries
49+
- being able to **build against specific versions of system libraries**
4850

4951
- having a consistent build process across all Python packages
5052

51-
Ideally, a source distribution archive should include all the files necessary
52-
to build the package itself, run its test suite, build and install its
53-
documentation, and any other files that may be useful to end users, such
54-
as shell completions, editor support files, and so on.
55-
56-
Some projects are concerned about increasing the size of source distribution,
57-
or do not wish Python packaging tools to fall back to source distributions
58-
automatically. In these cases, a good compromise may be to publish a separate
59-
source archive for downstream use, for example by attaching it to a GitHub
60-
release. Alternatively, large files, such as test data, can be split into
61-
separate archives.
62-
6353
While it is usually possible to build packages from a git repository, there are
6454
a few important reasons to provide a static archive file instead:
6555

66-
- Fetching a single file is often more efficient, more reliable and better
67-
supported than e.g. using a git clone. This can help users with a shoddy
56+
- Fetching a single file is often **more efficient, more reliable and better
57+
supported** than e.g. using a git clone. This can help users with a shoddy
6858
Internet connection.
6959

70-
- Downstreams often use checksums to verify the authenticity of source files
60+
- Downstreams often **use checksums to verify the authenticity** of source files
7161
on subsequent builds, which require that they remain bitwise identical over
7262
time. For example, automatically generated git archives do not guarantee
7363
that.
7464

75-
- Archive files can be mirrored, reducing both upstream and downstream
65+
- Archive files can be **mirrored**, reducing both upstream and downstream
7666
bandwidth use. The actual builds can afterwards be performed in firewalled
7767
or offline environments, that can only access source files provided
7868
by the local mirror or redistributed earlier.
7969

80-
A good idea is to use a release workflow that starts by building a source
81-
distribution, and then performs all the remaining release steps (such as
82-
running tests and building wheels) from the unpacked source distribution. This
83-
ensures that the source distribution is actually tested, and reduces the risk
84-
that users installing from it will hit build failures or install an incomplete
85-
package.
70+
How?
71+
~~~~
72+
Ideally, **a source distribution archive should include all the files**
73+
necessary to build the package itself, run its test suite, build and install
74+
its documentation, and any other files that may be useful to end users, such as
75+
shell completions, editor support files, and so on.
76+
77+
Some projects are concerned about increasing the size of source distribution,
78+
or do not wish Python packaging tools to fall back to source distributions
79+
automatically. In these cases, a good compromise may be to publish a separate
80+
source archive for downstream use, for example by attaching it to a GitHub
81+
release. Alternatively, large files, such as test data, can be split into
82+
separate archives.
83+
84+
A good idea is to **use your source distribution in the release workflow**.
85+
That is, build it first, then unpack it and perform all the remaining steps
86+
using the unpacked distribution rather than the git repository: run tests,
87+
build documentation, build wheels. This ensures that it is well-tested,
88+
and reduces the risk that some users would hit build failures or install
89+
an incomplete package.
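
The unpacking step of such a workflow could be sketched as follows. This is only an illustration, assuming the source distribution is a ``.tar.gz`` archive with a single top-level directory; the helper name ``unpack_sdist`` is hypothetical, and building the archive and running the tests are left to the project's own tooling.

```python
import tarfile
from pathlib import Path


def unpack_sdist(sdist_path: Path, dest: Path) -> Path:
    """Unpack a .tar.gz sdist into dest and return its top-level directory.

    Assumption: the archive contains exactly one top-level directory.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(sdist_path, "r:gz") as archive:
        top_levels = {Path(name).parts[0] for name in archive.getnames()}
        if len(top_levels) != 1:
            raise ValueError(
                f"expected one top-level directory, got {sorted(top_levels)}"
            )
        # Use the safer "data" extraction filter where this Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest, **kwargs)
    return dest / top_levels.pop()
```

The remaining release steps (tests, documentation, wheels) would then run with the returned directory as their working directory instead of the git checkout.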
8690

8791

8892
.. _Do not use the Internet during the build process:
8993

9094
Do not use the Internet during the build process
9195
------------------------------------------------
92-
Downstream builds are frequently done in sandboxed environments that cannot
93-
access the Internet. Therefore, it is important that your source distribution
94-
includes all the files needed for the package to build or allows provisioning
95-
them externally, and can build successfully without Internet access.
96-
97-
Ideally, it should not even attempt to access the Internet at all, unless
98-
explicitly requested to. If that is not possible to achieve, the next best
99-
thing is to provide an opt-out switch to disable all Internet access, and fail
100-
if some of the required files are missing instead of trying to fetch them. This
101-
could be done e.g. by checking whether a ``NO_NETWORK`` environment variable is
102-
to a non-empty value. Please also remember that if you are fetching remote
103-
resources, you should verify their authenticity, e.g. against a checksum, to
104-
protect against the file being substituted by a malicious party.
105-
106-
Even if downloads are properly authenticated, using the Internet is discouraged
107-
for a number of reasons:
96+
Why?
97+
~~~~
98+
Downstream builds are frequently done in sandboxed environments that **cannot
99+
access the Internet**. Even if this is not the case, and assuming that you took
100+
sufficient care to **properly authenticate downloads**, using the Internet
101+
is discouraged for a number of reasons:
108102

109-
- The Internet connection may be unstable (e.g. poor reception) or suffer from
110-
temporary problems that could cause the downloads to fail or hang.
103+
- The Internet **connection may be unstable** (e.g. due to poor reception)
104+
or suffer from temporary problems that could cause the process to fail
105+
or hang.
111106

112-
- The remote resources may become temporarily or even permanently unavailable,
113-
making the build no longer possible. This is especially problematic when
114-
someone needs to build an old package version.
107+
- The remote resources may **become temporarily or even permanently
108+
unavailable**, making the build no longer possible. This is especially
109+
problematic when someone needs to build an old package version.
115110

116-
- Accessing remote servers poses a privacy issue and a potential security issue,
117-
as it exposes information about the system building the package.
111+
- Accessing remote servers poses a **privacy** issue and a potential
112+
**security** issue, as it exposes information about the system building
113+
the package.
118114

119115
- The user may be using a service with a limited data plan, in which
120-
uncontrolled Internet access may result in additional charges or other
116+
uncontrolled Internet access may result in **additional charges** or other
121117
inconveniences.
122118

119+
How?
120+
~~~~
121+
Your source distribution should either **include all the files needed
122+
for the package to build**, or allow provisioning them externally. Ideally,
123+
it should not even attempt to access the Internet at all, unless explicitly
124+
requested to. If that is not possible to achieve, the next best thing
125+
is to **provide an opt-out switch to disable all Internet access**.
126+
127+
When such a switch is used, the build process should fail if some
128+
of the required files are missing, rather than try to fetch them automatically.
129+
This could be done e.g. by checking whether a ``NO_NETWORK`` environment
130+
variable is set to a non-empty value. Please also remember that if you are
131+
fetching remote resources, you must **verify their authenticity**, e.g. against
132+
a checksum, to protect against the file being substituted by a malicious party.
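
A minimal sketch of such a check, assuming the ``NO_NETWORK`` variable described above and a SHA-256 checksum known in advance. The function names are hypothetical, and the actual download step is project-specific and deliberately omitted.

```python
import hashlib
import os
from pathlib import Path


def network_disabled() -> bool:
    """Return True if NO_NETWORK is set to a non-empty value."""
    return bool(os.environ.get("NO_NETWORK"))


def require_local_file(path: Path, expected_sha256: str) -> Path:
    """Return path if it exists and matches the expected checksum.

    With NO_NETWORK set, a missing file is an error, not a download trigger.
    """
    if not path.exists():
        if network_disabled():
            raise FileNotFoundError(
                f"{path} is missing and NO_NETWORK is set; "
                "provide the file manually"
            )
        # Hypothetical placeholder: a real project would fetch the file here.
        raise NotImplementedError("download step omitted from this sketch")
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch for {path}: got {digest}")
    return path
```

Failing early with a clear message lets the packager provision the missing file externally rather than discover a hung or broken build later.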
133+
123134
Since downstreams frequently also run tests and build documentation, the above
124135
should ideally extend to these processes as well.
125136

@@ -128,107 +139,118 @@ should ideally extend to these processes as well.
128139

129140
Support building against system dependencies
130141
--------------------------------------------
142+
Why?
143+
~~~~
131144
Some Python projects have non-Python dependencies, such as libraries written
132145
in C or C++. Trying to use the system versions of these dependencies
133146
in upstream packaging may cause a number of problems for end users:
134147

135-
- The published wheels require a binary-compatible version of the used library
136-
to be present on the user's system. If the library is missing or installed
137-
in incompatible version, the Python package may fail with errors that
138-
are not clear to inexperienced users, or even misbehave at runtime.
148+
- The published wheels **require a binary-compatible version of the used
149+
library** to be present on the user's system. If the library is missing
150+
or installed in an incompatible version, the Python package may fail with errors
151+
that are not clear to inexperienced users, or even misbehave at runtime.
139152

140-
- Building from source distribution requires a source-compatible version
141-
of the dependency to be present, along with its development headers and other
142-
auxiliary files that some systems package separately from the library itself.
153+
- Building from source distribution **requires a source-compatible version
154+
of the dependency** to be present, along with its development headers
155+
and other auxiliary files that some systems package separately
156+
from the library itself.
143157

144158
- Even for an experienced user, installing a compatible dependency version
145159
may be very hard. For example, the used Linux distribution may not provide
146-
the required version, or some other package may require an incompatible
147-
version.
160+
the required version, or some **other package may require an incompatible
161+
version**.
148162

149163
- The linkage between the Python package and its system dependency is not
150-
recorded by the packaging system. The next system update may upgrade
151-
the library to a newer version that breaks binary compatibility with
164+
recorded by the packaging system. The next system update may **upgrade
165+
the library to a newer version that breaks binary compatibility** with
152166
the Python package, and requires user intervention to fix.
153167

154-
For these reasons, you may reasonable to decide to either link statically
168+
For these reasons, you may reasonably decide to either **link statically**
155169
to your dependencies, or to provide local copies in the installed package.
156-
You may also vendor the dependency in your source distribution. Sometimes
170+
You may also **vendor the dependency** in your source distribution. Sometimes
157171
these dependencies are also repackaged on PyPI, and can be installed
158172
like regular Python packages.
159173

160174
However, none of these issues apply to downstream packaging, and downstreams
161-
have good reasons to prefer dynamically linking to system dependencies.
175+
have good reasons to prefer **dynamically linking to system dependencies**.
162176
In particular:
163177

164178
- Static linking and vendoring obscures the use of external dependencies,
165-
making source auditing harder.
179+
**making source auditing harder**.
166180

167-
- Dynamic linking makes it possible to easily and quickly replace the used
168-
libraries, which can be particularly important when they turn out to
181+
- Dynamic linking makes it possible to easily and **quickly replace the used
182+
libraries**, which can be particularly important when they turn out to
169183
be vulnerable or buggy.
170184

171-
- Using system dependencies makes the package benefit from downstream
172-
customization that can improve the user experience on a particular platform,
185+
- Using system dependencies makes the package benefit from **downstream
186+
customization** that can improve the user experience on a particular platform,
173187
without the downstream maintainers having to consistently patch
174188
the dependencies vendored in different packages. This can include
175-
compatibility improvements and security hardening.
189+
**compatibility improvements and security hardening**.
176190

177-
- Static linking and vendoring could result in multiple different versions
178-
of the same library being loaded in the same process (e.g. when you use two
191+
- Static linking and vendoring could result in **multiple different versions
192+
of the same library being loaded in the same process** (e.g. when you use two
179193
Python packages that link to different versions of the same library).
180194
This can cause no problems, but it could also lead to anything from subtle
181195
bugs to catastrophic failures.
182196

183197
- Last but not least, static linking and vendoring results in duplication,
184-
and may increase the use of both the disk space and memory.
198+
and may increase the **use of both the disk space and memory**.
185199

186-
A good compromise between the needs of both parties is to provide a switch
187-
between using vendored and system dependencies. Ideally, if the package has
200+
How?
201+
~~~~
202+
A good compromise between the needs of both parties is to **provide a switch
203+
between using vendored and system dependencies**. Ideally, if the package has
188204
multiple vendored dependencies, it should provide both individual switches
189-
for each dependency, and a general switch, for example using
190-
a ``USE_SYSTEM_DEPS`` environment variable to control the default. If switched
191-
on, and a particular dependency is either missing or incompatible, the build
192-
should fail with an explanatory message, giving the packager an explicit
193-
indication of the problem and a chance to consciously decide on the preferred
194-
course of action.
205+
for each dependency, and a general switch to control the default for them,
206+
e.g. via a ``USE_SYSTEM_DEPS`` environment variable.
207+
208+
If the user requests using system dependencies, and **a particular dependency
209+
is either missing or incompatible, the build should fail** with an explanatory
210+
message rather than fall back to a vendored version. This gives the packager
211+
the opportunity to notice their mistake and a chance to consciously decide
212+
how to solve it.
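
One possible shape for these switches is sketched below. It assumes the ``USE_SYSTEM_DEPS`` variable mentioned above as the general default; the per-dependency naming scheme ``USE_SYSTEM_<NAME>``, the ``"1"``/``"0"`` values, and the function names are assumptions made for illustration only.

```python
import os
from typing import Mapping


def wants_system(dep: str, environ: Mapping[str, str] = os.environ) -> bool:
    """Decide whether the system copy of dep was requested.

    A per-dependency variable (hypothetical USE_SYSTEM_<NAME>) overrides
    the general USE_SYSTEM_DEPS default; "1" means use the system copy.
    """
    default = environ.get("USE_SYSTEM_DEPS", "0")
    return environ.get(f"USE_SYSTEM_{dep.upper()}", default) == "1"


def choose_source(
    dep: str, system_found: bool, environ: Mapping[str, str] = os.environ
) -> str:
    """Return "system" or "vendored"; never fall back silently."""
    if not wants_system(dep, environ):
        return "vendored"
    if not system_found:
        raise RuntimeError(
            f"system {dep} was requested but is missing or incompatible; "
            f"install it or set USE_SYSTEM_{dep.upper()}=0"
        )
    return "system"
```

For example, ``USE_SYSTEM_DEPS=1 USE_SYSTEM_ZLIB=0`` would select system copies for everything except the hypothetical ``zlib`` dependency, which stays vendored.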
195213

196214

197215
.. _Support downstream testing:
198216

199217
Support downstream testing
200218
--------------------------
219+
Why?
220+
~~~~
201221
A variety of downstream projects run some degree of testing on the packaged
202222
Python projects. Depending on the particular case, this can range from minimal
203223
smoke testing to comprehensive runs of the complete test suite. There can
204224
be various reasons for doing this, for example:
205225

206-
- Verifying that the downstream packaging did not introduce any bugs.
226+
- Verifying that the downstream **packaging did not introduce any bugs**.
207227

208-
- Testing on a platform that is not covered by upstream testing.
228+
- Testing on **additional platforms** that are not covered by upstream testing.
209229

210-
- Finding subtle bugs that can only be reproduced on a particular hardware,
211-
system package versions, and so on.
230+
- Finding subtle bugs that can only be reproduced on a **particular hardware,
231+
system package versions**, and so on.
212232

213-
- Testing the released package against newer dependency version than the ones
214-
present during upstream release testing.
233+
- Testing the released package against **newer dependency versions** than
234+
the ones present during upstream release testing.
215235

216-
- Testing the package in an environment closely resembling the production
217-
setup. This can detect issues caused by nontrivial interactions between
236+
- Testing the package in an environment closely resembling **the production
237+
setup**. This can detect issues caused by nontrivial interactions between
218238
different installed packages, including packages that are not dependencies
219239
of your package, but nevertheless can cause issues.
220240

221-
- Testing the released package against newer Python versions (including newer
222-
point releases), or less tested Python implementations such as PyPy.
241+
- Testing the released package against **newer Python versions** (including
242+
newer point releases), or less tested Python implementations such as PyPy.
223243

224244
Admittedly, sometimes downstream testing may yield false positives or
225245
inconvenience you about scenarios that you are not interested in supporting.
226246
However, perhaps even more often it does provide early notice of problems,
227247
or find nontrivial bugs that would otherwise cause issues for your users
228-
in production. And believe me, the majority of downstream packagers are doing
248+
in production. And believe me, the majority of **downstream packagers are doing
229249
their best to double-check their results, and help you triage and fix the bugs
230-
that they report.
250+
that they report**.
231251

252+
How?
253+
~~~~
232254
There are a number of things that you can do to help us test your package
233255
better. Some of them were already mentioned in this discussion. Some examples
234256
are:
