Commit 7f438b0

sbidoul, pradyunsg, and abravalheri authored
Extract the Direct URL data structure into a standalone document (#1200)
* Standalone spec of the direct URL data structure
* Minor edits of the direct url data structure specification

  Minor edits so the text reads better as a standalone document.
* Fix typo in direct-url-data-structure.rst

  Co-authored-by: Anderson Bravalheri <andersonbravalheri+github@gmail.com>

Co-authored-by: Pradyun Gedam <pradyunsg@gmail.com>
Co-authored-by: Anderson Bravalheri <andersonbravalheri+github@gmail.com>
1 parent 126d2cd commit 7f438b0

3 files changed

Lines changed: 294 additions & 255 deletions

Lines changed: 290 additions & 0 deletions
@@ -0,0 +1,290 @@
.. _direct-url-data-structure:

=========================
Direct URL Data Structure
=========================

This document specifies a JSON-serializable abstract data structure that can represent
URLs to Python projects and distribution artifacts such as VCS source trees, local
source trees, source distributions and wheels.

The representation of the components of this data structure as a :rfc:`1738` URL
is not formally specified at time of writing. A common representation is the pip URL
format. Other examples are provided in :pep:`440`.

.. contents:: Contents
   :local:

Specification
=============

The Direct URL Data Structure MUST be a dictionary, serializable to JSON according to
:rfc:`8259`.

It MUST contain at least two fields. The first one is ``url``, with
type ``string``. Depending on what ``url`` refers to, the second field MUST be
one of ``vcs_info`` (if ``url`` is a VCS reference), ``archive_info`` (if
``url`` is a source archive or a wheel), or ``dir_info`` (if ``url`` is a
local directory). These info fields have a (possibly empty) subdictionary as
value, with the possible keys defined below.
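
As an illustration of the two required fields, here is a minimal sketch of a structural check.
The function name ``check_direct_url`` is hypothetical, and treating "one of" as "exactly one"
is an assumption of this sketch rather than wording taken from the text above.

```python
INFO_KEYS = ("vcs_info", "archive_info", "dir_info")


def check_direct_url(data):
    """Return the name of the info key found in ``data``, or raise ValueError.

    Hypothetical helper: it only checks the top-level shape described above.
    """
    if not isinstance(data, dict):
        raise ValueError("direct URL data must be a dictionary")
    if not isinstance(data.get("url"), str):
        raise ValueError("'url' must be present and must be a string")
    # Assumption of this sketch: exactly one info key is expected.
    present = [key for key in INFO_KEYS if key in data]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {INFO_KEYS}, found {present}")
    if not isinstance(data[present[0]], dict):
        raise ValueError(f"{present[0]!r} must be a (possibly empty) dictionary")
    return present[0]
```

A consumer could call such a check right after ``json.loads()`` and before looking at any
of the info-specific keys.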

When persisted, ``url`` MUST be stripped of any sensitive authentication information,
for security reasons.

The user:password section of the URL MAY, however,
be composed of environment variables, matching the following regular
expression::

    \$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?

Additionally, the user:password section of the URL MAY be a
well-known, non-security-sensitive string. A typical example is ``git``
in the case of a URL such as ``ssh://git@gitlab.com/user/repo``.
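
The stripping rule can be sketched as follows. This is one possible reading, not a reference
implementation: ``redact_url`` and ``SAFE_USERS`` are hypothetical names, and the set of
"well-known" strings (here only ``git``) is an assumption.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Regular expression from the specification; fullmatch() anchors it at both ends.
ENV_VAR_AUTH = re.compile(r"\$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?")

# Assumption: which user names count as well-known and non-sensitive.
SAFE_USERS = {"git"}


def redact_url(url):
    """Drop the user:password part of ``url`` unless it is allowed to be persisted."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return url  # no authentication information at all
    auth, _, host = parts.netloc.rpartition("@")
    if ENV_VAR_AUTH.fullmatch(auth) or auth in SAFE_USERS:
        return url  # environment variable references or a well-known string
    return urlunsplit(parts._replace(netloc=host))
```

With this sketch, ``https://user:secret@example.com/repo.git`` loses its credentials, while
``ssh://git@gitlab.com/user/repo`` and ``https://${USER}:${TOKEN}@example.com/repo.git``
are kept unchanged.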

VCS URLs
--------

When ``url`` refers to a VCS repository, the ``vcs_info`` key MUST be present
as a dictionary with the following keys:

- A ``vcs`` key (type ``string``) MUST be present, containing the name of the VCS
  (i.e. one of ``git``, ``hg``, ``bzr``, ``svn``). Other VCS's SHOULD be registered by
  writing a PEP to amend this specification.
  The ``url`` value MUST be compatible with the corresponding VCS,
  so an installer can hand it off without transformation to a
  checkout/download command of the VCS.
- A ``requested_revision`` key (type ``string``) MAY be present naming a
  branch/tag/ref/commit/revision/etc (in a format compatible with the VCS).
- A ``commit_id`` key (type ``string``) MUST be present, containing the
  exact commit/revision number that was/is to be installed.
  If the VCS supports commit-hash-based
  revision identifiers, such a commit hash MUST be used as
  ``commit_id`` in order to reference an immutable
  version of the source code.
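
A producer could assemble the ``vcs_info`` dictionary along these lines. The helper name
``make_vcs_direct_url`` is hypothetical. Rejecting unregistered VCS names and requiring a
40-character hexadecimal ``commit_id`` for ``git`` and ``hg`` are choices made for this sketch;
a real tool may be more lenient.

```python
import re

REGISTERED_VCS = {"git", "hg", "bzr", "svn"}
FULL_HASH = re.compile(r"[0-9a-fA-F]{40}")


def make_vcs_direct_url(url, vcs, commit_id, requested_revision=None):
    """Build a direct URL dictionary for a VCS checkout (illustrative only)."""
    if vcs not in REGISTERED_VCS:
        # Sketch-level restriction: only the registered VCS names are accepted.
        raise ValueError(f"unregistered VCS: {vcs!r}")
    if vcs in {"git", "hg"} and not FULL_HASH.fullmatch(commit_id):
        raise ValueError("commit_id must be a full 40-character hexadecimal hash")
    vcs_info = {"vcs": vcs, "commit_id": commit_id}
    if requested_revision is not None:
        # Optional: what the user asked for (branch, tag, ...).
        vcs_info["requested_revision"] = requested_revision
    return {"url": url, "vcs_info": vcs_info}
```

Keeping ``requested_revision`` separate from ``commit_id`` records both what was asked for
and the immutable revision that was actually resolved.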

Archive URLs
------------

When ``url`` refers to a source archive or a wheel, the ``archive_info`` key
MUST be present as a dictionary with the following keys:

- A ``hashes`` key SHOULD be present as a dictionary mapping a hash name to a
  hex-encoded digest of the file.

  Multiple hashes can be included, and it is up to the consumer to decide what to do
  with multiple hashes (it may validate all of them or a subset of them, or nothing at
  all).

  These hash names SHOULD always be normalized to be lowercase.

  Any hash algorithm available via ``hashlib`` (specifically any that can be passed to
  ``hashlib.new()`` and do not require additional parameters) can be used as a key for
  the hashes dictionary. At least one secure algorithm from
  ``hashlib.algorithms_guaranteed`` SHOULD always be included. At time of writing,
  ``sha256`` specifically is recommended.

- A deprecated ``hash`` key (type ``string``) MAY be present for backwards compatibility
  purposes, with value ``<hash-algorithm>=<expected-hash>``.

Producers of the data structure SHOULD emit the ``hashes`` key whether one or multiple
hashes are available. Producers SHOULD continue to emit the ``hash`` key in contexts
where they did so before, so as to keep backwards compatibility for existing clients.

When both the ``hash`` and ``hashes`` keys are present, the hash represented in the
``hash`` key MUST also be present in the ``hashes`` dictionary, so consumers can
consider the ``hashes`` key only if it is present, and fall back to ``hash`` otherwise.
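
The producer and consumer sides of these rules might look like the sketch below. Both function
names are hypothetical; the producer hashes in-memory bytes for brevity, whereas a real tool
would more likely read the archive in chunks.

```python
import hashlib


def make_archive_info(data, algorithms=("sha256",)):
    """Producer side: map lowercase hash names to hex-encoded digests."""
    hashes = {
        name.lower(): hashlib.new(name.lower(), data).hexdigest()
        for name in algorithms
    }
    return {"hashes": hashes}


def get_hashes(archive_info):
    """Consumer side: prefer ``hashes``, fall back to the deprecated ``hash``."""
    if "hashes" in archive_info:
        return dict(archive_info["hashes"])
    legacy = archive_info.get("hash")
    if legacy is None:
        return {}
    name, sep, digest = legacy.partition("=")
    if not sep:
        raise ValueError("expected '<hash-algorithm>=<expected-hash>'")
    return {name.lower(): digest}
```

Because ``hash`` must also appear in ``hashes`` whenever both are present, a consumer that
reads ``hashes`` first never loses information.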

Local directories
-----------------

When ``url`` refers to a local directory, the ``dir_info`` key MUST be
present as a dictionary with the following key:

- ``editable`` (type: ``boolean``): ``true`` if the distribution was/is to be installed
  in editable mode, ``false`` otherwise. If absent, default to ``false``.

When ``url`` refers to a local directory, it MUST have the ``file`` scheme and
be compliant with :rfc:`8089`. In
particular, the path component must be absolute. Symbolic links SHOULD be
preserved when making relative paths absolute.
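
One way to obtain such a URL in Python is sketched below. The helper names are hypothetical.
The sketch relies on ``os.path.abspath()``, which makes a path absolute without resolving
symbolic links (unlike ``Path.resolve()``), and on ``Path.as_uri()`` to produce the ``file``
URL.

```python
import os
from pathlib import Path


def directory_to_file_url(path):
    """Return an absolute ``file`` URL for ``path`` without resolving symlinks."""
    # abspath() normalizes and anchors the path but leaves symbolic links alone.
    absolute = os.path.abspath(path)
    return Path(absolute).as_uri()


def make_dir_direct_url(path, editable=False):
    """Build a direct URL dictionary for a local directory (illustrative only)."""
    dir_info = {"editable": True} if editable else {}
    return {"url": directory_to_file_url(path), "dir_info": dir_info}
```

Omitting ``editable`` when it is false is a choice of this sketch; it is equivalent to
emitting ``"editable": false`` because the default is ``false``.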

Projects in subdirectories
--------------------------

A top-level ``subdirectory`` field MAY be present containing a directory path,
relative to the root of the VCS repository, source archive or local directory,
to specify where ``pyproject.toml`` or ``setup.py`` is located.
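
A consumer could resolve the project location as in the following sketch. The function name
``project_dir`` is hypothetical. Interpreting ``subdirectory`` with forward slashes and
rejecting absolute paths or ``..`` components are defensive assumptions of this sketch, not
requirements stated above.

```python
from pathlib import PurePath, PurePosixPath


def project_dir(source_root, direct_url):
    """Return the directory expected to hold pyproject.toml or setup.py."""
    root = PurePath(source_root)
    subdirectory = direct_url.get("subdirectory")
    if not subdirectory:
        return root  # the project sits at the root of the source tree
    relative = PurePosixPath(subdirectory)
    # Assumption: refuse paths that could escape the source tree.
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"unexpected subdirectory: {subdirectory!r}")
    return root.joinpath(*relative.parts)
```

Here ``source_root`` stands for wherever the repository was checked out, the archive was
unpacked, or the local directory lives.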

Registered VCS
==============

This section lists the registered VCS's; expanded, VCS-specific information
on how to use the ``vcs``, ``requested_revision``, and other fields of
``vcs_info``; and in
some cases additional VCS-specific fields.
Tools MAY support other VCS's although it is RECOMMENDED to register
them by writing a PEP to amend this specification. The ``vcs`` field SHOULD be the command name
(lowercased). Additional fields that would be necessary to
support such VCS SHOULD be prefixed with the VCS command name.

Git
---

Home page

   https://git-scm.com/

vcs command

   git

``vcs`` field

   git

``requested_revision`` field

   A tag name, branch name, Git ref, commit hash, shortened commit hash,
   or other commit-ish.

``commit_id`` field

   A commit hash (a SHA-1 of 40 hexadecimal characters).

.. note::

   Tools can use the ``git show-ref`` and ``git symbolic-ref`` commands
   to determine if the ``requested_revision`` corresponds to a Git ref.
   In turn, a ref beginning with ``refs/tags/`` corresponds to a tag, and
   a ref beginning with ``refs/remotes/origin/`` after cloning corresponds
   to a branch.

Mercurial
---------

Home page

   https://www.mercurial-scm.org/

vcs command

   hg

``vcs`` field

   hg

``requested_revision`` field

   A tag name, branch name, changeset ID, shortened changeset ID.

``commit_id`` field

   A changeset ID (40 hexadecimal characters).

Bazaar
------

Home page

   https://bazaar.canonical.com

vcs command

   bzr

``vcs`` field

   bzr

``requested_revision`` field

   A tag name, branch name, revision id.

``commit_id`` field

   A revision id.

Subversion
----------

Home page

   https://subversion.apache.org/

vcs command

   svn

``vcs`` field

   svn

``requested_revision`` field

   ``requested_revision`` must be compatible with the ``--revision`` option of
   ``svn checkout``. In Subversion, the branch or tag is part of ``url``.

``commit_id`` field

   Since Subversion does not support globally unique identifiers,
   this field is the Subversion revision number in the corresponding
   repository.
235+
Examples
236+
========
237+
238+
Source archive:
239+
240+
.. code::
241+
242+
{
243+
"url": "https://github.com/pypa/pip/archive/1.3.1.zip",
244+
"archive_info": {
245+
"hashes": {
246+
"sha256": "2dc6b5a470a1bde68946f263f1af1515a2574a150a30d6ce02c6ff742fcc0db8"
247+
}
248+
}
249+
}
250+
251+
Git URL with tag and commit-hash:
252+
253+
.. code::
254+
255+
{
256+
"url": "https://github.com/pypa/pip.git",
257+
"vcs_info": {
258+
"vcs": "git",
259+
"requested_revision": "1.3.1",
260+
"commit_id": "7921be1537eac1e97bc40179a57f0349c2aee67d"
261+
}
262+
}
263+
264+
Local directory:
265+
266+
.. code::
267+
268+
{
269+
"url": "file:///home/user/project",
270+
"dir_info": {}
271+
}
272+
273+
Local directory in editable mode:
274+
275+
.. code::
276+
277+
{
278+
"url": "file:///home/user/project",
279+
"dir_info": {
280+
"editable": true
281+
}
282+
}
283+
284+
History
285+
=======
286+
287+
- March 2020: this data structure was originally specified as part of the
288+
``direct_url.json`` metadata file in :pep:`610` and is formally documented here.
289+
- January 2023: Added the ``archive_info.hashes`` key
290+
([discussion](https://discuss.python.org/t/22299)).
