Docs: add obstore tutorial#527

Open
aboydnw wants to merge 9 commits into microsoft:develop from aboydnw:docs-add-obstore-tutorial

Conversation

@aboydnw
Contributor

@aboydnw aboydnw commented May 22, 2026

aboydnw and others added 5 commits May 20, 2026 19:41
Adds a new tutorial walking through reading Planetary Computer data
with obstore (auto-refreshing SAS tokens, range reads, async, library
composability). Companion notebook lives in PlanetaryComputerExamples
at quickstarts/obstore.ipynb and is wired in via external_docs_config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the Colab badge (off-brand for PC; Hub is the canonical
JupyterLab environment) and replaces the TODO placeholders with
real URLs: nbgitpuller deep link to PC Hub and a github.com blob
link to the companion notebook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inlines the Hub and GitHub URLs on the badge line and drops the
reference-style defs at the bottom. Also picks up the inline copy
edits across the body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hub link is the canonical way to open the notebook; the GitHub
view duplicates what the docs site already renders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops Lonboard reference (no obstore integration in Lonboard) and
notes that zarr-python access goes through the zarr.storage.ObjectStore
adapter rather than direct hand-off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aboydnw aboydnw marked this pull request as ready for review May 26, 2026 15:19
Comment thread docs/overview/obstore.md Outdated
@@ -0,0 +1,164 @@
# Reading Planetary Computer data with obstore

[obstore](https://developmentseed.org/obstore/) is a Python library for reading and writing cloud object stores (Azure Blob, Amazon S3, Google Cloud Storage) directly through their native APIs. Using obstore, SAS tokens refresh automatically, async I/O is built in, and the same store you build for reading bytes can be handed to higher-level libraries like [async-geotiff](https://github.com/developmentseed/async-geotiff), [Lonboard](https://developmentseed.org/lonboard/), and [zarr-python](https://zarr.dev/) without re-authenticating.

> directly through their native APIs

I think this is a bit misleading. Most users would understand "native" to mean "the raw underlying API specific to each cloud storage provider". That's not what obstore does; if a user wants to use the Azure API directly, they'll use azure.storage directly.

Obstore presents one, unified, abstracted API that is the same across Azure, S3, and GCS. That's the selling point.

Comment thread docs/overview/obstore.md Outdated
Comment thread docs/overview/obstore.md Outdated
Comment thread docs/overview/obstore.md
Comment on lines +23 to +38
```python
import pystac_client
from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1"
)
item = next(catalog.search(collections=["naip"], max_items=1).items())
asset = item.assets["image"]
```

2. Build a credential provider from the asset.

```python
provider = PlanetaryComputerCredentialProvider.from_asset(asset)
```

This makes me realize that from_asset is a bit annoying if you want to work with a collection instead of an item.

I see that the NAIP Collection JSON defines

"msft:storage_account": "naipeuwest"

so we could potentially have a from_collection constructor too.

Or maybe from_asset should really be renamed to from_stac, and support both Item and Collection? Thoughts?

Comment thread docs/overview/obstore.md
provider = PlanetaryComputerCredentialProvider.from_asset(asset)
```

3. Build a store using that provider. The store is your reusable connection to that asset.

It's important to note that the store doesn't just connect to one asset; it provides the auth to access anything in that bucket (or "container" in Azure terminology). The one caveat, as mentioned below, is that the prefix on the store is currently mounted to this specific file.
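
One way to picture the distinction: an Azure Blob URL splits into a storage account, a container, and a path inside the container. As the comment above notes, the auth covers the container, while the store's prefix only decides where paths resolve. The sketch below uses only the standard library. The URL is a made-up example (only the `naipeuwest` account name appears in this thread), and `split_blob_url` is a hypothetical helper, not part of obstore.

```python
from urllib.parse import urlparse


def split_blob_url(url):
    """Split an Azure Blob URL into (account, container, blob_path)."""
    parsed = urlparse(url)
    # Host looks like "<account>.blob.core.windows.net".
    account = parsed.netloc.split(".")[0]
    # Path looks like "/<container>/<blob path>".
    container, _, blob_path = parsed.path.lstrip("/").partition("/")
    return account, container, blob_path


# Hypothetical asset href, for illustration only.
href = "https://naipeuwest.blob.core.windows.net/naip/v002/example/tile.tif"
account, container, blob_path = split_blob_url(href)
print(account)    # naipeuwest
print(container)  # naip
print(blob_path)  # v002/example/tile.tif
```

A store whose prefix is the full `blob_path` can only address that one file with an empty path, which is presumably why the tutorial snippets pass `""` as the path.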

Comment thread docs/overview/obstore.md
2. **Read multiple byte ranges in a single request.** Cuts round-trip latency when you need several non-contiguous slices of the same file (e.g. multiple COG tiles).

```python
ranges = obstore.get_ranges(

Ditto: use store.get_ranges.

Comment thread docs/overview/obstore.md Outdated
Comment thread docs/overview/obstore.md
Comment on lines +87 to +90
async def fetch(start, end):
    return await obstore.get_range_async(async_store, "", start=start, end=end)

results = await asyncio.gather(*[fetch(i * 4096, (i + 1) * 4096) for i in range(8)])

This is a bad example, because it makes several independent requests for different parts of the same file.

For this use case we should be pointing users towards store.get_ranges_async, because under the hood that will combine adjacent ranges into a single network request.

Here, the code makes independent requests for 0-4096, 4096-8192, and so on. get_ranges_async would instead make a single request under the hood for 0-32768, rather than 8 concurrent requests, and that would be a lot faster.
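
To illustrate why a batched call can beat eight independent requests, here is a minimal sketch of range coalescing in plain Python. It is not obstore's implementation; `coalesce_ranges` and its `max_gap` parameter are hypothetical names used only for illustration.

```python
def coalesce_ranges(ranges, max_gap=0):
    """Merge (start, end) byte ranges whose gap is at most max_gap bytes.

    Ranges are half-open: [start, end).
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Adjacent or overlapping: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# The eight 4096-byte ranges from the snippet under review.
requested = [(i * 4096, (i + 1) * 4096) for i in range(8)]
print(coalesce_ranges(requested))  # [(0, 32768)]
```

After one merged fetch, each requested slice can be cut out of the returned buffer locally, so the caller still receives eight results.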

Comment thread docs/overview/obstore.md Outdated
Comment thread docs/overview/obstore.md
from obstore.store import S3Store

s3_store = S3Store(bucket="my-bucket", region="us-west-2")
buf = obstore.get(s3_store, "path/to/object").bytes()

Actually, this doesn't work: obstore.get won't work against the obspec protocol, because the protocol is defined in terms of the methods on the class. That's part of why I want to nudge people to use store.get instead of obstore.get.
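
The distinction can be sketched with a plain `typing.Protocol`. The names below (`SupportsGet`, `DictStore`, `read_all`) are hypothetical stand-ins, not obspec or obstore definitions. The point is only that a protocol defined by methods is satisfied by any object that has those methods, so code that calls `store.get(...)` stays generic.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsGet(Protocol):
    """Hypothetical method-based protocol: anything with a get(path) method."""

    def get(self, path: str) -> bytes: ...


class DictStore:
    """Toy in-memory store; it never inherits from SupportsGet."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self._objects = objects

    def get(self, path: str) -> bytes:
        return self._objects[path]


def read_all(store: SupportsGet, path: str) -> bytes:
    # Calls the method on the store, so any conforming object works.
    return store.get(path)


store = DictStore({"path/to/object": b"hello"})
print(isinstance(store, SupportsGet))  # True
print(read_all(store, "path/to/object"))  # b'hello'
```

A module-level function that expects one library's concrete store classes has no such guarantee for third-party objects, which is the failure mode described above.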

aboydnw and others added 4 commits May 26, 2026 15:01
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>

3 participants