Skip to content

Replace pre-pull workaround with BuildKit session + filesync #50

@bilby91

Description

@bilby91

Context

#47 switched the docker runtime to BuildKit (Version: build.BuilderBuildKit) to escape the classic-builder authz-plugin pathology, but stopped short of the canonical BuildKit integration. The short-of-canonical version works (DAP is unblocked, builds are fast) but carries three real limitations that a session + filesync integration would close together.

The deferred work in #47 was scoped intentionally to avoid pulling github.com/moby/buildkit in as a direct dep (~100+ transitive packages: containerd, grpc, opentelemetry-extras). This issue tracks the followup.

Limitations to address

1. Per-vertex progress events are dropped

Under BuildKit, dockerd emits moby.buildkit.trace records of the form {"id":"moby.buildkit.trace","aux":"<base64-protobuf>"} carrying buildkit's SolveStatus. runtime/docker/build.go:streamBuildOutput ignores them (the previous stream / status JSON shape is classic-builder-only), so:

  • No BuildLayerEvent fires during the build
  • No BuildLogEvent fires for RUN output
  • Consumers see BuildStart → silence → BuildCompleted

The aux payloads are protobuf-encoded; decoding requires either pulling in github.com/moby/buildkit (which has the schema and a progressui renderer) or hand-rolling the proto definitions.

2. extractBaseImages can't resolve some Dockerfiles

runtime/docker/build.go:extractBaseImages does a naive line-based parse to find FROM image references for pre-pulling. It handles literal FROMs, ARG defaults, and BuildSpec.Args overrides — but silently drops anything whose ref still contains $ after substitution, including:

  • Runtime-supplied ARGs without defaults (the daemon resolves at build time, we don't see the value)
  • Dynamic refs computed by buildx/buildkit's parser features
  • Refs constructed from multiple ARGs we don't fully evaluate

When the parser drops a ref, the build falls through to BuildKit's "no active sessions" error rather than getting pre-pulled — same failure mode as before #47 for these cases. Rare in our generated Dockerfiles (useruid.go, feature/dockerfile.go produce simple, parseable FROMs) but a sharp edge for user-supplied dockerfile-source contexts.

3. Custom tarDirectory is still load-bearing

runtime/docker/build.go:tarDirectory is our hand-rolled context packer. #47 fixed the symlink-Linkname bug, but every other tar-encoding corner case is on us:

  • .dockerignore is not honored at all
  • UID/GID mapping for cross-user builds
  • Hardlinks (we follow them as regular files)
  • Sparse files
  • Very large contexts (the io.Pipe approach is bounded, but no streaming-progress signal)

Canonical fix

Open a buildkit session with:

  • session.Session (github.com/moby/buildkit/session)
  • auth.Provider registered on the session — even a stub returning empty creds satisfies BuildKit's "session required for remote resolution" check, eliminating the need for pre-pulling
  • filesync.FSSyncProvider rooted at spec.ContextPath, with RemoteContext: \"client-session\" and SessionID: sess.ID() in ImageBuildOptions — replaces our tarDirectory entirely, handles .dockerignore
  • aux-record decoder in streamBuildOutput — emits BuildLayerEvent per vertex transition and BuildLogEvent per vertex log line

Reference implementation: docker/cli's cli/command/image/build.go does exactly this. The auth provider there is cli/command/registry.NewStaticCredentialsStore wrapped in cli/command/image/build.NewSessionAuthProvider; for our usage we can use a stub with no creds since we already pass AuthConfigs separately when needed.

Acceptance

  • Per-step BuildLayerEvent and BuildLogEvent fire during a BuildKit build
  • extractBaseImages is deleted (no longer needed — BuildKit resolves with the session)
  • prePullBaseImages is deleted
  • tarDirectory is deleted (replaced by buildkit's filesync provider)
  • .dockerignore is honored on user-supplied build contexts
  • runtime/docker/build_test.go parser tests can be deleted; new tests cover session lifecycle (open, cancel-on-ctx, cleanup)

Cost

Adds github.com/moby/buildkit as a direct dep. Transitive deps include containerd, grpc, opentelemetry-contrib extras. Concrete size impact should be measured (go mod why -m all | wc -l before/after) before committing to the change.

Repo / file pointers

  • runtime/docker/build.go:35-90BuildImage (the change site)
  • runtime/docker/build.go:107-166streamBuildOutput (rewrite for aux records)
  • runtime/docker/build.go:168-232tarDirectory (delete)
  • runtime/docker/build.go:234-410prePullBaseImages, extractBaseImages, helpers (delete)
  • runtime/docker/build_test.go — parser tests (delete; add session-lifecycle tests)

Severity

Low — DAP is unblocked, builds are fast, observability is degraded-but-not-broken. Worth doing before we have a consumer that depends on per-vertex progress events or hits a Dockerfile our parser can't resolve.

🤖 Generated with Claude Code

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions