KZT, refactor: split loader object tracking and patch planning#3

Draft
LaurenIsACoder wants to merge 23 commits into master from kzt-refactor

@LaurenIsACoder LaurenIsACoder commented Jun 11, 2026

Owner

Summary

This draft PR carries the staged KZT loader refactor through stages 1 to 8.
The series documents the overall design, introduces guest object identity,
adds in-memory Dynamic Table parsing, records GOT patch decisions, prefers
guest-owner wrapper targets where possible, stops side-loading ordinary guest
dependencies, isolates lazy binding policy, makes guest dl APIs authoritative,
and narrows the private glibc hook into a fallback loader-event source.

The current local head has passed a debug build and the focused KZT loader
regression script. A full dEQP-EGL.* A/B run from the earlier convergence
point matched the pre-refactor baseline over 4111 EGL CTS cases; the latest
head should still go through a final full CTS A/B comparison before review is
considered complete.

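Chinese Detailed Description (translated)

Design Intent

The goal of this patch set is not to rewrite the whole loader in one pass, but
to split the previously coupled KZT loader path into boundaries that can be
verified and replaced.

The old scheme mixed together guest loader notification, ELF file parsing,
maplib re-resolution, wrapper lookup, GOT modification, lazy binding,
dlopen/dlsym/dlclose state, and the private glibc hook synchronization flow.
The core problem: the guest loader already knows the objects and the binding
results, yet KZT often re-derived them through files, the global symbol scope,
or maplib. That left object identity, target provenance, and reference
lifetime unclear.

This refactor addresses these problems in stages:

  • GuestObjectRegistry records the object identities observed from the guest
    loader.
  • The in-memory Dynamic Parser parses runtime ELF metadata from
    link_map/l_ld.
  • The Patch Planner turns every GOT write into a recordable decision.
  • guest-owner lookup prefers the guest object that owns the current GOT value
    when selecting a wrapper.
  • Ordinary guest dependencies are no longer side-loaded by KZT.
  • Lazy binding decisions are isolated from generic relocation.
  • Results returned by guest dlopen/dlsym/dlclose and related APIs become
    authoritative.
  • The private glibc hook is narrowed into a fallback loader-event source.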

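Current Patch Organization

The current series is organized at review granularity into 19 commits:

  • Patch 1: design document describing the overall refactor plan.
  • Patch 2: Stage 1, introduces GuestObjectRegistry.
  • Patch 3: Stage 2, introduces the in-memory Dynamic Parser.
  • Patch 4: Stage 3, introduces the Patch Planner.
  • Patch 5: Stage 4, compares guest-owner patch targets.
  • Patch 6: Stage 4, prefers successful guest-owner targets.
  • Patch 7: Stage 5, stops ordinary guest dependency side loading.
  • Patch 8: Stage 6, isolates lazy binding decisions.
  • Patches 9-13: Stage 7, refactor the guest dl API boundary.
  • Patches 14-18: Stage 8, establish the loader event source boundary and
    harden the fallback.
  • Patch 19: dl API robustness fixes after the Stage 7/8 convergence.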

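Conformance with the Design

Overall, the series matches expectations.

This patch set moves KZT from "redoing guest loader decisions itself" toward
"observing guest loader results and performing wrapper replacement behind
explicit boundaries". Object identity, Dynamic Table parsing, GOT patch
decisions, guest-owner wrapper selection, dependency loading, lazy binding,
dl API semantics, and the loader event source have all been split out behind
explicit boundaries.

Stage 8 is still a transitional form. The private glibc hook has not been
fully removed, but it no longer carries the whole load synchronization flow
and only reports link_map events as a fallback event source. Behind this
boundary it can later be replaced with r_debug, mprotect, QEMU loader/mmap
events, or a smaller version-isolated fallback hook.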

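Existing Defects Resolved

The main structural defects that have been resolved:

  • Guest object identity used to be unclear; it is now recorded explicitly by
    GuestObjectRegistry.
  • Parsing used to depend on files and Section Headers; there is now an
    in-memory parser based on l_ld.
  • GOT modification used to be unobservable; the Patch Planner now records
    object, relocation, symbol, version, old target, old owner, new bridge, and
    reason.
  • Wrapper selection used to rely heavily on maplib/global symbol
    re-resolution; it now prefers the guest object that owns the current GOT
    value.
  • Ordinary guest DT_NEEDED entries used to be side-loaded by KZT; they are
    now handed back to the guest loader.
  • Lazy binding modifications used to be scattered through generic
    relocation; they are now concentrated in a dedicated helper.
  • dlopen/dlsym/dlclose used to have synthetic handles, duplicated reference
    counting, and side reloads; guest loader results are now authoritative.
  • The glibc hook used to drive the synchronization flow directly; it now
    only emits loader events.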

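Recently Converged Fixes

The latest local commit adds robustness fixes after the Stage 7/8 convergence:

  • RTLD_DEFAULT asks guest dlsym first and takes the compatibility fallback
    only after a guest miss.
  • dlsym/dlvsym on a wrapper handle first let the guest link_map decide
    whether the symbol exists; only on success may a local native bridge
    replace the returned value.
  • dlvsym stays on the versioned guest lookup path and no longer falls back
    to a plain non-versioned dlsym when metadata is missing.
  • Paths such as missing local metadata, the dladdr1(RTLD_DL_LINKMAP)
    fallback, and handles used after close return ordinary dl errors instead
    of asserting or writing invalid results.
  • An explicit dlopen of a wrapper handle holds a guest loader reference so
    that a later dlclose pairs with the guest-side reference count.
  • dlprivate handle-table growth initializes all parallel arrays, so unused
    slots do not retain stale state.
  • The Stage 8 fallback hook installation and event read paths are more
    defensive: an installation failure disables KZT, and abnormal
    env/hook/reg state no longer reads a bad register.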

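New Risks

This change also introduces or exposes risks that need continued attention:

  • Stage 7 changes the authoritative source for the dl APIs. Edge-case
    semantics such as RTLD_NEXT, RTLD_DEFAULT, RTLD_NOLOAD, dlmopen
    namespaces, versioned symbols, and dladdr1 still need more coverage from
    real applications.
  • Local wrapper metadata still has to stay consistent with the lifetime of
    guest-owned handles. This is clearer than the old scheme but remains a
    high-risk area.
  • Stage 8 still keeps the private glibc hook as a fallback. Bounds checks,
    pattern validation, and a missing-source fallback have been added, but
    different glibc versions may still need additional adaptation.
  • The full A/B comparison so far covered dEQP-EGL.*, but the latest HEAD
    still needs a final full CTS comparison. GLX, Vulkan, Wine, and real
    application paths still need follow-up coverage.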

Validation Completed

Completed on the latest local HEAD:

env CCACHE_DIR=/tmp/latx-ccache ninja -C build64-dbg
sh tests/latx-x86_64/run-kzt-loader-regressions.sh build64-dbg/latx-x86_64

Full EGL CTS A/B comparison from the earlier convergence process:

tests/latx-x86_64/run-kzt-cts-compare.sh \
  --baseline /tmp/lat-kzt-baseline-artifact/latx-x86_64.before-kzt-refactor-local \
  --current /home/loongson/work/code/lat-opensource/lat/build64-dbg/latx-x86_64 \
  --cts-dir /home/loongson/data_2T/x86_test/kzt/xzy/VK-GL-CTS/builds-x86-egl/external/openglcts/modules \
  --timeout 7200 -- --deqp-visibility=hidden --deqp-watchdog=enable -n 'dEQP-EGL.*'

Result:

baseline/current per-case results matched
Passed:        2128/4111 (51.8%)
Failed:        32/4111 (0.8%)
Not supported: 1949/4111 (47.4%)
Warnings:      2/4111 (0.0%)
Waived:        0/4111 (0.0%)

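Follow-up Work

Follow-up work falls into three categories:

  • Run the final full CTS A/B comparison on the latest HEAD.
  • Extend validation to GLX, Vulkan, Wine, and real application paths.
  • Behind the Stage 8 boundary, experiment with r_debug notification, RELRO
    mprotect, QEMU mmap/loader events, and similar approaches to gradually
    reduce the dependency on the private glibc hook.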

English Detailed Description

Design Intent

This series does not try to rewrite the whole loader in one step. It splits
the existing KZT loader path into explicit, testable, and replaceable
boundaries.

The old path mixed guest loader notification, ELF file parsing, maplib
re-resolution, wrapper lookup, GOT patching, lazy binding, dl API bookkeeping,
and the private glibc hook synchronization path. The central problem was that
the guest loader already knew the object and binding result, while KZT often
derived the target again through files, global symbol ranges, or maplib. That
made object identity, target provenance, and handle lifetime difficult to
reason about.

The staged refactor addresses this by:

  • recording guest object identity in GuestObjectRegistry;
  • parsing runtime ELF metadata from link_map/l_ld;
  • recording GOT writes as Patch Planner decisions;
  • preferring wrapper targets from the guest object that owns the current GOT
    value;
  • returning ordinary guest dependency loading to the guest loader;
  • isolating lazy binding policy from generic relocation;
  • making guest dlopen/dlsym/dlclose results authoritative;
  • narrowing the private glibc hook into a fallback loader-event source.

Current Patch Organization

The current series is organized into 19 reviewable patches:

  • Patch 1 documents the overall refactor plan.
  • Patch 2 covers stage 1 and introduces GuestObjectRegistry.
  • Patch 3 covers stage 2 and adds the in-memory Dynamic Parser.
  • Patch 4 covers stage 3 and introduces the Patch Planner.
  • Patch 5 covers stage 4 shadow comparison for guest-owner patch targets.
  • Patch 6 completes stage 4 by preferring successful guest-owner targets.
  • Patch 7 covers stage 5 and stops ordinary guest dependency side loading.
  • Patch 8 covers stage 6 and isolates lazy binding decisions.
  • Patches 9-13 cover stage 7 and refactor the guest dl API boundary.
  • Patches 14-18 cover stage 8 and establish the loader event source boundary.
  • Patch 19 contains the final stage 7/8 dl API hardening fixes.

Design Conformance

The current implementation matches the refactor direction.

The code now has explicit boundaries for guest object identity, runtime dynamic
metadata parsing, GOT patch decisions, guest-owner wrapper lookup, dependency
loading, lazy binding, guest dl API authority, and loader event sources. This
moves KZT away from redoing guest loader decisions and toward observing guest
loader results before applying wrapper replacement.

Stage 8 is still transitional. The private glibc hook has not been fully
removed, but it has been narrowed to a fallback event source. It reports
link_map events and no longer owns the whole synchronization flow. That
boundary is the point where future r_debug, mprotect, QEMU loader/mmap event,
or smaller version-isolated fallback sources can be plugged in.

Defects Addressed

The series addresses the main structural defects of the old scheme:

  • Guest object identity is now explicit in GuestObjectRegistry.
  • Runtime Dynamic Table parsing no longer depends only on files or Section
    Headers.
  • GOT writes are observable through Patch Planner decisions.
  • Wrapper lookup can prefer the guest object that owns the current GOT target
    instead of redoing global maplib resolution first.
  • Ordinary guest DT_NEEDED dependencies are no longer side-loaded by KZT.
  • Lazy binding policy is isolated from generic relocation code.
  • Guest dl API results are authoritative, removing synthetic handles,
    duplicated reference counting, and side reload logic.
  • The private glibc hook is reduced to an event source instead of driving the
    whole loader sync path.

Recent Hardening

The latest local commit adds final stage 7/8 hardening:

  • RTLD_DEFAULT asks guest dlsym first and uses compatibility fallback only
    after a guest miss.
  • Wrapper-handle dlsym/dlvsym ask the guest link_map to decide symbol
    existence before replacing a successful result with a local native bridge.
  • dlvsym stays on the versioned guest lookup path and no longer falls back to
    a plain non-versioned dlsym when local metadata is missing.
  • Missing local metadata, dladdr1(RTLD_DL_LINKMAP) fallback, and closed-handle
    paths report ordinary dl errors instead of asserting or writing invalid data.
  • Explicit wrapper-handle dlopen retains the guest loader reference so later
    dlclose calls are paired on the guest side.
  • dlprivate handle-table growth initializes all parallel arrays.
  • Stage 8 fallback hook installation and event capture are defensive: install
    failure disables KZT, and invalid env/hook/reg state no longer reads a bad
    guest register.
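
The handle-table item above can be illustrated with a minimal C sketch. It is
not the dlprivate code from this series: HandleTable, its three arrays, and
handle_table_grow() are hypothetical names chosen for the example. The point
it shows is that every parallel array is grown together and every new slot is
initialized before the capacity is published.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical handle table with three parallel arrays (illustrative only). */
typedef struct {
    void **handles;         /* guest loader handles */
    int *refcounts;         /* references held on the guest loader */
    unsigned char *active;  /* 1 while the slot is in use */
    size_t cap;             /* number of initialized slots in every array */
} HandleTable;

/* Grow all arrays to new_cap and initialize every new slot in each of them. */
bool handle_table_grow(HandleTable *t, size_t new_cap)
{
    assert(t != NULL);
    if (new_cap <= t->cap)
        return true;
    if (new_cap > SIZE_MAX / sizeof(void *))
        return false; /* size computation would overflow */

    void **h = realloc(t->handles, new_cap * sizeof(*h));
    if (h == NULL)
        return false;
    t->handles = h;

    int *r = realloc(t->refcounts, new_cap * sizeof(*r));
    if (r == NULL)
        return false; /* cap is unchanged, so old slots stay consistent */
    t->refcounts = r;

    unsigned char *a = realloc(t->active, new_cap * sizeof(*a));
    if (a == NULL)
        return false;
    t->active = a;

    /* realloc leaves the tail uninitialized: clear it in every array. */
    for (size_t i = t->cap; i < new_cap; i++) {
        t->handles[i] = NULL;
        t->refcounts[i] = 0;
        t->active[i] = 0;
    }
    t->cap = new_cap;
    return true;
}
```

Publishing cap only after the initialization loop means a reader that bounds
itself by cap never sees a slot that was left uninitialized by realloc.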

New Risks

The refactor also introduces or exposes risks that still need attention:

  • Stage 7 changes the authoritative source for dl API behavior. Edge cases such
    as RTLD_NEXT, RTLD_DEFAULT, RTLD_NOLOAD, dlmopen namespaces,
    versioned symbols, and dladdr1 still need more real-application coverage.
  • Local wrapper metadata must stay consistent with guest-owned handle lifetime.
    This is clearer than the old model but remains a sensitive area.
  • Stage 8 still uses a private glibc hook as a fallback source. Bounds checks,
    pattern validation, and missing-source fallback reduce the risk, but new glibc
    layouts may still require additional adaptation.
  • A full dEQP-EGL.* A/B run has matched during convergence, but the latest
    head still needs a final full CTS comparison. GLX, Vulkan, Wine, and real
    application paths also need follow-up coverage.

Validation Completed

Latest local head:

env CCACHE_DIR=/tmp/latx-ccache ninja -C build64-dbg
sh tests/latx-x86_64/run-kzt-loader-regressions.sh build64-dbg/latx-x86_64

Earlier full EGL CTS A/B comparison during convergence:

tests/latx-x86_64/run-kzt-cts-compare.sh \
  --baseline /tmp/lat-kzt-baseline-artifact/latx-x86_64.before-kzt-refactor-local \
  --current /home/loongson/work/code/lat-opensource/lat/build64-dbg/latx-x86_64 \
  --cts-dir /home/loongson/data_2T/x86_test/kzt/xzy/VK-GL-CTS/builds-x86-egl/external/openglcts/modules \
  --timeout 7200 -- --deqp-visibility=hidden --deqp-watchdog=enable -n 'dEQP-EGL.*'

Result:

baseline/current per-case results matched
Passed:        2128/4111 (51.8%)
Failed:        32/4111 (0.8%)
Not supported: 1949/4111 (47.4%)
Warnings:      2/4111 (0.0%)
Waived:        0/4111 (0.0%)

Follow-up Work

Follow-up work is focused on three areas:

  • Run the final full CTS A/B comparison for the latest head.
  • Extend validation to GLX, Vulkan, Wine, and real application paths.
  • Replace the stage 8 event source with r_debug, RELRO mprotect, QEMU
    mmap/loader events, or a smaller version-isolated fallback hook.

Signed-off-by: Sun Haiyong <sunhaiyong@zdbr.net>
@LaurenIsACoder LaurenIsACoder force-pushed the kzt-refactor branch 2 times, most recently from 99f8a9a to 81ab7db Compare June 16, 2026 06:37
LaurenIsACoder and others added 8 commits June 16, 2026 15:53
Signed-off-by: Hanlu Li <heuleehanlu@gmail.com>
Signed-off-by: Hanlu Li <heuleehanlu@gmail.com>
Signed-off-by: Hanlu Li <heuleehanlu@gmail.com>
Signed-off-by: Hanlu Li <heuleehanlu@gmail.com>
Signed-off-by: Hanlu Li <heuleehanlu@gmail.com>
Signed-off-by: Hanlu Li <heuleehanlu@gmail.com>
Signed-off-by: zqz <2264460073@qq.com>
Signed-off-by: Hanlu Li <heuleehanlu@gmail.com>
@LaurenIsACoder LaurenIsACoder force-pushed the kzt-refactor branch 2 times, most recently from a24da1a to bb8e12e Compare June 17, 2026 01:21

Document the KZT loader refactor as a staged design rather than a
progress report.

The plan explains why loader synchronization needs clearer object
identity, memory-based metadata parsing, recorded patch decisions, and a
stable event boundary before the private glibc hook can be reduced.

The series starts with registry and parser boundaries, then moves GOT
patching, dependency loading, lazy binding, dlopen/dlsym/dlclose, and
loader notifications onto guest-owned state.

Stage 1 creates a registry for guest ELF object identity.

The loader callback records object name, load range, base address, and
dynamic table address before forwarding to the legacy path.  Later
stages can use this as the address-to-object boundary without changing
current glibc hook behavior.
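
As a rough illustration of the address-to-object boundary described in this
commit message, the sketch below stores the four recorded fields and answers
"which guest object owns this address". The struct layout, the fixed capacity,
and the function names are assumptions made for the example, not the
GuestObjectRegistry code in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define GUEST_REGISTRY_MAX 64 /* hypothetical fixed capacity for the sketch */

/* Hypothetical record: name, load range, base address, dynamic table. */
typedef struct {
    char name[128];
    uintptr_t base;    /* load bias reported by the guest loader */
    uintptr_t start;   /* first mapped address, inclusive */
    uintptr_t end;     /* end of the mapped range, exclusive */
    uintptr_t dynamic; /* address of the dynamic table (l_ld) */
} GuestObject;

typedef struct {
    GuestObject objs[GUEST_REGISTRY_MAX];
    size_t count;
} GuestObjectRegistry;

/* Record one object; returns NULL if the table is full or the range is empty. */
const GuestObject *guest_registry_add(GuestObjectRegistry *reg, const char *name,
                                      uintptr_t base, uintptr_t start,
                                      uintptr_t end, uintptr_t dynamic)
{
    assert(reg != NULL);
    if (reg->count >= GUEST_REGISTRY_MAX || start >= end)
        return NULL;
    GuestObject *o = &reg->objs[reg->count++];
    snprintf(o->name, sizeof(o->name), "%s", name ? name : "");
    o->base = base;
    o->start = start;
    o->end = end;
    o->dynamic = dynamic;
    return o;
}

/* Address-to-object lookup: the object whose load range contains addr. */
const GuestObject *guest_registry_find_by_addr(const GuestObjectRegistry *reg,
                                               uintptr_t addr)
{
    for (size_t i = 0; i < reg->count; i++) {
        if (addr >= reg->objs[i].start && addr < reg->objs[i].end)
            return &reg->objs[i];
    }
    return NULL;
}
```

A linear scan is enough for a sketch; a real registry would likely also need
removal on unload and some ordering for faster lookup.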

Stage 2 adds a Dynamic Table parser that reads metadata from guest
memory.

The parser materializes strings, symbols, relocations, and version data
from runtime dynamic information, with range validation before derived
tables are walked.

The legacy file parser remains active for comparison, so differences are
visible before the section-header dependency is removed.
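
The "range validation before derived tables are walked" step can be sketched
as follows. This is a simplified illustration, not the parser in the series:
it handles only DT_STRTAB, DT_STRSZ, and DT_SYMTAB, it assumes the d_ptr
values are already absolute runtime addresses, and DynInfo, DYN_MAX_ENTRIES,
and parse_dynamic() are hypothetical names. The Elf64_Dyn type and DT_*
constants come from the standard <elf.h> header.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DYN_MAX_ENTRIES 512 /* hypothetical cap on dynamic entries walked */

/* Hypothetical result: only the tables this sketch validates. */
typedef struct {
    uintptr_t strtab;
    uint64_t strsz;
    uintptr_t symtab;
} DynInfo;

/* True when [addr, addr + size) lies inside [start, end). */
static bool in_range(uintptr_t addr, uint64_t size, uintptr_t start, uintptr_t end)
{
    return addr >= start && addr <= end && size <= end - addr;
}

/* Walk the in-memory dynamic array of one object mapped at [start, end). */
bool parse_dynamic(const Elf64_Dyn *dyn, uintptr_t start, uintptr_t end, DynInfo *out)
{
    DynInfo info = {0};
    bool terminated = false;

    assert(dyn != NULL && out != NULL);
    for (size_t i = 0; i < DYN_MAX_ENTRIES; i++) {
        /* Never read an entry that falls outside the object's range. */
        if (!in_range((uintptr_t)&dyn[i], sizeof(dyn[i]), start, end))
            return false;
        if (dyn[i].d_tag == DT_NULL) {
            terminated = true;
            break;
        }
        switch (dyn[i].d_tag) {
        case DT_STRTAB: info.strtab = (uintptr_t)dyn[i].d_un.d_ptr; break;
        case DT_STRSZ:  info.strsz = dyn[i].d_un.d_val; break;
        case DT_SYMTAB: info.symtab = (uintptr_t)dyn[i].d_un.d_ptr; break;
        default: break;
        }
    }
    if (!terminated)
        return false;

    /* Validate derived tables before any caller walks them. */
    if (info.strtab == 0 || info.strsz == 0 ||
        !in_range(info.strtab, info.strsz, start, end))
        return false;
    if (info.symtab == 0 || !in_range(info.symtab, sizeof(Elf64_Sym), start, end))
        return false;

    *out = info;
    return true;
}
```

The output is written only after every check passes, so a caller never sees a
partially validated DynInfo.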

Stage 3 turns GOT updates into explicit patch decisions.

Each decision records the guest object, relocation, symbol/version, old
target, old owner, selected bridge, target source, and reason before a
slot is patched.

The selected target is unchanged in this patch; the value is the
observable boundary needed for later resolver changes.
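
A decision record with the fields listed in this commit message might look
like the sketch below. The type names, the fixed-size log, and patch_apply()
are hypothetical; the only property the example is meant to show is that a
GOT slot is written after its decision has been recorded, never before.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PATCH_LOG_MAX 32 /* hypothetical log capacity */

typedef enum {
    TARGET_SOURCE_MAPLIB,     /* bridge chosen by global maplib lookup */
    TARGET_SOURCE_GUEST_OWNER /* bridge chosen from the owning guest object */
} TargetSource;

/* Hypothetical decision record mirroring the fields named in the text. */
typedef struct {
    const char *object;    /* guest object that owns the GOT slot */
    uint32_t reloc_type;   /* e.g. 7 for R_X86_64_JUMP_SLOT */
    const char *symbol;
    const char *version;   /* NULL when the symbol is unversioned */
    uintptr_t old_target;  /* slot value before patching */
    const char *old_owner; /* object that owns old_target, if known */
    uintptr_t bridge;      /* selected native bridge address */
    TargetSource source;
    const char *reason;
} PatchDecision;

typedef struct {
    PatchDecision entries[PATCH_LOG_MAX];
    size_t count;
} PatchLog;

/* Record the decision first, then write the slot. Refuses unrecorded writes. */
bool patch_apply(PatchLog *log, uintptr_t *got_slot, PatchDecision d)
{
    assert(log != NULL && got_slot != NULL);
    if (log->count >= PATCH_LOG_MAX)
        return false;
    d.old_target = *got_slot; /* capture what is being replaced */
    log->entries[log->count++] = d;
    *got_slot = d.bridge;
    return true;
}
```

Because the log entry keeps old_target and old_owner, a later stage can
compare or audit target selection without re-deriving the previous state.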

Stage 4 starts observing whether the current GOT value identifies the
guest object that owns the resolved target.

The shadow path classifies guest-owner lookup results and compares them
with the existing maplib-selected bridge, without changing the selected
target.

This makes guest-owner resolution reviewable before it can replace the
global maplib lookup.

Complete Stage 4 by allowing successful guest-owner lookup to select the
native wrapper bridge directly.

When the current GOT target belongs to a known guest object and resolves
to a wrapper, the planner records a guest-owner decision and skips the
maplib reparse for that slot.

Failed probes keep the existing maplib fallback.  Lazy binding is left
to the next stage.
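
The selection order described here (guest-owner probe first, maplib fallback
on a failed probe) can be sketched with two callbacks. Everything below is an
assumption for illustration: OwnerRange, Selection, select_bridge(), and the
two demo_* lookups are stand-ins, and the bridge addresses are made-up values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of a registered guest object. */
typedef struct {
    const char *name;
    uintptr_t start; /* inclusive */
    uintptr_t end;   /* exclusive */
} OwnerRange;

typedef struct {
    uintptr_t bridge;      /* 0 when no bridge was found at all */
    bool from_guest_owner; /* true when the guest-owner probe succeeded */
} Selection;

typedef uintptr_t (*OwnerLookup)(const char *owner, const char *symbol);
typedef uintptr_t (*MaplibLookup)(const char *symbol);

/* Prefer the wrapper of the object that owns the current GOT value. */
Selection select_bridge(const OwnerRange *ranges, size_t n, uintptr_t got_value,
                        const char *symbol, OwnerLookup owner_lookup,
                        MaplibLookup maplib_lookup)
{
    Selection sel = {0, false};

    assert(symbol != NULL && owner_lookup != NULL && maplib_lookup != NULL);
    for (size_t i = 0; i < n; i++) {
        if (got_value >= ranges[i].start && got_value < ranges[i].end) {
            uintptr_t b = owner_lookup(ranges[i].name, symbol);
            if (b != 0) {
                sel.bridge = b;
                sel.from_guest_owner = true;
                return sel; /* maplib reparse is skipped for this slot */
            }
            break; /* owner known but no wrapper: fall back */
        }
    }
    sel.bridge = maplib_lookup(symbol); /* failed probe keeps the old path */
    return sel;
}

/* Demo stand-ins for the real lookups (hypothetical addresses). */
uintptr_t demo_owner_lookup(const char *owner, const char *symbol)
{
    if (strcmp(owner, "libEGL.so.1") == 0 && strcmp(symbol, "eglGetDisplay") == 0)
        return 0x1111;
    return 0;
}

uintptr_t demo_maplib_lookup(const char *symbol)
{
    (void)symbol;
    return 0x2222;
}
```

Returning the source of the decision alongside the bridge is what lets the
planner record whether a slot took the guest-owner path or the fallback.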

Stage 5 lets the guest loader own ordinary guest dependencies.

KZT now registers wrappers only for objects already reported by the
guest loader.  It stops expanding guest RPATH/RUNPATH, recursively
loading guest DT_NEEDED entries, and side-loading guest libraries
through LoadNeededLibs().

AddNeededLib() remains for native wrapper registration and wrapper host
dependencies.

Stage 6 moves lazy JUMP_SLOT policy out of the generic relocation path.

Lazy slot classification, resolver metadata, deferred patch decisions,
and first-call resolver actions now live behind guestlazy helpers.

Behavior is intended to stay unchanged.  The new boundary prepares the
later model where guest ld.so binds first and KZT replaces the slot
after binding.

Prepare Stage 7 by splitting common guest dl helper calls out of the
wrapper entry points.

The patch names the handle lookup, RunFunctionWithState() forwarding,
and guest dlclose forwarding paths without changing behavior.

The following patches use this boundary to make guest loader results
authoritative.

Stage 7 starts making dlopen handles follow guest loader ownership.

Wrapper state is keyed by handles returned by the guest loader,
including reused handles, reopen-after-close, self handles, and
closed-handle deactivation.

The compatibility path no longer reloads guest objects behind the guest
loader, and fixed-buffer inputs are bounded before copying.

Stage 7 moves dlsym and dlvsym authority to guest-owned lookup results.

RTLD_DEFAULT, RTLD_NEXT, wrapped link_map lookups, versioned requests,
and error reporting are routed through named helpers before local
wrapper metadata is used as a compatibility fallback.

Local symbols only replace successful guest results with native bridge
addresses.
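
The rule in the last sentence (a local symbol may only replace a successful
guest result) is small enough to sketch directly. The callback types,
kzt_dlsym_sketch(), and the demo_* helpers are hypothetical and stand in for
the real guest dlsym forwarding and local wrapper metadata; the symbol names
are only examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback types standing in for the real lookup paths. */
typedef void *(*GuestDlsym)(void *handle, const char *name);
typedef void *(*BridgeLookup)(const char *name);

/* Guest result decides existence; a bridge only replaces a guest hit. */
void *kzt_dlsym_sketch(void *handle, const char *name,
                       GuestDlsym guest, BridgeLookup bridge)
{
    assert(name != NULL && guest != NULL && bridge != NULL);
    void *guest_result = guest(handle, name);
    if (guest_result == NULL)
        return NULL; /* guest miss: never invent a local-only symbol */
    void *native = bridge(name);
    return native != NULL ? native : guest_result;
}

/* Demo data: distinct objects whose addresses act as symbol addresses. */
char demo_guest_get_display, demo_guest_initialize;
char demo_bridge_get_display, demo_bridge_local_only;

void *demo_guest_dlsym(void *handle, const char *name)
{
    (void)handle;
    if (strcmp(name, "eglGetDisplay") == 0)
        return &demo_guest_get_display;
    if (strcmp(name, "eglInitialize") == 0)
        return &demo_guest_initialize;
    return NULL;
}

void *demo_bridge_lookup(const char *name)
{
    if (strcmp(name, "eglGetDisplay") == 0)
        return &demo_bridge_get_display;
    if (strcmp(name, "kzt_only") == 0)
        return &demo_bridge_local_only; /* not exported by the guest */
    return NULL;
}
```

With the demo helpers, "eglGetDisplay" resolves to the bridge, "eglInitialize"
keeps the guest address, and "kzt_only" fails because the guest lookup misses.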

Complete Stage 7 by making close and dl information queries follow guest
loader state.

dlclose releases guest loader references, while dladdr, dladdr1, dlinfo,
and dlvsym prefer guest link_map and lookup results when available.

Local metadata remains a compatibility fallback and failure paths avoid
stale handle-table state or duplicate close accounting.

Start Stage 8 by routing loader notifications through a registry event
dispatcher.

The glibc callback now builds an immutable link_map event snapshot and
submits it to shared object registration and legacy compatibility logic.

This separates event capture from object synchronization, so future
event sources can reuse the same registry path.

Stage 8 makes r_debug the normal loader notification path and keeps the
private glibc hook as a version-gated fallback source.

Both sources feed the same registry event dispatcher.  The r_debug path
validates loader state and link_map input before registration, while
fallback failures stay local to the fallback source.
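
For readers unfamiliar with r_debug, the sketch below shows one way to
validate loader state and link_map input before handing objects to a
registry. It uses the struct r_debug and struct link_map definitions from
<link.h>; the function name, the hop limit, and the choice to skip entries
without a dynamic section are assumptions for the example, not the checks
implemented in this series.

```c
#include <assert.h>
#include <link.h>
#include <stddef.h>

#define LINK_MAP_MAX_HOPS 4096 /* hypothetical guard against a corrupt list */

/*
 * Collect link_map entries that are safe to register. Returns 0 when the
 * loader is mid-update (r_state != RT_CONSISTENT) or the input is unusable.
 */
size_t collect_link_maps(const struct r_debug *dbg,
                         const struct link_map **out, size_t max)
{
    size_t n = 0;
    size_t hops = 0;

    if (dbg == NULL || out == NULL || dbg->r_version < 1)
        return 0;
    if (dbg->r_state != RT_CONSISTENT)
        return 0; /* list is being modified: wait for the next event */

    for (const struct link_map *m = dbg->r_map;
         m != NULL && n < max && hops < LINK_MAP_MAX_HOPS;
         m = m->l_next, hops++) {
        if (m->l_ld == NULL)
            continue; /* no dynamic table: nothing to parse or patch */
        out[n++] = m;
    }
    return n;
}
```

Each collected entry carries l_addr, l_name, and l_ld, which are the same
inputs the registry and the in-memory parser need from any event source.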

3 participants