
feat: add lazy load eager eviction #1574

Draft
shikaku2 wants to merge 1 commit into leejet:master from shikaku2:feat/lazy-load-eager-evict

shikaku2 commented May 28, 2026

Split out from draft PR #1573.

Summary

Adds a -ll / --lazy-load flag that enables lazy-load/eager-evict model execution. The goal is to make larger image models practical on systems with limited VRAM by loading model tensors through mmap and dropping the pages of already-used components after each pipeline stage.

This PR intentionally contains only the runtime lazy-load/eager-evict path. The conversion/RMSE/AIO work from #1573 is not included here.

How it works

  • Enables mmap-backed model loading when --lazy-load is set.
  • Forces free_params_immediately, so component backend buffers are released after use.
  • Adds a platform-layer evict_memory() helper in util.cpp.
  • Calls eager eviction after the text encoder, diffusion, and VAE stages.
  • Gates MADV_DONTNEED behind SD_MMAP_FLAGS=dontneed; without that flag, the helper is a no-op.
  • Applies eviction to normal mmap-to-VRAM code paths as well, not just the new flag path.
  • Auto-enables VAE tiling with --lazy-load to avoid very large single VAE decode allocations. On my Vulkan setup, SD3.5 VAE decode at 1024x1024 would otherwise hit a large VkBuffer allocation that exceeds common Vulkan maxMemoryAllocationSize limits.
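
For reviewers, here is a minimal sketch of the opt-in eviction gate described in the list above. It is not the code in this PR: the names sketch_dontneed_enabled and sketch_evict_memory, the boolean return value, and the substring match on SD_MMAP_FLAGS are assumptions made for illustration. It covers POSIX systems only; on other platforms it compiles to a no-op.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

#if defined(__unix__) || defined(__APPLE__)
#include <sys/mman.h>
#include <unistd.h>
#define SKETCH_HAVE_MADVISE 1
#endif

// Hypothetical helper: true when SD_MMAP_FLAGS mentions "dontneed".
// The substring match is an assumption; the PR only says that
// SD_MMAP_FLAGS=dontneed turns the behavior on.
static bool sketch_dontneed_enabled() {
    const char* flags = std::getenv("SD_MMAP_FLAGS");
    return flags != nullptr && std::strstr(flags, "dontneed") != nullptr;
}

// Hypothetical stand-in for evict_memory(); the real signature may differ.
// Returns true only if the kernel accepted the advice, false for a no-op.
static bool sketch_evict_memory(void* addr, std::size_t len) {
    if (addr == nullptr || len == 0 || !sketch_dontneed_enabled()) {
        return false;  // opt-in not set (or nothing to do): leave pages alone
    }
#ifdef SKETCH_HAVE_MADVISE
    const std::uintptr_t page =
        static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
    std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(addr);
    std::uintptr_t end   = begin + len;
    // madvise needs a page-aligned start. Round inward so that pages only
    // partially covered by [addr, addr + len) are never dropped.
    begin = (begin + page - 1) / page * page;
    end   = end / page * page;
    if (begin >= end) {
        return false;  // range does not contain a whole page
    }
    return madvise(reinterpret_cast<void*>(begin), end - begin,
                   MADV_DONTNEED) == 0;
#else
    return false;  // no madvise on this platform in this sketch
#endif
}
```

A caller would pass the base address and length of a component's mapped tensor range once that stage has finished. Note that on Linux, MADV_DONTNEED applied to private anonymous memory discards its contents, so a helper like this is only safe on read-only, file-backed mappings whose pages can be re-read from disk.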

Testing

  • Built successfully with cmake --build build -j16.
  • git diff --check passes.
  • Runtime testing so far is on Vulkan only, on my machine (Arch Linux). That is the main reason this is a draft: it needs testing, and possibly follow-up code, on other GPU backends and operating systems.

Notes

This addresses the lazy-loading-specific review feedback from #1573 by keeping platform-dependent eviction code in util.cpp, making MADV_DONTNEED opt-in through SD_MMAP_FLAGS=dontneed, and calling eviction for regular mmap-backed load paths too.
