Reduce DEME Jitify startup overhead #66

Open
yvrob wants to merge 1 commit into projectchrono:main from yvrob:optimize-jitify-startup-cache

yvrob commented Jun 10, 2026

Summary

This PR reduces DEME startup/Jitify overhead in two places:

  • Skip sphere-triangle and triangle-bin Jitify programs when a simulation has no triangle meshes.
  • Add an opt-in persistent Jitify header source cache for workflows that launch many fresh DEME processes.

Default behavior stays conservative: upstream Jitify header loading is used unless DEME_PERSISTENT_JITIFY_CACHE is set to 1 or to an explicit cache path.
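
To illustrate the first bullet, here is a minimal Python sketch of the selection logic: mesh-only programs are added to the compile list only when the simulation contains at least one triangle mesh. The program names and the `programs_to_compile` helper are hypothetical and chosen for illustration; the actual DEME identifiers and C++ implementation may differ.

```python
from typing import List

# Hypothetical program names; the real DEME program identifiers may differ.
ALWAYS_NEEDED = ["sphere_sphere_contact", "integration"]
MESH_ONLY = ["sphere_triangle_contact", "triangle_bin"]


def programs_to_compile(num_triangle_meshes: int) -> List[str]:
    """Return the Jitify programs worth compiling for this simulation."""
    programs = list(ALWAYS_NEEDED)
    # Mesh-only programs are skipped entirely when there are no meshes,
    # so a no-mesh run never pays their JIT compilation cost.
    if num_triangle_meshes > 0:
        programs.extend(MESH_ONLY)
    return programs
```

With this shape, `programs_to_compile(0)` returns only the always-needed programs, while any positive mesh count restores the full list.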

Note about the large jitify.hpp diff

The large file in this PR is intentional, but it deserves a warning. Most of that diff is the full NVIDIA Jitify header copied into DEM-Engine as src/jitify/jitify.hpp; the actual change inside it is much smaller and adds an optional persistent source-header cache.

I did this because DEM-Engine currently gets Jitify through a submodule. Patching only the submodule would require either a separate Jitify fork/PR and a submodule pointer update, or a dependency on a local external patch that is harder to review and reproduce. Keeping the header copy in this PR makes the performance fix self-contained and lets DEM-Engine keep the old behavior by default.

That said, if maintainers would prefer a different route, such as carrying this in a Jitify fork, upstreaming it to NVIDIA/Jitify first, applying a smaller local patch during the build, or hiding it behind a different CMake layout, feedback is very welcome. The important part for our use case is to avoid paying the repeated Jitify/NVRTC header-discovery cost every time a fresh DEME process starts.

Persistent cache usage

Automatic per-user cache path:

export DEME_PERSISTENT_JITIFY_CACHE=1

On Linux/WSL this uses:

/tmp/deme_jitify_header_cache_$USER.bin

Explicit cache path:

export DEME_PERSISTENT_JITIFY_CACHE="$HOME/.cache/deme/jitify_header_cache.bin"
mkdir -p "$(dirname "$DEME_PERSISTENT_JITIFY_CACHE")"

Disable / default behavior (either of the following):

unset DEME_PERSISTENT_JITIFY_CACHE
export DEME_PERSISTENT_JITIFY_CACHE=0
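
The three modes above can be summarized as a small resolution function. This is a Python sketch of the described behavior, not the code in the PR. The function name `resolve_cache_path`, the treatment of an empty string as disabled, and the `unknown` fallback when USER is missing are assumptions of this sketch.

```python
from typing import Mapping, Optional


def resolve_cache_path(env: Mapping[str, str]) -> Optional[str]:
    """Return the header cache path implied by the environment, or None if disabled."""
    value = env.get("DEME_PERSISTENT_JITIFY_CACHE")
    # Unset or "0" keeps the default upstream Jitify header loading.
    # Treating an empty string as disabled is an assumption of this sketch.
    if value is None or value in ("", "0"):
        return None
    if value == "1":
        # Automatic per-user path on Linux/WSL, as described above.
        # The "unknown" fallback is an assumption, not documented behavior.
        user = env.get("USER", "unknown")
        return f"/tmp/deme_jitify_header_cache_{user}.bin"
    # Any other value is taken as an explicit cache file path.
    return value
```

Calling `resolve_cache_path(os.environ)` (after `import os`) in a shell where the variable is exported shows which cache file a run would be expected to use.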

The cache follows the CUDA/toolchain setup, not simulation inputs. Changing material properties, geometry, particle counts, or timesteps does not invalidate it. If CUDA versions, include paths, or compiler options change, DEME ignores the old cache file and fills it again during that run.
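
The PR text does not spell out how a stale cache is detected. One common approach, sketched here in Python under that assumption, is to derive a key from the toolchain settings only and to ignore any cache file whose stored key differs. The names `toolchain_key` and `cache_is_usable` are hypothetical.

```python
import hashlib
from typing import Optional, Sequence


def toolchain_key(cuda_version: str,
                  include_paths: Sequence[str],
                  compiler_options: Sequence[str]) -> str:
    """Hash only toolchain inputs; simulation inputs never enter the key."""
    h = hashlib.sha256()
    # A separator entry keeps include paths and options from running together.
    for part in (cuda_version, *include_paths, "--", *compiler_options):
        h.update(part.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


def cache_is_usable(stored_key: Optional[str], current_key: str) -> bool:
    """A missing or mismatched key means the cache is ignored and refilled."""
    return stored_key == current_key
```

Because material properties, geometry, particle counts, and timesteps are not part of the key, changing them leaves the key, and therefore the cache, untouched.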

Motivation and observed timings

In profiling on a V100S host, a cold no-mesh DEME micro-startup spent most of its time on host-side Jitify/NVRTC header discovery rather than GPU work.

Observed timings from the profiling run:

  • Original no-mesh Python micro-startup: Initialize() about 111.8 s.
  • After skipping mesh-only Jitify programs for no-mesh simulations: about 94.8 s.
  • With persistent Jitify header cache configured and primed: no-mesh Initialize() about 3.1 s.
  • Full mesh geometry with 500,000 spheres: Initialize() about 4.18 s.
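
For a sense of scale, the no-mesh timings above work out as follows. This is simple arithmetic on the reported numbers in Python; the variable names are illustrative only.

```python
# Reported no-mesh Initialize() timings, in seconds.
baseline_s = 111.8    # original
skip_mesh_s = 94.8    # after skipping mesh-only Jitify programs
cached_s = 3.1        # with the persistent header cache configured and primed

saved_by_skip_s = baseline_s - skip_mesh_s                # about 17.0 s
skip_reduction_pct = 100 * saved_by_skip_s / baseline_s   # about 15.2 %
cache_speedup = baseline_s / cached_s                     # about 36x

print(f"Skipping mesh programs saves {saved_by_skip_s:.1f} s ({skip_reduction_pct:.1f} %)")
print(f"Primed cache vs. original: {cache_speedup:.0f}x faster")
```

Skipping the mesh-only programs accounts for a modest share of the improvement; most of the reduction comes from the primed header cache.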

Validation

  • git diff --check
  • CMake configure in /tmp/deme_pr_cmake_check
  • Built CMake target core
  • Built CMake target DEM

GPU smoke validation was also run in the downstream pyDEME environment:

  • No-mesh DEME micro-startup: success, Initialize() about 3.26 s.
  • Mesh startup with N=100: success, Initialize() about 4.2 s.
  • Mesh startup with N=500000: success, Initialize() about 4.18 s.
