Reduce DEME Jitify startup overhead by yvrob · Pull Request #66 · projectchrono/DEM-Engine

yvrob · 2026-06-10T20:22:19Z

Summary

This PR reduces DEME startup/Jitify overhead in two places:

Skip sphere-triangle and triangle-bin Jitify programs when a simulation has no triangle meshes.
Add an opt-in persistent Jitify header source cache for workflows that launch many fresh DEME processes.

Default behavior stays conservative: upstream Jitify header loading is used unless DEME_PERSISTENT_JITIFY_CACHE is set.

Note about the large `jitify.hpp` diff

The large file in this PR is intentional, but it deserves a warning. Most of that diff is the full NVIDIA Jitify header copied into DEM-Engine as src/jitify/jitify.hpp; the actual change inside it is much smaller and adds an optional persistent source-header cache.

I did this because DEM-Engine currently gets Jitify through a submodule. Patching only the submodule would require either a separate Jitify fork/PR and a submodule pointer update, or a dependency on a local external patch that is harder to review and reproduce. Keeping the header copy in this PR makes the performance fix self-contained and lets DEM-Engine keep the old behavior by default.

That said, if maintainers would prefer a different route, such as carrying this in a Jitify fork, upstreaming it to NVIDIA/Jitify first, applying a smaller local patch during the build, or hiding it behind a different CMake layout, feedback is very welcome. The important part for our use case is to avoid paying the repeated Jitify/NVRTC header-discovery cost every time a fresh DEME process starts.

Persistent cache usage

Automatic per-user cache path:

export DEME_PERSISTENT_JITIFY_CACHE=1

On Linux/WSL this uses:

/tmp/deme_jitify_header_cache_$USER.bin

Explicit cache path:

export DEME_PERSISTENT_JITIFY_CACHE="$HOME/.cache/deme/jitify_header_cache.bin"
mkdir -p "$(dirname "$DEME_PERSISTENT_JITIFY_CACHE")"

Disable (restores the default behavior), using either of:

unset DEME_PERSISTENT_JITIFY_CACHE
export DEME_PERSISTENT_JITIFY_CACHE=0
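
The three settings above amount to a small resolution rule, sketched here in Python for illustration only. The real logic lives in the patched C++ header, so `resolve_cache_path` and its `user` parameter are hypothetical names, and treating an empty value as "disabled" is an assumption not stated in this PR.

```python
import getpass
from typing import Mapping, Optional

ENV_VAR = "DEME_PERSISTENT_JITIFY_CACHE"


def resolve_cache_path(env: Mapping[str, str], user: Optional[str] = None) -> Optional[str]:
    """Return the cache file implied by the environment, or None if the cache is off."""
    value = env.get(ENV_VAR)
    # Unset or "0" keeps the default upstream Jitify header loading.
    # Assumption: an empty value is also treated as disabled.
    if value is None or value in ("", "0"):
        return None
    # "1" selects the automatic per-user path described for Linux/WSL.
    if value == "1":
        user = user or getpass.getuser()
        return f"/tmp/deme_jitify_header_cache_{user}.bin"
    # Any other value is taken as an explicit cache file path.
    return value
```

Calling `resolve_cache_path(os.environ)` (after `import os`) in a launcher script would show which file, if any, a fresh DEME process is expected to use under these assumptions.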

The cache follows the CUDA/toolchain setup, not simulation inputs. Changing material properties, geometry, particle counts, or timesteps does not invalidate it. If CUDA versions, include paths, or compiler options change, DEME ignores the old cache file and fills it again during that run.
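
One way to picture this invalidation rule is a fingerprint computed only from toolchain inputs. The sketch below is an assumption about the general technique, not the format used inside `jitify.hpp`; `toolchain_fingerprint` and `cache_is_usable` are hypothetical names.

```python
import hashlib
from typing import Optional, Sequence


def toolchain_fingerprint(cuda_version: str,
                          include_paths: Sequence[str],
                          compiler_options: Sequence[str]) -> str:
    """Hash the toolchain setup; simulation inputs are deliberately not part of it."""
    h = hashlib.sha256()
    groups = ([cuda_version], list(include_paths), list(compiler_options))
    for group in groups:
        for item in group:
            h.update(item.encode("utf-8"))
            h.update(b"\x00")  # item separator
        h.update(b"\x1e")      # group separator, keeps the three groups distinct
    return h.hexdigest()


def cache_is_usable(stored: Optional[str], current: str) -> bool:
    """A cache file is reused only when its stored fingerprint matches the current one."""
    return stored is not None and stored == current
```

Include paths are hashed in order because search order can change which header is found. Material properties, geometry, particle counts, and timesteps never enter the fingerprint, which matches the behavior described above.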

Motivation and observed timings

In profiling on a V100S host, a cold no-mesh DEME micro-startup spent most of its time on host-side Jitify/NVRTC header discovery rather than GPU work.

Observed timings from the profiling run:

Original no-mesh Python micro-startup: Initialize() about 111.8 s.
After skipping mesh-only Jitify programs for no-mesh simulations: about 94.8 s.
With persistent Jitify header cache configured and primed: no-mesh Initialize() about 3.1 s.
Mesh geometry with 500,000 spheres: Initialize() about 4.18 s.
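
To put the no-mesh numbers in relative terms, the short calculation below uses only the three timings reported above. The variable names are illustrative.

```python
# Reported no-mesh Initialize() timings, in seconds.
baseline_s = 111.8   # original
skip_mesh_s = 94.8   # mesh-only Jitify programs skipped
cached_s = 3.1       # persistent header cache configured and primed

saved_s = round(baseline_s - skip_mesh_s, 1)
reduction_pct = round(100.0 * (baseline_s - skip_mesh_s) / baseline_s, 1)
speedup_vs_baseline = round(baseline_s / cached_s, 1)
speedup_vs_skip = round(skip_mesh_s / cached_s, 1)

print(f"Skipping mesh-only programs saves {saved_s} s ({reduction_pct}%).")
print(f"Primed cache: {speedup_vs_baseline}x faster than the original, "
      f"{speedup_vs_skip}x faster than skip-only.")
```

Skipping the mesh-only programs removes roughly 15% of the startup time on its own, while the primed cache accounts for most of the improvement, at about 36x relative to the original.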

Validation

git diff --check
CMake configure in /tmp/deme_pr_cmake_check
Built CMake target core
Built CMake target DEM

GPU smoke validation was also run in the downstream pyDEME environment:

No-mesh DEME micro-startup: success, Initialize() about 3.26 s.
Mesh startup with N=100: success, Initialize() about 4.2 s.
Mesh startup with N=500000: success, Initialize() about 4.18 s.

Reduce DEME Jitify startup overhead

b87f747

yvrob wants to merge 1 commit into projectchrono:main from yvrob:optimize-jitify-startup-cache
