As discussed on Slack, here is an MWE that reproduces the issues I hit with some backends on the GPU. Most of these failures match what I have seen when using each backend's low-level API directly, so I believe they are largely not DI's fault. As suggested, I am sharing them anyway:
MWE
using Flux: Flux
using CUDA: CUDA
using Chairmarks: @be
using DifferentiationInterface: DifferentiationInterface as DI
# various DI backends that support pullback
using Zygote: Zygote
using Enzyme: Enzyme
using Tracker: Tracker
using ReverseDiff: ReverseDiff
using Diffractor: Diffractor
using FiniteDiff: FiniteDiff
using Tapir: Tapir
# Without this, Enzyme errors out with "You may be using a constant variable as temporary storage for active memory"
Enzyme.API.runtimeActivity!(true)
function main(device = Flux.gpu, test_forward_pass = false)
    # this is just a toy model; in the real setting, the model has ~1M parameters, ~512 inputs and outputs, and is a fully-convolutional network
    model = Flux.Chain(Flux.Dense(10, 32, Flux.relu), Flux.Dense(32, 10)) |> device
    data = randn(Float32, 10, 100) |> device
    labels = randn(Float32, 10, 100) |> device
    σ = 1.0f0
    # cotangents (explicitly Float32 to match `data`)
    dy1 = randn(Float32, size(data)) |> device
    dy2 = randn(Float32, size(data)) |> device
    original_size = size(data)

    function f(x_t)
        (model(x_t) - labels) .^ 2 / σ^2
    end

    # some of the backends require a flat input so this is a helper wrapper
    function f_flat(x_t)
        f(reshape(x_t, original_size))[:]
    end

    # testing forward pass to give a reference point
    @info "ForwardPass..."
    @time "✅ initial run" f(data)
    println("detailed benchmark:")
    @be(f(data)) |> display

    # NOTE:
    # only tested on the GPU
    backends = [
        DI.AutoZygote(), # ~0.05s
        DI.AutoEnzyme(), # compiles for quite long, then errors out due to an augmented forward pass custom rule type mismatch
        DI.AutoTracker(), # errors with `Tracker.TrackedReal{Float32} is a mutable type`
        DI.AutoReverseDiff(), # errors because it generates code that tries to multiply CPU and GPU arrays
        DI.AutoDiffractor(), # ERROR: MethodError: no method matching ndims(::Tuple{Int64, Int64})
        DI.AutoFiniteDiff(), # ~18s
        DI.AutoTapir(), # ERROR: CUDA.CuPtr{Nothing} is a primitive type. Implement a method of `tangent_type` for it.
    ]

    # computing a vector-Jacobian product (pullback) with each backend
    for backend in backends
        try
            @info "VJP with DI.jl and $backend"
            local pullback_prep
            @time "✅ initial run" let
                @time "preparing pullback object..." pullback_prep =
                    DI.prepare_pullback_same_point(f_flat, backend, data[:], dy1[:])
                @time "using it once" DI.pullback(f_flat, backend, data[:], dy1[:], pullback_prep)
            end
            println("detailed benchmark:")
            @be(DI.pullback(f_flat, backend, data[:], dy2[:], pullback_prep)) |> display
        catch e
            @error "Error with $backend: $e"
            # `catch_backtrace()` returns the backtrace of the caught exception
            Base.show_backtrace(stdout, catch_backtrace())
        end
    end
    nothing
end
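For comparison with the low-level APIs mentioned above, here is a minimal sketch of the same kind of VJP computed directly with Zygote's `pullback`, on the CPU and without Flux or CUDA. The single dense layer `W` and the closed-form reference `expected` are hypothetical stand-ins I introduce for illustration; they are not part of the MWE.

```julia
using Zygote: Zygote

# Hypothetical stand-in for the Flux model: one linear layer on the CPU.
W = randn(Float32, 10, 10)
labels = randn(Float32, 10, 100)
original_size = (10, 100)

f(x) = (W * x - labels) .^ 2
f_flat(x) = vec(f(reshape(x, original_size)))

x = randn(Float32, prod(original_size))
dy = randn(Float32, prod(original_size))

# Low-level Zygote API: `pullback` returns the primal output and a closure
# that maps an output cotangent to a tuple of input cotangents.
y, back = Zygote.pullback(f_flat, x)
vjp = only(back(dy))

# Closed-form VJP of this toy function, used as a reference.
X = reshape(x, original_size)
DY = reshape(dy, original_size)
expected = vec(W' * (2 .* (W * X - labels) .* DY))

@show size(vjp) maximum(abs, vjp - expected)
```

A check like this only confirms the VJP on the CPU; it says nothing about the GPU failures listed above.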
Program Output
https://pastebin.com/9kf9qqhG
Version Info
The output above was produced with the latest registered version of each backend, combined with JuliaGPU/CUDA.jl#2422 to facilitate Enzyme reverse-mode differentiation on the GPU.
Output of `]status -m`:
Status `~/worktree/BugReports/DifferentiationInterface.jl-vjp-failures/Manifest.toml`
[47edcb42] ADTypes v1.6.1
[c29ec348] AbstractDifferentiation v0.6.2
[621f4979] AbstractFFTs v1.5.0
[1520ce14] AbstractTrees v0.4.5
[7d9f7c33] Accessors v0.1.36
[79e6a3ab] Adapt v4.0.4
[dce04be8] ArgCheck v2.3.0
[ec485272] ArnoldiMethod v0.4.0
[4fba245c] ArrayInterface v7.12.0
[a9b6321e] Atomix v0.1.0
[ab4f0b2a] BFloat16s v0.5.0
[198e06fe] BangBang v0.4.2
[9718e550] Baselet v0.1.1
[fa961155] CEnum v0.5.0
[052768ef] CUDA v5.4.2 `https://github.com/wsmoses/CUDA.jl#renz`
[1af6417a] CUDA_Runtime_Discovery v0.3.4
[082447d4] ChainRules v1.69.0
[d360d2e6] ChainRulesCore v1.24.0
[0ca39b1e] Chairmarks v1.2.1
[da1fd8a2] CodeTracking v1.3.5
[3da002f7] ColorTypes v0.11.5
[5ae59095] Colors v0.12.11
[861a8166] Combinatorics v1.0.2
[bbf7d656] CommonSubexpressions v0.3.0
[34da2185] Compat v4.15.0
[a33af91c] CompositionsBase v0.1.2
[187b0558] ConstructionBase v1.5.5
[6add18c4] ContextVariablesX v0.1.3
[a8cc5b0e] Crayons v4.1.1
[f68482b8] Cthulhu v2.12.7
[9a962f9c] DataAPI v1.16.0
[a93c6f00] DataFrames v1.6.1
[864edb3b] DataStructures v0.18.20
[e2d170a0] DataValueInterfaces v1.0.0
[244e2a9f] DefineSingletons v0.1.2
[8bb1440f] DelimitedFiles v1.9.1
[163ba53b] DiffResults v1.1.0
[b552c78f] DiffRules v1.15.1
[de460e47] DiffTests v0.1.2
[a0c0ee7d] DifferentiationInterface v0.5.9
⌃ [9f5e2b26] Diffractor v0.2.6
[ffbed154] DocStringExtensions v0.9.3
[7da242da] Enzyme v0.12.23
[f151be2c] EnzymeCore v0.7.7
[e2ba6199] ExprTools v0.1.10
[cc61a311] FLoops v0.2.2
[b9860ae5] FLoopsBase v0.1.1
[1a297f60] FillArrays v1.11.0
[6a86dc24] FiniteDiff v2.23.1
[53c48c17] FixedPointNumbers v0.8.5
[587475ba] Flux v0.14.16
[1eca21be] FoldingTrees v1.2.1
[f6369f11] ForwardDiff v0.10.36
[069b7b12] FunctionWrappers v1.1.3
[d9f16b24] Functors v0.4.11
[0c68f7d7] GPUArrays v10.3.0
[46192b85] GPUArraysCore v0.1.6
⌃ [61eb1bfa] GPUCompiler v0.26.5
[86223c79] Graphs v1.11.2
[7869d1d1] IRTools v0.4.14
[d25df0c9] Inflate v0.1.5
[22cec73e] InitialValues v0.3.1
[842dd82b] InlineStrings v1.4.2
[3587e190] InverseFunctions v0.1.15
[41ab1584] InvertedIndices v1.3.0
[92d709cd] IrrationalConstants v0.2.2
[82899510] IteratorInterfaceExtensions v1.0.0
[c3a54625] JET v0.9.6
[692b3bcd] JLLWrappers v1.5.0
[aa1ae85d] JuliaInterpreter v0.9.32
[70703baa] JuliaSyntax v0.4.8
[b14d175d] JuliaVariables v0.2.4
[63c18a36] KernelAbstractions v0.9.22
⌅ [929cbde3] LLVM v7.2.1
[8b046642] LLVMLoopInfo v1.0.0
[b964fa9f] LaTeXStrings v1.3.1
[2ab3a3ac] LogExpFunctions v0.3.28
[6f1432cf] LoweredCodeUtils v2.4.8
[d8e11817] MLStyle v0.4.17
[f1d291b0] MLUtils v0.4.4
[1914dd2f] MacroTools v0.5.13
[128add7d] MicroCollections v0.2.0
[e1d29d7a] Missings v1.2.0
[dbe65cb8] MistyClosures v1.0.1
[872c559c] NNlib v0.9.20
[5da4648a] NVTX v0.3.4
[77ba4419] NaNMath v1.0.2
[71a1bf82] NameResolution v0.1.5
[d8793406] ObjectFile v0.4.1
[6fe1bfb0] OffsetArrays v1.14.1
[0b1bfda6] OneHotArrays v0.2.5
[3bd65402] Optimisers v0.3.3
[bac558e1] OrderedCollections v1.6.3
[65ce6f38] PackageExtensionCompat v1.0.2
[2dfb63ee] PooledArrays v1.4.3
[aea7be01] PrecompileTools v1.2.1
[21216c6a] Preferences v1.4.3
[8162dcfd] PrettyPrint v0.2.0
[08abe8d2] PrettyTables v2.3.2
[33c8b6b6] ProgressLogging v0.1.4
[74087812] Random123 v1.7.0
[e6cf234a] RandomNumbers v1.5.3
[c1ae055f] RealDot v0.1.0
[189a3867] Reexport v1.2.2
[ae029012] Requires v1.3.0
[37e2e3b7] ReverseDiff v1.15.3
[6c6a2e73] Scratch v1.2.1
[91c51154] SentinelArrays v1.4.5
[efcf1570] Setfield v1.1.1
[605ecd9f] ShowCases v0.1.0
[699a6c99] SimpleTraits v0.9.4
[a2af1166] SortingAlgorithms v1.2.1
[dc90abb0] SparseInverseSubset v0.1.2
[0a514795] SparseMatrixColorings v0.3.5
[276daf66] SpecialFunctions v2.4.0
[171d559e] SplittablesBase v0.1.15
[90137ffa] StaticArrays v1.9.7
[1e83bf80] StaticArraysCore v1.4.3
[82ae8749] StatsAPI v1.7.0
[2913bbd2] StatsBase v0.34.3
[892a3eda] StringManipulation v0.3.4
[09ab397b] StructArrays v0.6.18
[53d494c1] StructIO v0.3.0
[3783bdb8] TableTraits v1.0.1
[bd369af6] Tables v1.12.0
[07d77754] Tapir v0.2.24
[a759f4b9] TimerOutputs v0.5.24
[9f7883ad] Tracker v0.2.34
[28d57a85] Transducers v0.4.82
[d265eb64] TypedSyntax v1.3.1
[013be700] UnsafeAtomics v0.2.1
[d80eeb9a] UnsafeAtomicsLLVM v0.1.5
[b8c1c048] WidthLimitedIO v1.0.1
[e88e6eb3] Zygote v0.6.70
[700de1a5] ZygoteRules v0.2.5
[02a925ec] cuDNN v1.3.2
[4ee394cb] CUDA_Driver_jll v0.9.1+1
[76a88914] CUDA_Runtime_jll v0.14.1+0
⌅ [62b44479] CUDNN_jll v9.0.0+1
⌅ [7cc45869] Enzyme_jll v0.0.134+0
[9c1d0b0a] JuliaNVTXCallbacks_jll v0.2.1+0
⌅ [dad2f222] LLVMExtra_jll v0.0.29+0
[e98f9f5b] NVTX_jll v3.1.0+2
[efe28fd5] OpenSpecFun_jll v0.5.5+0
[0dad84c5] ArgTools v1.1.1
[56f22d72] Artifacts
[2a0f44e3] Base64
[ade2ca70] Dates
[8ba89e20] Distributed
[f43a241f] Downloads v1.6.0
[7b1f6079] FileWatching
[9fa8497b] Future
[b77e0a4c] InteractiveUtils
[4af54fe1] LazyArtifacts
[b27032c2] LibCURL v0.6.4
[76f85450] LibGit2
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[d6f4376e] Markdown
[a63ad114] Mmap
[ca575930] NetworkOptions v1.2.0
[44cfe95a] Pkg v1.10.0
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA v0.7.0
[9e88b42a] Serialization
[1a1011a3] SharedArrays
[6462fe0b] Sockets
[2f01184e] SparseArrays v1.10.0
[10745b16] Statistics v1.10.0
[4607b0f0] SuiteSparse
[fa267f1f] TOML v1.0.3
[a4e569a6] Tar v1.10.0
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
[e66e0078] CompilerSupportLibraries_jll v1.1.1+0
[deac9b47] LibCURL_jll v8.4.0+0
[e37daf67] LibGit2_jll v1.6.4+0
[29816b5a] LibSSH2_jll v1.11.0+1
[c8ffd9c3] MbedTLS_jll v2.28.2+1
[14a3606d] MozillaCACerts_jll v2023.1.10
[4536629a] OpenBLAS_jll v0.3.23+4
[05823500] OpenLibm_jll v0.8.1+2
[bea87d4a] SuiteSparse_jll v7.2.1+1
[83775a58] Zlib_jll v1.2.13+1
[8e850b90] libblastrampoline_jll v5.8.0+1
[8e850ede] nghttp2_jll v1.52.0+1
[3f19e933] p7zip_jll v17.4.0+2
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated -m`
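To get close to this environment, something like the following sketch might work. It assumes the `renz` branch of the CUDA.jl fork recorded in the manifest still exists and that the resolver accepts these direct-dependency versions together; I have not verified either, and transitive dependencies may resolve differently from the manifest.

```julia
using Pkg

# Work in a throwaway environment so the default one stays untouched.
Pkg.activate(; temp = true)

# Versions copied from the manifest above; CUDA comes from the fork/branch listed there.
Pkg.add([
    Pkg.PackageSpec(url = "https://github.com/wsmoses/CUDA.jl", rev = "renz"),
    Pkg.PackageSpec(name = "DifferentiationInterface", version = "0.5.9"),
    Pkg.PackageSpec(name = "Flux", version = "0.14.16"),
    Pkg.PackageSpec(name = "Chairmarks", version = "1.2.1"),
    Pkg.PackageSpec(name = "Zygote", version = "0.6.70"),
    Pkg.PackageSpec(name = "Enzyme", version = "0.12.23"),
    Pkg.PackageSpec(name = "Tracker", version = "0.2.34"),
    Pkg.PackageSpec(name = "ReverseDiff", version = "1.15.3"),
    Pkg.PackageSpec(name = "Diffractor", version = "0.2.6"),
    Pkg.PackageSpec(name = "FiniteDiff", version = "2.23.1"),
    Pkg.PackageSpec(name = "Tapir", version = "0.2.24"),
])

Pkg.status(; mode = Pkg.PKGMODE_MANIFEST)
```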