When differentiating with respect to an empty array, the results tend to vary:
```julia
using DifferentiationInterface, ForwardDiff, ReverseDiff, Mooncake, Enzyme

ADTYPES = [
    AutoForwardDiff(),
    AutoReverseDiff(),
    AutoMooncake(; config=nothing),
    AutoEnzyme(; mode=Forward),
    AutoEnzyme(; mode=Reverse),
    # and more...
]

for adtype in ADTYPES
    DifferentiationInterface.value_and_gradient(sum, adtype, Float64[])
end
```
ReverseDiff, Mooncake, and reverse Enzyme all happily return `(0.0, [])` 😄
Forward Enzyme tries to use a batch size of 0 and errors:
`DifferentiationInterface.jl/DifferentiationInterface/ext/DifferentiationInterfaceEnzymeExt/utils.jl`, lines 11 to 14 in `6a58124`:

```julia
function DI.pick_batchsize(::AutoEnzyme, N::Integer)
    B = DI.reasonable_batchsize(N, 16)
    return DI.BatchSizeSettings{B}(N)
end
```
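Presumably `DI.reasonable_batchsize(0, 16)` yields `B = 0`, and a zero-width batch can't work downstream. A purely illustrative guard (with a stand-in helper, not the real DI internals) would at least avoid constructing a zero batch size, though whether the rest of the pipeline then copes with an empty input is a separate question:

```julia
# Illustrative only: clamp the batch size to at least 1 so an empty input never
# yields B = 0. `reasonable_batchsize_stub` is a stand-in, not actual DI code.
reasonable_batchsize_stub(N::Integer, Bmax::Integer) = clamp(N, 1, Bmax)

reasonable_batchsize_stub(0, 16)  # returns 1 instead of 0
```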
And ForwardDiff tries to construct a `GradientResult` which errors:
`DifferentiationInterface.jl/DifferentiationInterface/ext/DifferentiationInterfaceForwardDiffExt/onearg.jl`, lines 315 to 318 in `6a58124`:

```julia
fc = DI.fix_tail(f, map(DI.unwrap, contexts)...)
result = GradientResult(x)
result = gradient!(result, fc, x)
return DR.value(result), DR.gradient(result)
```
https://github.com/JuliaDiff/DiffResults.jl/blob/fcf7858d393f0597fc74e195ed46f7bcbe5ff66c/src/DiffResults.jl#L64-L65
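If I'm reading those lines right, `GradientResult(x::AbstractArray)` calls `first(x)` to seed the value field, which is exactly the call that fails on an empty array:

```julia
# first on an empty vector throws, which is presumably what makes GradientResult(Float64[]) error
first(Float64[])  # BoundsError: attempt to access 0-element Vector{Float64} at index [1]
```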
Funnily enough `gradient` with ForwardDiff (rather than `value_and_gradient`) is fine because it doesn't try to construct the `GradientResult`. I imagine the other operators would also have varying behaviour.
I suppose it is a bit of a trivial edge case, but would it be possible to unify the behaviour of the AD backends?
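If unification is on the table, one cheap convention (just a sketch of user-side behaviour, not a proposal for DI internals; the wrapper name is made up) would be to short-circuit empty inputs before any backend-specific machinery runs, matching what ReverseDiff, Mooncake, and reverse Enzyme already do:

```julia
using DifferentiationInterface, ReverseDiff  # ReverseDiff chosen arbitrarily as the example backend

# Hypothetical wrapper, not part of the DI API: return the primal value and an
# empty gradient whenever the input is empty, otherwise defer to the backend.
function value_and_gradient_empty_aware(f, backend, x::AbstractVector)
    isempty(x) && return (f(x), similar(x))
    return DifferentiationInterface.value_and_gradient(f, backend, x)
end

value_and_gradient_empty_aware(sum, AutoReverseDiff(), Float64[])  # (0.0, Float64[])
```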