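As the benchmarks below show, mapreduce(hcat, ...) has serious allocation overhead: with Mooncake forward mode, the allocating jacobian takes about 220 ms, while the in-place jacobian! takes about 9 ms. Replacing it with stack when the batch size is 1 could be beneficial.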
Details
using BenchmarkTools
using DifferentiationInterface
using Mooncake: Mooncake
using ForwardDiff: ForwardDiff
f(x) = map(cos, x);
x = ones(1000);
J = similar(x, length(x), length(x));
prep_mooncake_forward = prepare_jacobian(f, AutoMooncakeForward(), x);
prep_forwarddiff1 = prepare_jacobian(f, AutoForwardDiff(; chunksize=1), x);
@btime jacobian($f, $prep_mooncake_forward, AutoMooncakeForward(), $x); # 220 ms
@btime jacobian($f, $prep_forwarddiff1, AutoForwardDiff(; chunksize=1), $x); # 8 ms
@btime jacobian!($f, $J, $prep_mooncake_forward, AutoMooncakeForward(), $x); # 9 ms
@btime jacobian!($f, $J, $prep_forwarddiff1, AutoForwardDiff(; chunksize=1), $x); # 8 ms
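For illustration, here is a minimal sketch of the proposed swap. It is not the actual DifferentiationInterface code: `column` is a hypothetical stand-in for the per-seed derivative computed with batch size 1, and `stack` is assumed to be available (Julia 1.9 or later). It only shows that both approaches assemble the same matrix from a sequence of columns.

```julia
# Hypothetical stand-in: column j of the Jacobian of map(cos, x),
# which is diagonal with entries -sin(x[i]).
column(x, j) = [i == j ? -sin(x[i]) : zero(eltype(x)) for i in eachindex(x)]

x = ones(5)

# Current pattern: fold the columns together with repeated hcat calls.
J_hcat = mapreduce(j -> column(x, j), hcat, eachindex(x))

# Proposed pattern: collect the columns into a matrix with stack.
J_stack = stack(j -> column(x, j), eachindex(x))

# Both approaches should produce the same matrix.
println(J_hcat == J_stack)
```

Whether `stack` actually reduces allocations inside the package would need to be confirmed by rerunning the benchmarks above.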