Avoid two-layer Jacobian concatenation for non-batchable backends #874

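As the benchmarks below show, `mapreduce(hcat, ...)` has serious allocation overhead: out-of-place `jacobian` with `AutoMooncakeForward()` takes about 220 ms, while in-place `jacobian!` with the same backend takes about 9 ms. Replacing `mapreduce(hcat, ...)` with `stack` when the batch size is 1 could be beneficial.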
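To make the proposal concrete, here is a minimal sketch of the two concatenation patterns on toy data. It is an illustration only, not DifferentiationInterface's actual internals: `columns`, `J_two_layer`, and `J_stack` are hypothetical names, and the inner `stack((col,))` merely stands in for whatever is done per batch. `stack` requires Julia 1.9 or later.

```julia
# Hypothetical per-direction results: one Jacobian column per input index,
# as a backend with batch size 1 might produce them.
n = 4
columns = [fill(Float64(j), n) for j in 1:n]

# Two-layer pattern: each batch of size 1 is first turned into an n×1 block,
# then the blocks are concatenated horizontally.
J_two_layer = mapreduce(hcat, columns) do col
    stack((col,))  # inner concatenation over a batch holding a single column
end

# Single-layer alternative: stack all columns directly into an n×n matrix.
J_stack = stack(columns)

# Both patterns build the same matrix; only the construction differs.
@assert J_two_layer == J_stack
```

With batch size 1 the inner layer only wraps a single column, so the single `stack` call yields the same result with one concatenation step.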

<details>

```julia
using BenchmarkTools
using DifferentiationInterface
using Mooncake: Mooncake
using ForwardDiff: ForwardDiff

f(x) = map(cos, x);
x = ones(1000);
J = similar(x, length(x), length(x));

prep_mooncake_forward = prepare_jacobian(f, AutoMooncakeForward(), x);
prep_forwarddiff1 = prepare_jacobian(f, AutoForwardDiff(; chunksize=1), x);

@btime jacobian($f, $prep_mooncake_forward, AutoMooncakeForward(), $x);  # 220 ms
@btime jacobian($f, $prep_forwarddiff1, AutoForwardDiff(; chunksize=1), $x);  # 8 ms

@btime jacobian!($f, $J, $prep_mooncake_forward, AutoMooncakeForward(), $x);  # 9 ms
@btime jacobian!($f, $J, $prep_forwarddiff1, AutoForwardDiff(; chunksize=1), $x);  # 8 ms
```

</details>
