Problem
Ch.RowBinary currently encodes Variant by trying each candidate type and using exceptions for normal control flow:
# TODO find a better way than try/rescue
defp try_encode_variant([type | types], idx, value) do
try do
encode(type, value)
else
encoded -> [idx | encoded]
rescue
_e -> try_encode_variant(types, idx + 1, value)
end
end
This is especially expensive because Ch.Types canonicalizes Variant type order:
Ch.Types.decode("Variant(UInt64, String)")
# {:variant, [:string, :u64]}
So every integer value for Variant(UInt64, String) first attempts :string, raises, then succeeds as :u64.
Benchmark
Environment:
commit: 5c9244a
macOS, Apple M2, 8 GB RAM
Elixir 1.19.5, Erlang/OTP 28.3, JIT enabled
Benchee code:
alias Ch.RowBinary
variant_inputs = %{
"100k strings" => Enum.map(1..100_000, fn i -> ["event-#{i}"] end),
"100k u64s" => Enum.map(1..100_000, fn i -> [i] end)
}
Benchee.run(
%{
"encode Variant(UInt64,String)" => fn rows ->
RowBinary.encode_rows(rows, ["Variant(UInt64, String)"])
end
},
inputs: variant_inputs,
warmup: 1,
time: 2
)
Results:
##### With input 100k strings #####
Name ips average
encode Variant(UInt64,String) 168.76 5.93 ms
##### With input 100k u64s #####
Name ips average
encode Variant(UInt64,String) 0.0650 15.38 s
Toy no-exception ceiling check:
alias Ch.RowBinary
rows = Enum.map(1..100_000, fn i -> [i] end)
fast = fn rows ->
Enum.map(rows, fn [i] -> [1 | RowBinary.encode(:u64, i)] end)
end
{t1, _} = :timer.tc(fn ->
RowBinary.encode_rows(rows, ["Variant(UInt64, String)"])
end)
{t2, _} = :timer.tc(fn ->
fast.(rows)
end)
IO.inspect(%{current_ms: t1 / 1000, no_exception_ms: t2 / 1000, speedup: t1 / t2})
Results:
%{current_ms: 15243.01, no_exception_ms: 57.528, speedup: 264.9667987762481}
The toy path is not a proposed generic implementation; it just shows how much time is being lost to exception-driven dispatch for this common case.
Suggested direction
Replace exception-driven probing with a cheap compatibility predicate before calling encode/2, for example:
- scalar guard checks for strings, integers, floats, booleans, dates, times, UUID binaries/strings, IP tuples/binaries;
- recursive checks for arrays, maps, tuples, nullable;
- decimal and enum checks that preserve the current validation semantics.
Then encode with the first compatible canonical variant type.
Tests to add:
Variant(UInt64, String) integer encoding does not raise internally.
- Existing
Variant(UInt64, String, Array(UInt64)) integration behavior remains unchanged.
- Failure still raises a useful
ArgumentError when no variant type matches.
Problem
Ch.RowBinarycurrently encodesVariantby trying each candidate type and using exceptions for normal control flow:This is especially expensive because
Ch.TypescanonicalizesVarianttype order:So every integer value for
Variant(UInt64, String)first attempts:string, raises, then succeeds as:u64.Benchmark
Environment:
Benchee code:
Results:
Toy no-exception ceiling check:
Results:
The toy path is not a proposed generic implementation; it just shows how much time is being lost to exception-driven dispatch for this common case.
Suggested direction
Replace exception-driven probing with a cheap compatibility predicate before calling
encode/2, for example:Then encode with the first compatible canonical variant type.
Tests to add:
Variant(UInt64, String)integer encoding does not raise internally.Variant(UInt64, String, Array(UInt64))integration behavior remains unchanged.ArgumentErrorwhen no variant type matches.