Avoid exception-driven RowBinary Variant encoding #368

@ruslandoga

Problem

Ch.RowBinary currently encodes Variant by trying each candidate type and using exceptions for normal control flow:

# TODO find a better way than try/rescue
defp try_encode_variant([type | types], idx, value) do
  try do
    encode(type, value)
  else
    encoded -> [idx | encoded]
  rescue
    _e -> try_encode_variant(types, idx + 1, value)
  end
end

This is especially expensive because Ch.Types canonicalizes Variant type order:

Ch.Types.decode("Variant(UInt64, String)")
# {:variant, [:string, :u64]}

So for Variant(UInt64, String), every integer value is first tried as :string, which raises, and only then succeeds as :u64 at index 1.
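To make the probing cost visible without depending on Ch, here is a minimal standalone sketch. `ProbeDemo` and its toy `encode/2` clauses are hypothetical stand-ins, not Ch.RowBinary functions; `probe/2` mirrors the shape of `try_encode_variant/3` and additionally returns how many attempts raised.

```elixir
defmodule ProbeDemo do
  # Toy encoders for illustration only; they are NOT the Ch.RowBinary encoders.
  # A value of the wrong shape raises FunctionClauseError.
  def encode(:string, v) when is_binary(v), do: v
  def encode(:u64, v) when is_integer(v) and v >= 0, do: <<v::64-little>>

  # Same control flow as try_encode_variant/3, plus a counter of raised attempts.
  def probe(types, value), do: probe(types, 0, value, 0)

  defp probe([type | types], idx, value, raised) do
    try do
      encode(type, value)
    rescue
      _e -> probe(types, idx + 1, value, raised + 1)
    else
      encoded -> {[idx | encoded], raised}
    end
  end
end

# With the canonical order [:string, :u64], an integer costs one raise per value,
# while a string costs none.
{_iodata, raised_for_integer} = ProbeDemo.probe([:string, :u64], 42)
{_iodata, raised_for_string} = ProbeDemo.probe([:string, :u64], "event-1")
IO.inspect(%{integer: raised_for_integer, string: raised_for_string})
```

The count is per value, so a 100k-row integer column pays for 100k raised and rescued exceptions under this dispatch.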

Benchmark

Environment:

commit: 5c9244a
macOS, Apple M2, 8 GB RAM
Elixir 1.19.5, Erlang/OTP 28.3, JIT enabled

Benchee code:

alias Ch.RowBinary

variant_inputs = %{
  "100k strings" => Enum.map(1..100_000, fn i -> ["event-#{i}"] end),
  "100k u64s" => Enum.map(1..100_000, fn i -> [i] end)
}

Benchee.run(
  %{
    "encode Variant(UInt64,String)" => fn rows ->
      RowBinary.encode_rows(rows, ["Variant(UInt64, String)"])
    end
  },
  inputs: variant_inputs,
  warmup: 1,
  time: 2
)

Results:

##### With input 100k strings #####
Name                                    ips        average
encode Variant(UInt64,String)        168.76        5.93 ms

##### With input 100k u64s #####
Name                                    ips        average
encode Variant(UInt64,String)        0.0650        15.38 s

Toy no-exception ceiling check (a hand-written fast path for the u64 case only):

alias Ch.RowBinary

rows = Enum.map(1..100_000, fn i -> [i] end)

fast = fn rows ->
  Enum.map(rows, fn [i] -> [1 | RowBinary.encode(:u64, i)] end)
end

{t1, _} = :timer.tc(fn ->
  RowBinary.encode_rows(rows, ["Variant(UInt64, String)"])
end)

{t2, _} = :timer.tc(fn ->
  fast.(rows)
end)

IO.inspect(%{current_ms: t1 / 1000, no_exception_ms: t2 / 1000, speedup: t1 / t2})

Results:

%{current_ms: 15243.01, no_exception_ms: 57.528, speedup: 264.9667987762481}

The toy path is not a proposed generic implementation; it just shows how much time is being lost to exception-driven dispatch for this common case.

Suggested direction

Replace exception-driven probing with a cheap compatibility predicate before calling encode/2, for example:

  • scalar guard checks for strings, integers, floats, booleans, dates, times, UUID binaries/strings, IP tuples/binaries;
  • recursive checks for arrays, maps, tuples, nullable;
  • decimal and enum checks that preserve the current validation semantics.

Then encode with the first compatible canonical variant type.
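A rough sketch of what such a predicate could look like, covering only a few types. The module name `VariantSketch`, the functions `compatible?/2` and `pick/2`, and the `{:array, type}` / `{:nullable, type}` tuple shapes are assumptions made for illustration; the real checks would have to match whatever `encode/2` actually accepts for each type.

```elixir
defmodule VariantSketch do
  # Hypothetical sketch, not part of Ch: selects a variant type with plain
  # boolean checks instead of try/rescue.

  # Largest UInt64 value, 2^64 - 1.
  @max_u64 18_446_744_073_709_551_615

  # Returns true when `value` looks encodable as `type`.
  # Only a handful of types are covered; everything else is reported as
  # incompatible here, which a real implementation could not do.
  def compatible?(:string, value), do: is_binary(value)
  def compatible?(:u64, value), do: is_integer(value) and value >= 0 and value <= @max_u64
  def compatible?({:nullable, type}, value), do: is_nil(value) or compatible?(type, value)

  def compatible?({:array, type}, value) do
    is_list(value) and Enum.all?(value, &compatible?(type, &1))
  end

  def compatible?(_type, _value), do: false

  # Returns {index, type} for the first compatible type in canonical order,
  # or raises ArgumentError when nothing matches.
  def pick(types, value), do: pick(types, 0, value, types)

  defp pick([type | rest], idx, value, all) do
    if compatible?(type, value), do: {idx, type}, else: pick(rest, idx + 1, value, all)
  end

  defp pick([], _idx, value, all) do
    raise ArgumentError,
          "no variant type in #{inspect(all)} matches value #{inspect(value)}"
  end
end

# Usage: the caller would then emit [idx | encode(type, value)].
{idx, type} = VariantSketch.pick([:string, :u64], 42)
IO.inspect({idx, type})
```

With this shape the index byte and the chosen type come from a single pass over the type list, and the only raise left is the ArgumentError for a value that matches nothing.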

Tests to add:

  • Variant(UInt64, String) integer encoding does not raise internally.
  • Existing Variant(UInt64, String, Array(UInt64)) integration behavior remains unchanged.
  • Failure still raises a useful ArgumentError when no variant type matches.
