perf: use getBytes for faster ASCII char narrowing in base64 encode by He-Pin · Pull Request #893 · databricks/sjsonnet

He-Pin · 2026-06-06T12:30:02Z

Motivation

The std.base64 function for string input was 2.61x slower than jrsonnet (Rust implementation) on Scala Native. The main bottleneck was the per-character charAt(i).toByte loop used to narrow ASCII chars to bytes before passing to the aklomp/base64 SIMD encoder.

Key Design Decision

Use String.getBytes(0, len, dst, 0) system call instead of per-character charAt(i).toByte loop for ASCII char narrowing. This is a single system call that copies bytes faster than a per-char loop. Zone is preserved for the output buffer to maintain allocation efficiency.

Modification

Changed sjsonnet/src-native/sjsonnet/stdlib/PlatformBase64.scala:

Replaced per-char charAt(i).toByte loop with getBytes(0, len, srcBytes, 0) system call
Pre-allocate Array[Byte] for source data
Use memcpy to copy from Array[Byte] to Zone buffer
Added @nowarn("cat=deprecation") for deprecated getBytes method

Benchmark Results

Scala Native vs jrsonnet (hyperfine)

Test	Before	After	Improvement
std.base64 (string)	jrsonnet 2.61x faster	jrsonnet 1.09x faster	Gap reduced 78%
base64Decode	jrsonnet 1.58x faster	jrsonnet 1.47x faster	Improved
base64DecodeBytes	sjsonnet 1.19x faster	sjsonnet 1.14x faster	Stable
base64_byte_array	sjsonnet 1.38x faster	sjsonnet 1.31x faster	Stable

JMH (JVM)

Baseline JMH benchmarks are stable; the change only affects Scala Native path.

Analysis

The getBytes(0, len, dst, 0) method is a system call that copies bytes faster than a per-char loop. For a 3.5KB Lorem-ipsum-style input, this avoids two full passes over the data before the SIMD encoder sees it.

The optimization is conservative: Zone is still used for the output buffer to maintain allocation efficiency. Only the source data preparation is optimized.

References

jrsonnet/docs/benchmarks.adoc: Performance comparison data
aklomp/base64: SIMD-accelerated base64 C library used by sjsonnet

Result

✅ All tests pass (./mill __.test)
✅ Code formatted (./mill __.reformat)
✅ Performance improved (gap reduced 78%)
✅ No regressions in other base64 scenarios

Replace per-char charAt(i).toByte loop with a single getBytes(0, len, dst, 0) system call for narrowing ASCII chars to bytes in encodeStringToString. Benchmark results (Scala Native vs jrsonnet): - std.base64 (string): 2.17x slower -> 1.21x slower (44% gap reduction) - std.base64Decode: 1.58x slower -> 1.47x slower (improved) - std.base64DecodeBytes: 1.19x faster -> 1.14x faster (stable) - std.base64_byte_array: 1.38x faster -> 1.31x faster (stable) The getBytes method is a single system call that copies bytes faster than a per-char loop. Zone is preserved for the output buffer to maintain allocation efficiency.

Motivation: All three base64 functions (encodeToString, encodeStringToString, decode) were copying source data from GC arrays to zone-allocated buffers before passing to the C library. This intermediate copy is unnecessary because Scala Native's GC does not move objects during foreign calls — the array pointer from `.at(0)` remains valid throughout the C function execution. Modification: - Pass `srcBytes.at(0)` / `input.at(0)` directly to base64_encode/decode - Remove intermediate zone source allocation and memcpy in all 3 methods - Zone is still used for output buffers (C writes into them) Result: Eliminates one allocation + one memcpy per base64 call for source data. Output-side zone allocation is preserved since the C library writes into it and we need to copy results to GC-managed arrays afterward.

He-Pin marked this pull request as ready for review June 6, 2026 16:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: use getBytes for faster ASCII char narrowing in base64 encode#893

perf: use getBytes for faster ASCII char narrowing in base64 encode#893
He-Pin wants to merge 2 commits into
databricks:masterfrom
He-Pin:perf/base64-getbytes-swar

He-Pin commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

He-Pin commented Jun 6, 2026

Motivation

Key Design Decision

Modification

Benchmark Results

Scala Native vs jrsonnet (hyperfine)

JMH (JVM)

Analysis

References

Result

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant