perf: use getBytes for faster ASCII char narrowing in base64 encode #892
Closed
He-Pin wants to merge 1 commit into
Replace per-char charAt(i).toByte loop with a single bulk getBytes(0, len, dst, 0) call for narrowing ASCII chars to bytes in encodeStringToString.
Benchmark results (Scala Native vs jrsonnet):
- std.base64 (string): 2.17x slower -> 1.21x slower (44% gap reduction)
- std.base64Decode: 1.58x slower -> 1.47x slower (improved)
- std.base64DecodeBytes: 1.19x faster -> 1.14x faster (stable)
- std.base64_byte_array: 1.38x faster -> 1.31x faster (stable)
The getBytes method is a single bulk library call (not an OS system call) that copies bytes faster than a per-char loop. Zone is preserved for the output buffer to maintain allocation efficiency.
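As a rough illustration of the change described above, the two narrowing strategies can be compared side by side. This is a minimal sketch against the JVM `String` API, not the code from this PR: the object and method names (`NarrowAscii`, `narrowPerChar`, `narrowBulk`) are hypothetical, and it assumes the deprecated `String.getBytes(srcBegin, srcEnd, dst, dstBegin)` overload is available on the target platform.

```scala
object NarrowAscii {
  // Per-character narrowing: one charAt and one toByte per element.
  def narrowPerChar(s: String): Array[Byte] = {
    val len = s.length
    val dst = new Array[Byte](len)
    var i = 0
    while (i < len) {
      dst(i) = s.charAt(i).toByte
      i += 1
    }
    dst
  }

  // Bulk narrowing: a single call copies the low-order 8 bits of every char.
  // The overload is deprecated because it does not apply a charset, which is
  // harmless only when the input is known to be ASCII.
  @scala.annotation.nowarn("cat=deprecation")
  def narrowBulk(s: String): Array[Byte] = {
    val len = s.length
    val dst = new Array[Byte](len)
    s.getBytes(0, len, dst, 0)
    dst
  }
}
```

Both functions keep only the low 8 bits of each char, so they produce the same bytes for any input; the result is only a faithful encoding when every char is ASCII.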
Some pending PRs need to be submitted to Scala Native.
Motivation
The `std.base64` function for string input was 2.61x slower than jrsonnet (Rust implementation) on Scala Native. The main bottleneck was the per-character `charAt(i).toByte` loop used to narrow ASCII chars to bytes before passing them to the aklomp/base64 SIMD encoder.
Key Design Decision
Use a single `String.getBytes(0, len, dst, 0)` call instead of a per-character `charAt(i).toByte` loop for ASCII char narrowing. This one bulk library call (not an OS system call) copies bytes faster than a per-char loop. Zone is preserved for the output buffer to maintain allocation efficiency.
Modification
Changed `sjsonnet/src-native/sjsonnet/stdlib/PlatformBase64.scala`:
- Replaced the `charAt(i).toByte` loop with a `getBytes(0, len, srcBytes, 0)` call
- `Array[Byte]` for source data
- `memcpy` to copy from `Array[Byte]` to the Zone buffer
- `@nowarn("cat=deprecation")` for the deprecated `getBytes` method
Benchmark Results
Scala Native vs jrsonnet (hyperfine)
JMH (JVM)
Baseline JMH benchmarks are stable; the change only affects Scala Native path.
Analysis
The `getBytes(0, len, dst, 0)` method is a single bulk library call (not an OS system call) that copies bytes faster than a per-char loop. For a 3.5KB Lorem-ipsum-style input, this avoids two full passes over the data before the SIMD encoder sees it.
The optimization is conservative: Zone is still used for the output buffer to maintain allocation efficiency. Only the source data preparation is optimized.
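To make the "prepare source bytes, then encode" flow concrete, here is a small JVM-only sketch. It is not the Scala Native implementation: `java.util.Base64` stands in for the aklomp/base64 SIMD encoder, no Zone or `memcpy` is involved, and the names `Base64Sketch`, `isAscii`, and `encodeAscii` are hypothetical. The explicit ASCII check is added only to keep the example safe on its own; it costs an extra pass that a real fast path would presumably avoid.

```scala
import java.util.Base64

object Base64Sketch {
  // True when every char fits in 7 bits, so narrowing to bytes is lossless.
  def isAscii(s: String): Boolean = {
    var i = 0
    while (i < s.length) {
      if (s.charAt(i) > 127) return false
      i += 1
    }
    true
  }

  // Narrow the string with one bulk getBytes call, then hand the bytes to the
  // encoder (java.util.Base64 here, as a stand-in for the SIMD encoder).
  @scala.annotation.nowarn("cat=deprecation")
  def encodeAscii(s: String): String = {
    require(isAscii(s), "input must be ASCII")
    val src = new Array[Byte](s.length)
    s.getBytes(0, s.length, src, 0)
    Base64.getEncoder.encodeToString(src)
  }
}
```

For example, `Base64Sketch.encodeAscii("hello")` returns `"aGVsbG8="`.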
References
Result
- `./mill __.test`
- `./mill __.reformat`