
Feature/io uring #264

Merged
mvandeberg merged 4 commits into cppalliance:develop from mvandeberg:feature/io_uring_perf
Jun 3, 2026

Conversation

@mvandeberg
Contributor

No description provided.

@mvandeberg mvandeberg changed the title from "Feature/io uring perf" to "Feature/io uring" Jun 2, 2026
@cppalliance-bot

cppalliance-bot commented Jun 2, 2026

An automated preview of the documentation is available at https://264.corosio.prtest3.cppalliance.org/index.html

If more commits are pushed to the pull request, the docs will rebuild at the same URL.

2026-06-02 22:23:37 UTC

sgerbino and others added 3 commits June 2, 2026 09:14
Add a speculative non-blocking syscall fast path to every socket op:
read_some / write_some / submit_send / submit_recv attempt ::readv /
::sendmsg / ::recvmsg before falling through to the io_uring submit
path. On success the op completes without a kernel round-trip; on
EAGAIN, the io_uring path runs unchanged. A speculative ::accept4 also
fires at the top of the multishot acceptor entry. Connect is left on
the io_uring path because IORING_OP_CONNECT re-invokes connect(2)
internally, and a prior speculative ::connect would leave the fd in
EINPROGRESS so that the re-invocation fails with EALREADY.
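A minimal sketch of the speculative fast path described above, for illustration only. The helper name `try_speculative_recv` is hypothetical and is not taken from the PR. It uses `::recv` with `MSG_DONTWAIT` instead of the `::readv` / `::recvmsg` calls the commit names, so that it also works on a blocking descriptor. An empty result stands for the EAGAIN case, where the real code would fall through to the io_uring submit path.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <optional>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper, not Corosio's code.
// Returns:
//   - a value >= 0: the read completed inline (0 means end of stream),
//   - a negative value: -errno for a hard error,
//   - std::nullopt: EAGAIN / EWOULDBLOCK, so the caller would fall
//     through to the io_uring submit path.
inline std::optional<ssize_t>
try_speculative_recv(int fd, void* buf, std::size_t len)
{
    assert(fd >= 0 && buf != nullptr);
    for (;;)
    {
        ssize_t n = ::recv(fd, buf, len, MSG_DONTWAIT);
        if (n >= 0)
            return n;               // completed without queueing an SQE
        if (errno == EINTR)
            continue;               // interrupted by a signal, retry
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return std::nullopt;    // nothing buffered, use the slow path
        return -static_cast<ssize_t>(errno);
    }
}
```

A caller would check `has_value()` first and only prepare an SQE when the result is empty.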

Gate the speculative attempts on a per-socket per-op-type hint
(detail::speculative_state). The hint is flipped false when
speculation discovers an exhausted buffer (EAGAIN) and restored when
an io_uring CQE indicates kernel readiness (res > 0). Skips the
wasted speculative syscall when the kernel buffer is known empty /
full.
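The gating rule in this paragraph can be sketched as a small state holder. This is an illustration under assumptions: the class name `speculation_hint`, the `op_kind` enum, and the use of plain (non-atomic) bools are all hypothetical, and the actual `detail::speculative_state` may look different.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical op categories; the PR only says "per-op-type".
enum class op_kind : std::size_t { read = 0, write = 1, count = 2 };

// Hypothetical stand-in for a per-socket speculation hint.
class speculation_hint
{
    // One flag per op type; speculation starts enabled.
    std::array<bool, 2> may_speculate_{{true, true}};

    static std::size_t index(op_kind k) noexcept
    {
        assert(k != op_kind::count);
        return static_cast<std::size_t>(k);
    }

public:
    // Should the next op of this type try the direct syscall first?
    bool should_speculate(op_kind k) const noexcept
    {
        return may_speculate_[index(k)];
    }

    // The speculative syscall returned EAGAIN: the kernel buffer is
    // empty (read) or full (write), so stop speculating for now.
    void on_eagain(op_kind k) noexcept
    {
        may_speculate_[index(k)] = false;
    }

    // An io_uring CQE arrived for this op type. Only a positive result
    // indicates readiness, so only then is the hint restored.
    void on_cqe(op_kind k, int res) noexcept
    {
        if (res > 0)
            may_speculate_[index(k)] = true;
    }
};
```

The read and write flags are independent, so an EAGAIN on a read does not disable speculative writes on the same socket.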

Embed the per-op slots (uring_read_op, uring_write_op,
uring_connect_op, uring_dgram_send_op, uring_dgram_recv_op, file
read/write ops) as members of each socket/file impl. Eliminates the
per-call heap allocation on the I/O hot path and gives the
speculative path stable storage to dispatch through (the embedded
cont_op is always there).

Batch deferred SQE submission via submit_sqes_op. The first
cross-thread io_uring_submit_op in a batch wins a CAS and posts a
single op that flushes the SQ ring; subsequent submitters in the
same batch piggyback on the same flush rather than each issuing
their own syscall.
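The "first submitter wins a CAS" idea can be sketched with a single atomic flag. The class `flush_coalescer` and its member names are hypothetical, and the sketch replaces the actual SQ ring flush with a caller-supplied callable. It shows only the coalescing logic, not how `submit_sqes_op` is posted.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical illustration of CAS-coalesced flushing.
class flush_coalescer
{
    std::atomic<bool> flush_posted_{false};

public:
    // Called by every submitter after it has prepared its SQE.
    // Returns true for exactly one caller per batch: that caller must
    // post the flush op. Everyone else piggybacks on the pending flush.
    bool request_flush() noexcept
    {
        bool expected = false;
        return flush_posted_.compare_exchange_strong(expected, true);
    }

    // Called by the posted flush op. The flag is cleared before the
    // flush runs, so an SQE prepared after this point starts a new
    // batch instead of being missed.
    template<class SubmitFn>
    void run_flush(SubmitFn&& submit)
    {
        assert(flush_posted_.load()); // a flush runs only when requested
        flush_posted_.store(false);
        submit();
    }
};
```

With this shape, N submitters in one batch cause one flush rather than N submit syscalls.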

Keep do_one's submit_and_get_events + process_completions prologue
so the kernel CQE pump runs on every dispatch iteration. Without the
prologue, a polling timer with 0ns expiry keeps completed_ops_
non-empty, so the leader-phase kernel pass below it never runs and
CQEs accumulate in the ring forever.

Misc liveness / safety:
- Cap the leader's unbounded kernel wait at 1s — defense in depth
  against a lost wakeup (multishot poll on wakeup_eventfd_ silently
  terminating).
- Align op destroy() with the reactor backend — do not touch the
  awaiter handle at shutdown; calling h.destroy() in op destroy()
  recurses through capy's promise dtor.
- Release ring_mutex_ across the leader's kernel wait so cross-thread
  submitters can prep new SQEs while the leader sleeps.
- Switch the wakeup poll SQE to multishot and force-wake
  unconditionally from interrupt_reactor in multi-thread mode (CAS-
  coalescing would drop wakes given the kernel waits indefinitely
  between CQEs).

The reactor backend still speculates unconditionally and uses iovec-
style syscalls; porting the speculative_state mixin and the single-
buffer fast path is future work.
- Drain expired timers at the top of do_one so stopper timers fire under continuous I/O and shutdown-deadlock socket_stress tests pass.
- Skip io_uring_submit_and_get_events in do_one when no SQEs are in flight, gated on an io_uring_inflight_ counter incremented at SQE submit and decremented on the terminal CQE.
- Defer the eager getsockname syscall on accepted TCP sockets to a three-state lazy-resolution scheme, so accept-heavy paths skip the round trip until local_endpoint() is observed.
- Place outstanding_work_ and io_uring_inflight_ on distinct cache lines via alignas(64) to eliminate false sharing on multi-thread workloads.
- Latch speculative reads permanently off after a consecutive-EAGAIN streak so structurally bursty workloads (e.g. fan_out:nested/16) stop burning a wasted readv syscall per read_some.
- Emit IORING_OP_RECV / IORING_OP_SEND on single-buffer reads and writes to skip the iovec-array indirection that IORING_OP_READV / IORING_OP_SENDMSG pays.
- Gate timer_service::process_expired() on timer_service::empty() so the unconditional timer drain added above is free (a single relaxed-acquire load) when no timer is registered.
- Add BOOST_COROSIO_BENCH_ASIO_IO_URING (default ON) so the asio bench variants build against io_uring by default for an apples-to-apples comparison. Reconfigure with -DBOOST_COROSIO_BENCH_ASIO_IO_URING=OFF to revert to asio's epoll reactor without touching the source.
- Implement wait() on all six io_uring socket/acceptor types via a new uring_wait_op that emits IORING_OP_POLL_ADD with POLLIN / POLLOUT / POLLPRI|POLLERR|POLLHUP for wait_type::read / write / error.
- Add stream_file_type, stream_file_service_type, random_access_file_type, and random_access_file_service_type aliases to io_uring_t.
- Include the io_uring detail headers from the native_*.hpp tag-dispatch wrappers so they can instantiate against io_uring_t.
- Register reactor_paths.cpp for reactor backends only via a new COROSIO_REACTOR_BACKEND_TESTS macro: testWriteEAGAIN's small-buffer (SO_SNDBUF=1024) loopback pattern triggers a kernel-level slow-path in io_uring's POLLOUT-rearm cycle that exceeds reasonable ctest timeouts; io_uring socket coverage is preserved by the other test files.
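As an illustration of the wait() bullet in the list above, the mapping from wait type to poll mask can be written as one small function. The `wait_type` enum here is a hypothetical local definition that mirrors the names in the commit message, and `poll_mask_for` is an invented helper name. The sketch does not include liburing and prepares no SQE.

```cpp
#include <cassert>
#include <poll.h>

// Hypothetical local definition mirroring the names used in the PR.
enum class wait_type { read, write, error };

// Map a wait type to the poll events named in the commit message.
// In real code the result would be the poll_mask argument of liburing's
// io_uring_prep_poll_add(sqe, fd, poll_mask).
constexpr unsigned poll_mask_for(wait_type w) noexcept
{
    switch (w)
    {
    case wait_type::read:
        return POLLIN;
    case wait_type::write:
        return POLLOUT;
    case wait_type::error:
        return POLLPRI | POLLERR | POLLHUP;
    }
    return 0; // unreachable for valid enumerators
}
```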
@cppalliance-bot

cppalliance-bot commented Jun 2, 2026

GCOVR code coverage report https://264.corosio.prtest3.cppalliance.org/gcovr/index.html
LCOV code coverage report https://264.corosio.prtest3.cppalliance.org/genhtml/index.html
Coverage Diff Report https://264.corosio.prtest3.cppalliance.org/diff-report/index.html

Build time: 2026-06-02 22:37:27 UTC

@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.78%. Comparing base (1e591ce) to head (5be499f).
⚠️ Report is 2 commits behind head on develop.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop     #264   +/-   ##
========================================
  Coverage    77.78%   77.78%           
========================================
  Files           96       96           
  Lines         7256     7256           
  Branches      1769     1769           
========================================
  Hits          5644     5644           
  Misses        1102     1102           
  Partials       510      510           
| Files with missing lines | Coverage Δ |
|---|---|
| include/boost/corosio/detail/intrusive.hpp | 100.00% <ø> (ø) |
| include/boost/corosio/detail/scheduler.hpp | 100.00% <ø> (ø) |
| include/boost/corosio/io_context.hpp | 96.87% <ø> (ø) |
| ...boost/corosio/native/detail/iocp/win_scheduler.hpp | 62.02% <ø> (ø) |
| ...sio/native/detail/posix/posix_resolver_service.hpp | 81.46% <ø> (ø) |
| include/boost/corosio/native/native_io_context.hpp | 94.59% <ø> (ø) |
| ...clude/boost/corosio/native/native_tcp_acceptor.hpp | 90.90% <ø> (ø) |
| include/boost/corosio/native/native_tcp_socket.hpp | 90.41% <ø> (ø) |
| src/corosio/src/io_context.cpp | 95.83% <ø> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mvandeberg mvandeberg force-pushed the feature/io_uring_perf branch 8 times, most recently from e6ea2ae to fd418a3 Compare June 2, 2026 21:39
ci.yml: add liburing-dev to the apt-get list for the package-install
step. The step only runs on apt-based systems, so macOS / Windows /
FreeBSD entries are unaffected.

code-coverage.yml: add a dedicated install step before the coverage
script runs, so io_uring code paths are included in the Linux
coverage report.

io_uring: PUBLIC liburing link + clang-tidy fixes

b2: detect liburing and enable io_uring backend when present

test: register io_uring shadow tests for all native types

fix(io_context): drop unsafe scheduler downcasts

cmake: emit raw -luring for install consumers

Fix asan leaks
@mvandeberg mvandeberg force-pushed the feature/io_uring_perf branch from fd418a3 to 5be499f Compare June 2, 2026 22:17
@mvandeberg mvandeberg merged commit 7c636ac into cppalliance:develop Jun 3, 2026
41 checks passed
@mvandeberg mvandeberg deleted the feature/io_uring_perf branch June 3, 2026 14:54

3 participants