
Use non-blocking connect() for AF_UNIX sockets to ensure we don't stall #3456

Open
dougnazar wants to merge 1 commit into
networkupstools:master from
dougnazar:enable_timeouts_connecting_to_unix_pipe

Conversation

@dougnazar
Contributor

If a driver has become unresponsive, connect() will block after the listen() backlog has been exhausted, causing the main upsd to also become unresponsive.

@github-actions

github-actions Bot commented May 31, 2026

A ZIP file with standard source tarball and another tarball with pre-built docs for commit dd83119 is temporarily available: NUT-tarballs-PR-3456.zip.

Comment thread common/common.c Fixed
If a driver has become unresponsive, connect() will block after the
listen() backlog has been exhausted, causing the main upsd to also
become unresponsive.

Signed-off-by: Doug Nazar <nazard@nazar.ca>
@dougnazar dougnazar force-pushed the enable_timeouts_connecting_to_unix_pipe branch from 43598e4 to dd83119 Compare May 31, 2026 12:02
@AppVeyorBot

Build nut 2.8.5.4778-master completed (commit 6f088760f2 by @dougnazar)

@AppVeyorBot

Build nut 2.8.5.4779-master completed (commit e4aea40382 by @dougnazar)

@jimklimov jimklimov requested review from aquette and clepple May 31, 2026 16:08
@jimklimov
Member

This does look reasonable. It may be worth a mention in NEWS.adoc, and maybe in UPGRADING.adoc too (to highlight behavioral differences that might help or bite, compared to older releases).

Just in case, CCing other maintainers for a second opinion ;)

@jimklimov jimklimov added the "feature" and "Connection stability issues" (Issues about driver<->device and/or networked connections (upsd<->upsmon...) going AWOL over time) labels May 31, 2026
@jimklimov jimklimov added this to the 2.8.6 milestone May 31, 2026
@jimklimov
Member

I wonder if the 1 second timeout here is right (hard-coded if I read that code correctly)? Maybe it should be a configurable value.

Specifically, what if the counterpart is busy or waiting between loop cycles, and will only get to picking up the phone in a few seconds? Did you test this with various timing settings?

@dougnazar
Contributor Author

I believe this will only be an issue once the listen() queue is filled, in which case I think it is better to fail fast, as this runs in the main loop.

I've only extensively tested the failure mode on Linux (as that's where I have the failing driver), and in that case it never actually makes it to the select() call. Once the socket is in non-blocking mode, connect() returns EAGAIN if the listen() queue is full (and succeeds if the queue is not full).

The rest was written with the help of various man pages (FreeBSD, AIX, etc.) and Stack Overflow answers, but I don't know whether connect() will ever return EINPROGRESS for an AF_UNIX socket.
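For readers following along, the approach described in this comment can be sketched roughly as below. This is an illustration only, not the code from the PR: the function name `connect_unix_nonblock` and its `timeout_sec` parameter are hypothetical, and the EINPROGRESS branch is a defensive path that, per the comment above, may never be taken for AF_UNIX sockets.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not the PR's code): connect to the Unix socket at
 * `path` without blocking longer than `timeout_sec` seconds.
 * Returns a connected fd, or -1 with errno set. */
int connect_unix_nonblock(const char *path, int timeout_sec)
{
    struct sockaddr_un sa;
    int fd, flags, saved;
    int err = 0;
    socklen_t len = sizeof(err);

    if (strlen(path) >= sizeof(sa.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Switch to non-blocking mode before calling connect(). */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0)
        return fd; /* the listen() queue had room */

    /* EAGAIN (Linux, queue full) or ECONNREFUSED fail right away. */
    if (errno != EINPROGRESS)
        goto fail;

    /* Defensive path: wait until writable, then read the final status. */
    {
        fd_set wfds;
        struct timeval tv;
        int rc;

        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        rc = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (rc == 0)
            errno = ETIMEDOUT;
        if (rc <= 0)
            goto fail;
    }
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        goto fail;
    if (err != 0) {
        errno = err;
        goto fail;
    }
    return fd;

fail:
    saved = errno; /* keep the original error across close() */
    close(fd);
    errno = saved;
    return -1;
}
```

Note that in this sketch the returned descriptor is still in non-blocking mode; a caller that wants blocking reads afterwards would have to clear O_NONBLOCK itself.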

@dougnazar
Contributor Author

dougnazar commented May 31, 2026

Just for fun, I ran a little test on a FreeBSD VM I had here. With a listen() backlog of 5, and the server then just sleeping, the client returned:

connect: 0 rc=0, errno=0
connect: 1 rc=0, errno=0
connect: 2 rc=0, errno=0
connect: 3 rc=0, errno=0
connect: 4 rc=0, errno=0
connect: 5 rc=0, errno=0
connect: 6 rc=0, errno=0
connect: 7 rc=0, errno=0
connect: 8 rc=-1, errno=61
connect: 9 rc=-1, errno=61

So on FreeBSD at least, connect() just errors with ECONNREFUSED.

Edit: AIX does the same, errno 79 (ECONNREFUSED).
Edit 2: Only Linux requires O_NONBLOCK for connect() to fail; the flag has no impact on FreeBSD or AIX.
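The test program itself was not posted. A harness along the following lines could reproduce the experiment, assuming a listener that never calls accept() and clients that set O_NONBLOCK before connect(). The name `fill_backlog` and the `MAX_CLIENTS` cap are hypothetical, and the number of connects that succeed before the first failure depends on the platform.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define MAX_CLIENTS 64 /* arbitrary cap so the loop always terminates */

/* Hypothetical harness: listen on `path` with the given backlog, never
 * accept(), and open non-blocking clients until one fails.
 * Returns the number of successful connects, or -1 on a setup error.
 * *first_errno receives the errno of the first failure (0 if none). */
int fill_backlog(const char *path, int backlog, int *first_errno)
{
    struct sockaddr_un sa;
    int clients[MAX_CLIENTS];
    int lfd, n = 0, i;

    *first_errno = 0;
    if (strlen(path) >= sizeof(sa.sun_path))
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);
    unlink(path); /* remove a stale socket file, if any */

    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(lfd, backlog) < 0) {
        close(lfd);
        return -1;
    }

    while (n < MAX_CLIENTS) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        int rc, err;

        if (fd < 0) {
            *first_errno = errno;
            break;
        }
        /* Non-blocking, so a full queue fails instead of stalling. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
        rc = connect(fd, (struct sockaddr *)&sa, sizeof(sa));
        err = (rc == 0) ? 0 : errno; /* save before printf() */
        printf("connect: %d rc=%d, errno=%d\n", n, rc, err);
        if (rc != 0) {
            *first_errno = err;
            close(fd);
            break;
        }
        clients[n++] = fd;
    }

    for (i = 0; i < n; i++)
        close(clients[i]);
    close(lfd);
    unlink(path);
    return n;
}
```

Going by the results reported in this thread, the first failure would be expected to carry EAGAIN on Linux and ECONNREFUSED on FreeBSD and AIX.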

@clepple clepple removed their request for review June 1, 2026 00:59
