Use non-blocking connect() for AF_UNIX sockets to ensure we don't stall #3456
dougnazar wants to merge 1 commit into
A ZIP file with standard source tarball and another tarball with pre-built docs for commit dd83119 is temporarily available: NUT-tarballs-PR-3456.zip. |
If a driver has become unresponsive, connect() will block after the listen() backlog has been exhausted, causing the main upsd to also become unresponsive.

Signed-off-by: Doug Nazar <nazard@nazar.ca>
|
✅ Build nut 2.8.5.4778-master completed (commit 6f088760f2 by @dougnazar)
|
✅ Build nut 2.8.5.4779-master completed (commit e4aea40382 by @dougnazar)
|
This does look reasonable, maybe worth a mention in Just in case, CCing other maintainers for a second opinion ;) |
|
I wonder if the 1-second timeout here is right (it is hard-coded, if I read the code correctly). Maybe it should be a configurable value. Specifically, what if the counterpart is busy or waiting between loop cycles, and only gets around to picking up the phone after a few seconds? Did you test this with various timing settings? |
|
I believe this will only be an issue once the listen() queue is filled, in which case I thought it better to fail fast, as this is in the main loop. I've only extensively tested the failure mode on Linux (as that's where I have the failing driver), and in that case it never actually reaches the select() call. Once in non-blocking mode, connect() returns EAGAIN if the listen() queue is full (and succeeds if the queue is not full). The rest was written with the help of various man pages (FreeBSD, AIX, etc.) and Stack Overflow answers, but I don't know whether connect() will ever return EINPROGRESS for an AF_UNIX socket. |
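For readers following along, the approach described in this comment could be sketched roughly as below. This is not the code from the PR: `connect_unix_nonblocking` and its `timeout_sec` parameter are hypothetical names chosen for illustration, and the sketch assumes a POSIX system. It fails immediately on any connect() error other than EINPROGRESS (for example EAGAIN or ECONNREFUSED when the backlog is full) and only waits in select() in the EINPROGRESS case.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not the PR's code): connect to an AF_UNIX stream
 * socket without ever blocking longer than timeout_sec.
 * Returns a connected, still non-blocking fd, or -1 with errno set. */
int connect_unix_nonblocking(const char *path, int timeout_sec)
{
    struct sockaddr_un sa;
    struct timeval tv;
    fd_set wfds;
    socklen_t len;
    int fd, flags, ret, err;

    assert(path != NULL);
    if (strlen(path) >= sizeof(sa.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Switch to non-blocking mode before calling connect(). */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);

    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0)
        return fd; /* connected immediately */

    /* EAGAIN, ECONNREFUSED, ENOENT, ...: fail fast, do not wait. */
    if (errno != EINPROGRESS)
        goto fail;

    /* select() cannot watch descriptors at or above FD_SETSIZE. */
    if (fd >= FD_SETSIZE) {
        errno = EMFILE;
        goto fail;
    }

    /* Connection is in progress: wait until writable or timed out. */
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    ret = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (ret == 0)
        errno = ETIMEDOUT;
    if (ret <= 0)
        goto fail;

    /* Writable: fetch the final result of the connection attempt. */
    err = 0;
    len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        goto fail;
    if (err != 0) {
        errno = err;
        goto fail;
    }
    return fd;

fail:
    err = errno; /* preserve errno across close() */
    close(fd);
    errno = err;
    return -1;
}
```

A caller in a main loop would treat a -1 return as "driver not reachable right now" and retry on a later cycle. Whether the descriptor should be switched back to blocking mode afterwards depends on how the rest of the code reads from it, which this sketch does not assume.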
|
Just for fun, I ran a little test on a FreeBSD VM I had here. With a listen() backlog of 5, and then just sleeping, the client returned: So on FreeBSD at least, connect() just errors with ECONNREFUSED. Edit: AIX does the same, errno 79 ECONNREFUSED |
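A test along these lines could look roughly like the sketch below. It is not the program that was actually run: `probe_full_backlog` and `MAX_CLIENTS` are hypothetical names, and the sketch assumes a POSIX system. It creates a listener that never calls accept(), then makes non-blocking connect() calls until one fails and returns that errno. Per the comments in this thread, the expected result is EAGAIN on Linux and ECONNREFUSED on FreeBSD and AIX.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define MAX_CLIENTS 64 /* hypothetical cap on connection attempts */

/* Hypothetical probe (not the test from the thread): listen on `path`
 * with the given backlog, never accept(), and issue non-blocking
 * connect() calls until one fails.
 * Returns the errno of the first failed connect(), 0 if none failed
 * within MAX_CLIENTS attempts, or -1 if the setup itself failed. */
int probe_full_backlog(const char *path, int backlog)
{
    struct sockaddr_un sa;
    int clients[MAX_CLIENTS];
    int srv, fd, flags, i, n = 0, result = 0;

    if (strlen(path) >= sizeof(sa.sun_path))
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);

    srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;
    unlink(path); /* remove a stale socket file, if any */
    if (bind(srv, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(srv, backlog) < 0) {
        close(srv);
        return -1;
    }

    /* The "unresponsive driver": srv is never passed to accept(). */
    for (i = 0; i < MAX_CLIENTS; i++) {
        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) {
            result = -1;
            break;
        }
        flags = fcntl(fd, F_GETFL, 0);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
            close(fd);
            result = -1;
            break;
        }
        if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
            result = errno; /* the value this probe is after */
            close(fd);
            break;
        }
        clients[n++] = fd; /* keep open so the queue stays full */
    }

    for (i = 0; i < n; i++)
        close(clients[i]);
    close(srv);
    unlink(path);
    return result;
}
```

Calling `probe_full_backlog("/tmp/backlog-probe.sock", 5)` and printing `strerror()` of the result on each platform of interest would show which error a non-blocking client sees once the queue is exhausted.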