Context
Observed on AtomVM 0.7.0-beta.0+git.fd8127f1 (ESP32-S3 XIAO Sense), Elixir 1.19.5 / OTP 28, atomvm_httpd at aae97d1 (improvements branch).
Use-case: a camera webserver that serves live JPEG frames at ~100–160 kB each from PSRAM framebuffers over WiFi. This hammers the send path and surfaced several issues. Filing as one tracking issue with sub-bullets so they can be individually scheduled.
1. Demote "Connection closed mid-transfer" log to ?TRACE
try_send_binary/2 logs every client-initiated mid-stream close at io:format level:
    {error, closed} ->
        case byte_size(Rest) of
            0 -> ok;
            _ -> io:format("Connection closed mid-transfer (~p/~p bytes sent)~n", [ChunkSize, TotalSize])
        end,
A browser canceling a request (image swap, tab close, navigation, or a parallel request racing ahead) is normal client behavior, not a server fault. On a page that chains JPEG requests back-to-back, a single slow WiFi connection produces dozens of these per minute and completely obscures real errors.
Suggestion: demote to ?TRACE, matching the surrounding tracing pattern in the file.
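A minimal sketch of what the demoted log could look like, for illustration only. The module name trace_sketch, the function on_send_result/3, and the ENABLE_TRACE flag are hypothetical, and the ?TRACE definition shown is an assumption; the macro already used in gen_tcp_server.erl may be defined differently.

```erlang
-module(trace_sketch).
-export([on_send_result/3]).

%% Assumption: a compile-time tracing macro along these lines. With
%% ENABLE_TRACE undefined, ?TRACE expands to `ok` and prints nothing.
-ifdef(ENABLE_TRACE).
-define(TRACE(Fmt, Args), io:format(Fmt, Args)).
-else.
-define(TRACE(Fmt, Args), ok).
-endif.

%% A close that arrives before the whole payload was sent is traced
%% instead of printed; every other result is passed through unchanged.
on_send_result({error, closed}, Sent, Total) when Sent < Total ->
    ?TRACE("Connection closed mid-transfer (~p/~p bytes sent)~n", [Sent, Total]),
    {error, closed};
on_send_result(Result, _Sent, _Total) ->
    Result.
```

The caller still sees {error, closed}, so connection cleanup is unaffected; only the console noise goes away in non-trace builds.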
2. Set TCP_NODELAY on accepted connections
accept/2 does not set {tcp, nodelay} on accepted sockets. With Nagle's algorithm enabled, lwIP holds a small write in its send buffer while earlier data is still unacknowledged, and the client's delayed ACK can stretch that wait to ~40 ms. This is exactly the wrong default for an HTTP server that sends a response head followed immediately by a binary body: small writes end up queued until an ACK arrives, adding latency to every response.
Suggestion: add socket:setopt(Connection, {tcp, nodelay}, true) immediately after socket:accept/1 (~line 323). This could be unconditional (HTTP is virtually always nodelay-friendly) or gated behind a socket_options map entry so callers can opt out.
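A rough sketch of the gated variant, assuming the running AtomVM build accepts {tcp, nodelay} in socket:setopt/3 (it does on OTP). The module name nodelay_sketch, the function accept_nodelay/2, and the nodelay map key are hypothetical names chosen for illustration; they are not part of atomvm_httpd.

```erlang
-module(nodelay_sketch).
-export([accept_nodelay/2]).

%% Accept one connection and, unless the caller opted out, disable
%% Nagle's algorithm on it.
%% Assumption: `nodelay` is a hypothetical key in the socket_options map.
accept_nodelay(ListenSocket, SocketOptions) ->
    case socket:accept(ListenSocket) of
        {ok, Connection} ->
            case maps:get(nodelay, SocketOptions, true) of
                true ->
                    %% Best effort: a failed setopt should not drop the
                    %% connection, so the result is deliberately ignored.
                    _ = socket:setopt(Connection, {tcp, nodelay}, true),
                    ok;
                _ ->
                    ok
            end,
            {ok, Connection};
        {error, _Reason} = Error ->
            Error
    end.
```

Defaulting the key to true keeps the common case fast while still letting a caller pass #{nodelay => false} to opt out.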
3. Make MAX_SEND_CHUNK configurable; bump default from 1460 to 4096
gen_tcp_server.erl hardcodes ?MAX_SEND_CHUNK = 1460 (Ethernet TCP MSS). For a 100 kB JPEG this means ~70 successive socket:send calls each followed by a receive after 0 yield. That:
- multiplies NIF-crossing and lwIP queue overhead per response,
- widens the window for a client cancellation to land between chunks (contributing to issue 1),
- increases mailbox depth on the controlling gen_server during a long send.
WiFi handles its own fragmentation above the MTU; lwIP also segments internally. Larger per-send payloads are the norm in similar embedded HTTP stacks.
Suggestion:
- accept a chunk_size key in the socket_options map passed to start/4 / start_link/4,
- default to 4096 (appropriate for ESP32 lwIP send-buffer headroom; callers on platforms with larger buffers can raise it).
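The option lookup and the effect on the number of sends can be sketched as follows. This is an illustration under assumptions, not the gen_tcp_server.erl code: the module chunk_sketch and the functions chunk_size/1 and chunks/2 are hypothetical, and only the chunk_size key and the 4096 default come from the suggestion above.

```erlang
-module(chunk_sketch).
-export([chunk_size/1, chunks/2]).

-define(DEFAULT_CHUNK_SIZE, 4096).

%% Read chunk_size from the socket_options map, falling back to the
%% default when the key is missing or not a positive integer.
chunk_size(SocketOptions) when is_map(SocketOptions) ->
    case maps:get(chunk_size, SocketOptions, ?DEFAULT_CHUNK_SIZE) of
        N when is_integer(N), N > 0 -> N;
        _ -> ?DEFAULT_CHUNK_SIZE
    end.

%% Split a binary into pieces of at most ChunkSize bytes, one piece per
%% socket:send call in the real send loop.
chunks(Bin, ChunkSize) when is_binary(Bin), is_integer(ChunkSize), ChunkSize > 0 ->
    chunks(Bin, ChunkSize, []).

chunks(<<>>, _ChunkSize, Acc) ->
    lists:reverse(Acc);
chunks(Bin, ChunkSize, Acc) when byte_size(Bin) =< ChunkSize ->
    lists:reverse([Bin | Acc]);
chunks(Bin, ChunkSize, Acc) ->
    <<Chunk:ChunkSize/binary, Rest/binary>> = Bin,
    chunks(Rest, ChunkSize, [Chunk | Acc]).
```

For a 100,000-byte payload, chunks/2 yields 69 pieces at 1460 bytes and 25 pieces at 4096 bytes, which is roughly the reduction in send calls the larger default would buy.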
4. Move request handling into per-connection worker processes
Currently the single controlling gen_server:
- owns the listen socket and connection registry,
- handles all requests (handle_tcp_data → Handler:handle_http_req/2 → try_send).
While one client downloads a 100 kB response, every other client's request sits in the gen_server's mailbox. For a camera server with a browser issuing 4–6 parallel image loads, or any long-lived response (chunked transfer, SSE), this is a hard serialization bottleneck.
Suggestion: spawn a worker process at accept time that owns both the recv loop and request dispatch/send for its socket. The gen_server retains only listen-socket ownership and max_connections enforcement. Rough shape:
gen_server (listener)
├── accept/2 (spawned per listener)
└── worker (spawned per connection): recv → dispatch → send → loop
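A minimal sketch of the worker loop, to make the shape concrete. All names here (worker_sketch, start_worker/2, and the Handler fun that maps request bytes to response bytes) are hypothetical, HTTP parsing is omitted, and socket ownership hand-off and max_connections accounting are left out; how those behave on AtomVM would need checking.

```erlang
-module(worker_sketch).
-export([start_worker/2]).

%% Spawn one process per accepted connection. Handler is a hypothetical
%% fun(binary()) -> binary() standing in for request dispatch.
start_worker(Connection, Handler) ->
    spawn(fun() -> loop(Connection, Handler) end).

%% recv -> dispatch -> send -> loop, all inside the worker, so a slow
%% client only blocks its own process.
loop(Connection, Handler) ->
    case socket:recv(Connection, 0) of
        {ok, Data} ->
            case send_all(Connection, Handler(Data)) of
                ok -> loop(Connection, Handler);
                {error, _Reason} -> socket:close(Connection)
            end;
        {error, _Reason} ->
            socket:close(Connection)
    end.

%% Some socket implementations report a partial send as {ok, Rest};
%% keep sending until nothing is left.
send_all(_Connection, <<>>) ->
    ok;
send_all(Connection, Data) ->
    case socket:send(Connection, Data) of
        ok -> ok;
        {ok, Rest} -> send_all(Connection, Rest);
        {error, _Reason} = Error -> Error
    end.
```

With this shape the listener only has to count live workers (for example by monitoring them) to enforce max_connections.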
This is the most involved refactor in the list; filing separately so it can be planned on its own schedule.
All four are independent; they can be addressed in any order. Happy to submit PRs for any of them.