Changes from 2 commits
49 changes: 30 additions & 19 deletions adafruit_minimqtt/adafruit_minimqtt.py
@@ -241,7 +241,7 @@ def __init__( # noqa: PLR0915, PLR0913, Too many statements, Too many arguments
self._lw_retain = False

# List of subscribed topics, used for tracking
Comment thread
BMDan marked this conversation as resolved.
Outdated
-self._subscribed_topics: List[str] = []
+self._subscribed_topics: List[tuple[str, int]] = []
self._on_message_filtered = MQTTMatcher()

# Default topic callback methods
@@ -837,7 +837,7 @@ def subscribe( # noqa: PLR0912, PLR0915, Too many branches, Too many statements
for t, q in topics:
if self.on_subscribe is not None:
self.on_subscribe(self, self.user_data, t, q)
-self._subscribed_topics.append(t)
+self._subscribed_topics.append((t, q))

return

@@ -866,7 +866,7 @@ def unsubscribe( # noqa: PLR0912, Too many branches
self._valid_topic(t)
topics.append(t)
for t in topics:
-if t not in self._subscribed_topics:
+if t not in [_t for _t, _ in self._subscribed_topics]:
raise MMQTTStateError("Topic must be subscribed to before attempting unsubscribe.")
# Assemble packet
self.logger.debug("Sending UNSUBSCRIBE to broker...")
@@ -907,7 +907,8 @@ def unsubscribe( # noqa: PLR0912, Too many branches
for t in topics:
if self.on_unsubscribe is not None:
self.on_unsubscribe(self, self.user_data, t, self._pid)
-self._subscribed_topics.remove(t)
+_, q = [(_t, _q) for _t, _q in self._subscribed_topics if _t == t][0]
+self._subscribed_topics.remove((t, q))
return
if op != MQTT_PUBLISH:
# [3.10.4] The Server may continue to deliver existing messages buffered
@@ -958,21 +959,31 @@ def reconnect(self, resub_topics: bool = True) -> int:

         self.logger.debug("Attempting to reconnect with MQTT broker")
         subscribed_topics = []
-        if self.is_connected():
-            # disconnect() will reset subscribed topics so stash them now.
-            if resub_topics:
-                subscribed_topics = self._subscribed_topics.copy()
-            self.disconnect()
-
-        ret = self.connect(session_id=self.session_id)
-        self.logger.debug("Reconnected with broker")
-
-        if resub_topics and subscribed_topics:
-            self.logger.debug("Attempting to resubscribe to previously subscribed topics.")
-            self._subscribed_topics = []
-            while subscribed_topics:
-                feed = subscribed_topics.pop()
-                self.subscribe(feed)
+
+        # disconnect() will reset subscribed topics, so stash them now.
+        if resub_topics:
+            subscribed_topics = self._subscribed_topics.copy()
+
+        try:
+            if self.is_connected():
+                self.disconnect()
+
+            ret = self.connect(session_id=self.session_id)
+            self.logger.debug("Reconnected with broker")
+
+            if resub_topics and subscribed_topics:
+                self.logger.debug("Attempting to resubscribe to previously subscribed topics.")
+                self._subscribed_topics = []
+                while subscribed_topics:
+                    feed = subscribed_topics.pop()
+                    self.subscribe(*feed)
+        except Exception:
Contributor

I wonder if the broad exception could be narrowed to the MQTT exception?

Author

To be clear, no matter what the exception is, we re-raise it (see the bare raise a few lines below). This is the moral equivalent of a finally or defer clause; we are neither masking nor handling the exception, merely pausing its propagation long enough to make sure our object is left in a sane state. In fact, we could do it with finally, if you'd prefer.

As I see it, there are three options here, in addition to what I've implemented:

  1. Track _original_subscriptions (or _remaining_original_subscriptions) in the object. This costs some space and complexity, but isn't otherwise too terrible. It does introduce an edge case where we would potentially subscribe to a topic twice, but that's probably not too awful.
  2. Narrow the scope of the exception being caught. This reduces the likelihood that we stomp on someone else's toe, but reintroduces the risk that an error that is not within our caught scope (even something so prosaic as an IndexError arising partway through a re-subscription) could cause us to violate our API contract and not fully re-subscribe upon reconnect.
  3. We could leave it to the caller to identify this situation. This feels like the worst option; it requires the caller either to issue spurious subscribe()s, or to look at our private class vars (_subscribed_topics). Plus, it means our guarantee of re-subscription upon reconnect cannot be relied upon.
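The "moral equivalent of a finally" point can be sketched with a toy example. Nothing below is MiniMQTT code: `Tracker`, `_rebuild`, `refill_with_except`, and `refill_with_finally` are hypothetical names used only for illustration. Both variants let the exception propagate and differ only in where the restore step lives.

```python
class Tracker:
    """Toy object holding a list that must never be left half-rebuilt."""

    def __init__(self, items):
        self.items = list(items)


def _rebuild(tracker, stash, fail_at):
    # Rebuild the list one item at a time; optionally fail partway through.
    tracker.items = []
    for index, item in enumerate(stash):
        if index == fail_at:
            raise RuntimeError("simulated failure during rebuild")
        tracker.items.append(item)


def refill_with_except(tracker, fail_at=None):
    stash = list(tracker.items)
    try:
        _rebuild(tracker, stash, fail_at)
    except Exception:
        # Not handling the error, only restoring state before it propagates.
        tracker.items = stash
        raise


def refill_with_finally(tracker, fail_at=None):
    stash = list(tracker.items)
    succeeded = False
    try:
        _rebuild(tracker, stash, fail_at)
        succeeded = True
    finally:
        # finally also runs on success, so the restore has to be conditional.
        if not succeeded:
            tracker.items = stash


for refill in (refill_with_except, refill_with_finally):
    tracker = Tracker(["a", "b", "c"])
    try:
        refill(tracker, fail_at=1)
    except RuntimeError:
        pass  # the caller still sees the original exception
    print(refill.__name__, tracker.items)
```

One behavioral difference between the two forms: the `finally` variant also runs for `BaseException` subclasses such as `KeyboardInterrupt`, which `except Exception` does not catch.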

Contributor

Thanks for the detailed information about the thought process, really appreciated.

I think the key question here is whether anything besides MMQTTException being raised from the depths of the library code is expected to be recoverable (in general and also w.r.t. the internal MQTT object state). My take on this is that if there is, it should be wrapped in MMQTTException, i.e. I do not see the need for the broad exception catch.
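The wrapping approach suggested here could look roughly like the following. This is a sketch, not the library's code: `ToyMQTTError` stands in for `MMQTTException`, and `send_packet` and `publish` are hypothetical functions.

```python
class ToyMQTTError(Exception):
    """Stand-in for the library's own exception type (hypothetical)."""


def send_packet(payload):
    # Hypothetical low-level call that can fail with a non-library exception.
    if not payload:
        raise OSError("socket write failed")
    return len(payload)


def publish(payload):
    try:
        return send_packet(payload)
    except OSError as exc:
        # Re-raise as the library's type; "from exc" keeps the original cause.
        raise ToyMQTTError("publish failed") from exc


try:
    publish(b"")
except ToyMQTTError as err:
    print(type(err.__cause__).__name__)  # prints "OSError"
```

With this pattern callers only need to catch one type, but it covers only the exception types the library anticipates and converts.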

Author

Ah, I see. Thank you for laying that out so clearly!

To work out the best path, I think it's helpful to have a real scenario. One of the lines within the try-catch is:

        self.logger.debug("Reconnected with broker")

Let us imagine that a custom global logging handler has a PotM bug, and that an exception will be raised from self.logger.debug if it is called at 4:56 A.M. on any Tuesday in March of 2026.

I put myself in the position of an engineer (who doesn't control the MiniMQTT library) who became aware of this bug when it triggered last week, but doesn't quite know how to reproduce it yet. Helpfully, the custom logging handler raises a corresponding CustomLoggingException, and my kernel's a nice, clean loop, so I can do something like:

current_state = State()

while True:
    try:
        current_state.run_main_loop_once()
    except CustomLoggingException as e:
        # upload lots of debugging info, then...
        pass

And inside of run_main_loop_once, we already had something like:

try:
    mqtt_client.ping()
except MMQTTException:
    # Per docs for MMQTTException, "In general, the robust way to recover is to call reconnect()."
    mqtt_client.reconnect()

Perfect! Now I've got resilient code that won't crash in the face of a CLE, but will give me lots of debugging info.

Problem is, if we happen to fail our ping right around 4:55 A.M., and the time ticking over to 4:56 A.M. happens to occur partway through the reconnect loop, and the next debug that gets called happens to be the one inside of the MiniMQTT library's reconnect(), then when I resume after that CLE, I will only be subscribed to a fraction of my topics. As you can see, that's quite a difficult scenario to debug.

However, I think the bigger issue is that, if that bug is found a different way that doesn't result in a partial re-subscription, then even given a very skilled programmer who is tasked with working around that bug (and, let us say, is somehow prevented from directly addressing the bug itself), their solution almost certainly would rely on reconnect()'s apparent semantics and thus would introduce a new, far more subtle bug that's incredibly difficult to reproduce. Indeed, even given an omniscient programmer who foresaw how reconnect() would be affected, their only options to handle it cleanly are:

  1. Reach inside of their client to query _subscribed_topics and compare that to a locally-kept complete list, or
  2. Tear down their client entirely and rebuild it from scratch any time an error occurs in an MQTT function.

Both of these require a lot more (branching) code, and both carry costs in at least two of the three categories of CPU, memory, and/or network traffic. Further, they require a level of defensive coding that seems unreasonable to expect from a consumer of this library.
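For illustration, the first workaround might look like the sketch below. It assumes the post-PR shape of `_subscribed_topics` (a list of `(topic, qos)` tuples, per the diff) and reads that private attribute, which is exactly the coupling being objected to. `missing_subscriptions` and `FakeClient` are hypothetical names, and the topic strings are made up.

```python
def missing_subscriptions(client, wanted):
    """Return the (topic, qos) pairs in `wanted` the client is not tracking."""
    tracked = set(client._subscribed_topics)  # private attribute: fragile
    return [pair for pair in wanted if pair not in tracked]


class FakeClient:
    """Minimal stand-in exposing only the attribute the helper reads."""

    def __init__(self, subscribed):
        self._subscribed_topics = list(subscribed)


wanted = [("sensors/temp", 0), ("sensors/humidity", 1), ("alerts", 1)]
client = FakeClient([("alerts", 1)])  # state after a partial resubscription

for topic, qos in missing_subscriptions(client, wanted):
    print("would resubscribe:", topic, qos)
```

Even in this small form, the caller has to maintain `wanted` separately and rerun the comparison after every failed MQTT call.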

Put simply, our API contract isn't supposed to require this sort of legwork from our upstream consumer. They were told that the resub_topics parameter worked in a particular way. I think it'd therefore be a bug if reconnect(resub_topics=True) didn't result in a full resubscription if called twice, even if the first call threw an exception of some kind, so long as the second of those calls succeeded. The whole idea of "reconnect and resubscribe" is to restore a known-good state. Let's do that.
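The contract described here (a failed reconnect followed by a successful one still yields a full resubscription) can be modeled with a toy client. This is a sketch under stated assumptions, not the MiniMQTT implementation: `ToyClient` and its `fail_on` attribute are hypothetical, and network I/O is replaced by a simulated failure.

```python
class ToyClient:
    """Toy model of the reconnect contract; not the MiniMQTT implementation."""

    def __init__(self):
        self._subscribed_topics = []  # list of (topic, qos) tuples
        self.fail_on = None  # topic whose subscribe() should fail once

    def subscribe(self, topic, qos=0):
        if topic == self.fail_on:
            self.fail_on = None  # fail only on the first attempt
            raise RuntimeError(f"simulated failure subscribing to {topic}")
        self._subscribed_topics.append((topic, qos))

    def reconnect(self):
        stash = list(self._subscribed_topics)
        try:
            self._subscribed_topics = []  # models disconnect() clearing state
            # Iterate rather than pop, so the stash stays complete.
            for topic, qos in stash:
                self.subscribe(topic, qos)
        except Exception:
            # Restore the full list so a retry resubscribes to everything.
            self._subscribed_topics = stash
            raise


client = ToyClient()
for name in ("a", "b", "c"):
    client.subscribe(name, 1)

client.fail_on = "b"
try:
    client.reconnect()  # first call fails partway through
except RuntimeError:
    pass

client.reconnect()  # second call succeeds
print(client._subscribed_topics)
```

The sketch iterates over the stash instead of draining it, so the stash is still complete at the point it is restored.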

+            # Overly-broad exception to address #253; if we're about to fail, make sure that we
+            # leave a full list of subscribed topics in our class so that we'll properly resub
+            # on the next retry.
+            if sorted(self._subscribed_topics) != sorted(subscribed_topics):
+                self._subscribed_topics = subscribed_topics
+            raise

         return ret

7 changes: 5 additions & 2 deletions tests/test_unsubscribe.py
@@ -167,9 +167,12 @@ def test_unsubscribe(topic, to_send, exp_recv) -> None:
     mqtt_client.logger = logger

     if isinstance(topic, str):
-        mqtt_client._subscribed_topics = [topic]
+        mqtt_client._subscribed_topics = [(topic, 1)]
     elif isinstance(topic, list):
-        mqtt_client._subscribed_topics = topic
+        if topic and isinstance(topic[0], tuple):
+            mqtt_client._subscribed_topics = topic
+        else:
+            mqtt_client._subscribed_topics = [(t, 1) for t in topic]

     logger.info(f"unsubscribing from {topic}")
     mqtt_client.unsubscribe(topic)