Fix issue #15: Mechanize discards first URL after self-closing anchor tag #405
Draft
Co-authored-by: oalders <96205+oalders@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix Mechanize issue with discarding first URL after anchor tag" to "Fix issue #15: Mechanize discards first URL after self-closing anchor tag" on Oct 24, 2025
Problem
When WWW::Mechanize encountered a self-closing anchor tag like <a name="anchor"/>, it would discard the first link that appeared immediately after it. This was originally reported in 2007 as issue #15 via RT.
For example, given HTML in which such an anchor is followed by two links, mech-dump --links would return only the second link before this fix. The first link (test1) was completely missing.
Root Cause
The _link_from_token() method in lib/WWW/Mechanize.pm unconditionally called $parser->get_trimmed_text("/a") for all <a> tags to extract the link text.
For self-closing tags like <a name="anchor"/>, this call caused HTML::TokeParser to read forward until it found the next </a> closing tag. Unfortunately, that closing tag belonged to the subsequent link (test1), so the entire first link was consumed during text extraction and never processed.
Solution
Modified _link_from_token() to check whether a tag is self-closing before calling get_trimmed_text(). HTML::TokeParser marks self-closing tags with a '/' key in the attributes hash. Self-closing tags have no content, so calling get_trimmed_text() on them is both unnecessary and the cause of this bug.
The fix is minimal: just 4 lines, with comments explaining the check.
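The root cause and the guard described above can be reproduced against HTML::TokeParser on its own, without WWW::Mechanize. What follows is a minimal sketch, not the code from this PR: the extract_links() helper, its $skip_self_closing flag, and the test1.html/test2.html markup are hypothetical stand-ins, since the HTML from the original report is not shown here.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Assumed reproduction markup: a self-closing anchor followed by two links.
my $html = '<a name="anchor"/>'
         . '<a href="test1.html">test1</a> '
         . '<a href="test2.html">test2</a>';

# Hypothetical helper: collect [href, text] pairs from <a> start tags.
# When $skip_self_closing is true, tags carrying a '/' attribute are
# treated as empty and get_trimmed_text() is not called for them.
sub extract_links {
    my ( $source, $skip_self_closing ) = @_;
    my $parser = HTML::TokeParser->new( \$source );
    my @links;

    while ( my $token = $parser->get_tag('a') ) {
        my $attrs = $token->[1];
        my $text  = '';

        # Without the guard, this call reads ahead to the next </a>,
        # which swallows the start tag of the following link.
        unless ( $skip_self_closing && exists $attrs->{'/'} ) {
            $text = $parser->get_trimmed_text('/a');
        }

        # Anchors without an href are not links, so they are not recorded.
        push @links, [ $attrs->{href}, $text ] if defined $attrs->{href};
    }
    return @links;
}

print 'unguarded: ', join( ', ', map { $_->[0] } extract_links( $html, 0 ) ), "\n";
print 'guarded:   ', join( ', ', map { $_->[0] } extract_links( $html, 1 ) ), "\n";
```

With HTML::Parser's default settings, the unguarded pass is expected to list only test2.html, while the guarded pass lists both links. The guard relies on the trailing slash being exposed as a '/' attribute, so it would need revisiting if the parser were configured to recognise empty element tags itself.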
Testing
t/anchor_name_bug.t reproduces the exact scenario from the original issue report.
mech-dump confirms that both links are now properly extracted.
After this fix, mech-dump --links correctly returns both links.
Closes #15
Warning
Firewall rules blocked me from connecting to one or more addresses
I tried to connect to the following addresses, but was blocked by firewall rules:
blahblahblah.xx-only-testing.foo: /usr/bin/perl t/local/failure.t (dns block)
esm.ubuntu.com: /usr/lib/apt/methods/https (dns block)
If you need me to access, download, or install something from one of these locations, you can either:
Original prompt
Fixes #119