Copilot AI commented Oct 24, 2025

Problem

find_all_inputs() and find_all_submits() were not finding all inputs when HTML contained nested forms or premature </form> tags. This is a long-standing issue (Google Code issue #14) where inputs appearing after a premature form closure were invisible to WWW::Mechanize.

Example scenario:

# HTML with nested form (invalid but common in the wild)
<form action="/search" name="search_form">
    <input type="text" name="query" />
    <input type="submit" name="search" />
    
    <form action="/other" name="nested_form">
        <input type="hidden" name="temp" />
    </form>
    
    <!-- These inputs are "orphaned" after the first </form> -->
    <input type="hidden" name="page" value="1" />
    <input type="submit" name="next" value="Next" />
</form>

Before this fix:

my @submits = $mech->find_all_submits();  # Only finds 'search'
$mech->click_button(name => 'next');      # FAILS - button not found

Root Cause

HTML::Form (which WWW::Mechanize uses) stops parsing inputs when it encounters the first </form> tag. With nested forms or malformed HTML, this tag may belong to an inner/nested form, causing HTML::Form to stop prematurely and miss subsequent inputs.
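
The behaviour described above can be imitated with a short sketch. This is not HTML::Form's own code: it is a stand-alone loop over HTML::TokeParser (from the HTML-Parser distribution) that treats the first </form> as the end of the form, and the helper name inputs_until_first_close is hypothetical. It only shows which input names from the example land before and after that first closing tag.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# The nested-form example from the Problem section.
my $html = <<'HTML';
<form action="/search" name="search_form">
    <input type="text" name="query" />
    <input type="submit" name="search" />
    <form action="/other" name="nested_form">
        <input type="hidden" name="temp" />
    </form>
    <input type="hidden" name="page" value="1" />
    <input type="submit" name="next" value="Next" />
</form>
HTML

# Hypothetical helper: split input names into those seen before the
# first </form> and those seen after it. Nested <form> start tags are
# ignored, which is the assumption this sketch makes about the parser.
sub inputs_until_first_close {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html ) or die "cannot create parser";
    my ( @inside, @after );
    my $closed = 0;
    while ( my $t = $p->get_tag( 'input', '/form' ) ) {
        my ( $tag, $attr ) = @$t;
        if ( $tag eq '/form' ) { $closed = 1; next; }
        next unless defined $attr->{name};
        if   ($closed) { push @after,  $attr->{name} }
        else           { push @inside, $attr->{name} }
    }
    return ( \@inside, \@after );
}

my ( $inside, $after ) = inputs_until_first_close($html);
print "before first </form>: @$inside\n";
print "after first </form>:  @$after\n";
```

In this sketch the 'page' and 'next' inputs fall into the second group, which is the set the issue reports as missing.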

Solution

Enhanced _extract_forms() to detect and recover orphaned inputs:

  1. After HTML::Form parses forms normally, a new _attach_orphaned_inputs() method re-scans the HTML
  2. It uses HTML::TokeParser to find input elements that appear outside closed forms
  3. These orphaned inputs are attached to the appropriate form object
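
The re-scan in steps 1 and 2 might look roughly like the sketch below. It is not the PR's _attach_orphaned_inputs(); the helper find_orphaned_inputs is hypothetical and covers only the nested-form case, by counting open <form> tags and collecting inputs that appear after a </form> while an outer form is still open. Inputs that follow a premature close at the top level are not detected by this simplified version.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (an assumption, not code from the PR): return the
# attribute hashes of inputs that follow a </form> while an outer
# <form> is still open according to a simple depth counter.
sub find_orphaned_inputs {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html ) or die "cannot create parser";
    my $depth      = 0;    # <form> start tags not yet matched by </form>
    my $seen_close = 0;    # true once a </form> was seen in this outer form
    my @orphans;
    while ( my $t = $p->get_tag( 'form', '/form', 'input' ) ) {
        my ( $tag, $attr ) = @$t;
        if ( $tag eq 'form' ) {
            $depth++;
            $seen_close = 0 if $depth == 1;    # a new outermost form begins
        }
        elsif ( $tag eq '/form' ) {
            $depth-- if $depth > 0;
            $seen_close = 1;
        }
        elsif ( $depth > 0 && $seen_close ) {
            push @orphans, +{ %$attr };        # copy the input's attributes
        }
    }
    return @orphans;
}

my $sample = '<form name="outer"><input name="a">'
           . '<form name="inner"><input name="b"></form>'
           . '<input name="c"></form>';
my @names = map { $_->{name} } find_orphaned_inputs($sample);
print "orphaned: @names\n";
```

For step 3, HTML::Form documents a push_input( $type, \%attr ) method that could take each collected hash; whether the PR attaches inputs that way is not shown in this description.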

After this fix:

my @submits = $mech->find_all_submits();  # Finds both 'search' and 'next'
$mech->click_button(name => 'next');      # SUCCESS

Changes

  • Enhanced _extract_forms() to call _attach_orphaned_inputs() after HTML::Form parsing
  • Added _attach_orphaned_inputs() method to detect and attach orphaned inputs
  • Added comprehensive test coverage in t/orphaned_inputs.t (14 tests)

Testing

  • ✅ All existing tests pass
  • ✅ New tests cover nested forms, explicit premature closures, and normal forms
  • ✅ No security vulnerabilities introduced (CodeQL analysis clean)
  • ✅ Backward compatible - only affects malformed HTML cases

Related Issues

Fixes #14 (WM: $mech->find_all_inputs (and $mech->find_all_submits) doesn't)

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • blahblahblah.xx-only-testing.foo
    • Triggering command: /usr/bin/perl t/local/failure.t (dns block)
  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https (dns block)


Original prompt

This section details the original issue you should resolve

<issue_title>WM: $mech->find_all_inputs (and $mech->find_all_submits) doesn't</issue_title>
<issue_description>[email protected] reported on Nov 9, 2007

From http://rt.cpan.org/Ticket/Display.html?id=29317

Attached is an example of this failure. I was originally using v1.20,
using $mech->inputs to try and find why a 'next' button couldn't be
found. Have now upgraded to 1.30 and added the $mech->find_all_inputs
(find_all_submits has the same fault), and it's failing to find all the
inputs/submits for this particular form.

If you look at the view source for the page I'm looking at
(http://vienna.yapceurope.org/ye2007/search), you'll see that there is
only one form, and that it terminates at the end of the page with a
further hidden field and a second submit button. On subsequent pages
there is another hidden field and a third submit button. Unfortunately
all the above functions failed to find any of the extra inputs/submits,
such that trying to submit the form with the named button fails.

I'm guessing that because there is other HTML code between the two, a
closing tag within that form also closes the form automatically, even
though the form itself is still open and hasn't been closed.


#!/usr/bin/perl -w
use strict;

#############################################################################
# Library Modules                                                           #
#############################################################################

use WWW::Mechanize;

#############################################################################
# Global Variables                                                          #
#############################################################################

my $BASE = 'http://vienna.yapceurope.org/ye2007/search';

my $mech = WWW::Mechanize->new();
$mech->agent_alias( 'Linux Mozilla' );

#############################################################################
# Code Block                                                                #
#############################################################################

$mech->get( $BASE );

while($mech->success()) {
    my @forms = $mech->forms;
    for my $form (@forms) {
        print STDERR "FORM: ".$form->attr( 'name' ).'/'.$form->attr( 'id' )."\n";
        my @inputs = $form->inputs;
        for my $input (@inputs) {
            print STDERR " INPUT: ".$input->type().'/'.$input->name()."\n";
        }
    }

    my @submits = $mech->find_all_submits();
    for my $submit (@submits) {
        print STDERR " INPUT: ".$submit->type().'/'.$submit->name()."\n";
    }

    $mech->click_button( name => 'next' );
}


 

Details

Imported from Google Code issue 14 via archive

Comments

[email protected] commented on Nov 13, 2007 :

(No comment was entered for this change.)
  • Summary : WM: $mech->find_all_inputs (and $mech->find_all_submits) doesn't

[email protected] commented on Apr 24, 2011 :

(No comment was entered for this change.)
  • Labels : WM

</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #118



- Enhanced _extract_forms() to detect and attach orphaned inputs
- Added _attach_orphaned_inputs() method to find inputs after premature </form> tags
- Handles nested forms and explicit premature form closing
- Added comprehensive tests in t/orphaned_inputs.t
- All existing tests still pass

Co-authored-by: oalders <[email protected]>
Copilot AI changed the title from "[WIP] Fix $mech->find_all_inputs and $mech->find_all_submits failure" to "Fix find_all_inputs() and find_all_submits() to handle orphaned inputs in malformed HTML" Oct 24, 2025
Copilot AI requested a review from oalders October 24, 2025 04:31
Copilot finished work on behalf of oalders October 24, 2025 04:31
haarg (Member) commented Oct 24, 2025

HTML::Form seems to be doing the correct thing. The orphaned inputs are not part of the outer form, as form elements are not allowed to be nested. The initial form is terminated when the inner form starts. Attaching the later inputs to the form is not correct - they are not part of any form.
