@Leont
Contributor

@Leont Leont commented Jul 27, 2025

Previously, the length operator on typemaps other than T_PV would lead to that length value not being initialized, leading to segfaults and worse. Worse yet, ExtUtils::ParseXS would silently emit this erroneous code.

For now it will at least give a clear error; in the future we should perhaps consider eliminating this limitation altogether.

  • This contains a changelog entry for ExtUtils::ParseXS instead of a perldelta

@Leont Leont requested a review from iabyn July 27, 2025 07:57
@Leont
Contributor Author

Leont commented Jul 27, 2025

This one caused me a bunch of grief related to #23150 before I figured out what was going on.

@iabyn
Contributor

iabyn commented Jul 27, 2025

It needs test(s) for the new error message. Other than that, I approve.

@jkeenan
Contributor

jkeenan commented Jul 27, 2025

> It needs test(s) for the new error message. Other than that, I approve.

Currently, in dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS/Node.pm's lookup_input_typemap(), there already exists one error condition at the point where the p.r. proposes to insert a new die. That existing die itself appears to be unexercised in the test suite.

$ ack 'default value not supported' dist/ExtUtils-ParseXS/
dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS/Node.pm
1411:            die "default value not supported with length(NAME) supplied"

Could we write tests for both conditions in this p.r.?

@Leont Leont force-pushed the parsexs-length-t_pv branch from bdf5a3f to e214089 Compare July 27, 2025 20:23
@Leont
Contributor Author

Leont commented Jul 27, 2025

> It needs test(s) for the new error message. Other than that, I approve.

Added that

@iabyn
Contributor

iabyn commented Jul 28, 2025 via email

@Leont Leont force-pushed the parsexs-length-t_pv branch from e214089 to e1d0d2e Compare August 2, 2025 15:10
Previously, the length operator on typemaps other than T_PV would lead
to that length value not being initialized, leading to segfaults and
worse. Worse yet, parsexs would silently emit this erroneous code.

For now it will at least give a clear error, in the future we should
perhaps consider eliminating this limitation altogether.
@Leont Leont force-pushed the parsexs-length-t_pv branch from e1d0d2e to cd1bf15 Compare August 3, 2025 17:40
@Leont Leont merged commit a129015 into blead Aug 4, 2025
67 checks passed
@Leont
Contributor Author

Leont commented Sep 10, 2025

Apparently this broke HarfBuzz-Shaper. It has its own typemap working around the uninitialized issue.

I guess this is another argument in favor of utf8/byte markers on string arguments.

@jkeenan
Contributor

jkeenan commented Sep 10, 2025

> Apparently this broke HarfBuzz-Shaper. It has its own typemap working around the uninitialized issue.
>
> I guess this is another argument in favor of utf8/byte markers on string arguments.

@Leont could you please open a new BBC ticket so that we can track this? Thanks.

@iabyn
Contributor

iabyn commented Sep 11, 2025 via email

@Leont
Contributor Author

Leont commented Sep 11, 2025

> Regardless of whether it should have become an error, I think the code was wrong: the 'default' test should die any time length(s) is used, not just when it's T_PV.

I'm not following you there.

> Perhaps we should refine the code to:
>
> if (has_length) {
>   unless ($typemap =~ /STRLEN_length_of_\$var/) {
>     die unless $xstype eq 'T_PV';
>     $typemap = '($type)SvPV($arg, STRLEN_length_of_$var);'
>   }
>   die if $default;
> }
>
> Which boils down to: either supply the code which initialises STRLEN_length_of_s, or we'll supply it for you - but we only know how to do that with T_PV.

Yeah, that sounds sensible.

@iabyn
Copy link
Contributor

iabyn commented Sep 11, 2025 via email

@iabyn
Contributor

iabyn commented Sep 12, 2025 via email

@Leont
Contributor Author

Leont commented Sep 13, 2025

> In general, we initialise the STRLEN_length_of_s variable using SvCUR()

SvCUR is only defined if the value is SvPOK, so I guess it'd have to be sv_len, but otherwise that sounds fine to me.

@iabyn
Contributor

iabyn commented Sep 14, 2025 via email

@jkeenan
Contributor

jkeenan commented Sep 14, 2025

> Ok. So how about we revert your 'croak on T_PV' change immediately to fix the CPAN breakage, then I'll add this thread to my list of things to change in XS for 5.44?

Are we tracking CPAN breakage from that change? My mind is drawing a blank here. If so, which ticket? Thanks.

@iabyn
Contributor

iabyn commented Sep 24, 2025 via email

iabyn added a commit that referenced this pull request Jan 7, 2026
In something like

    int
    foo(char *s, int length(s))

the XS parser has been sort of assuming that the type of s, e.g.
'char *', always maps to T_PV.

If this is the case, then the typemap entry which would normally be
used, i.e.

    $var = ($type)SvPV_nolen($arg)

is discarded, and a hard-coded entry is used instead:

    ($type)SvPV($arg, STRLEN_length_of_$var);

(with the fields being populated directly rather than via the standard
typemap template expansion route).

This goes horribly wrong if the type of s doesn't map to T_PV. Before
this commit, the parser just silently used the standard template. This
meant that STRLEN_length_of_s didn't get initialised, and SEGVs ensued.

It also didn't work well if the XS code tried to override the standard
T_PV INPUT template.

Following this commit, the parser doesn't care what T_FOO the string
variable's type maps to; instead it just tries to modify the current
typemap template to be suitable for setting the string length too. The
new rules are:

* If the template already contains 'STRLEN_length_of_$var', use it
  unmodified; the assumption is that some XS author has been playing fast
  and loose with the implementation and knows what they are doing.

* If the template looks like

  ...  SvPV..._nolen...($arg) ...

  then modify it to the following (i.e. strip out the _nolen and add an
  arg):

  ...  SvPV......($arg, STRLEN_length_of_$var) ...

  and allow the normal template processing and expansion to proceed.

  I.e. modify anything which looks like an SvPV_nolen() variant,
  including SvPVutf8_nolen(), SvPV_nolen_const() etc.

* Otherwise die, with a long hint message explaining why the template
  couldn't be modified.

The original issue, with a rejected fix and a discussion which
ultimately led to this commit, can be found in PR #23479.
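
The three template rules in the commit message above can be sketched as follows. This is only an illustration written in Python, not the parser's actual Perl code: the names `rewrite_template`, `TemplateError` and `LENGTH_VAR`, the regular expression, and the error text are all assumptions made for this sketch, and the real matching in ExtUtils::ParseXS may differ.

```python
import re

# Hypothetical names for illustration; none of them come from ExtUtils::ParseXS.
LENGTH_VAR = "STRLEN_length_of_$var"

# Matches any SvPV..._nolen...($arg) call, for example SvPV_nolen($arg),
# SvPVutf8_nolen($arg) or SvPV_nolen_const($arg).
NOLEN_CALL = re.compile(r"\b(SvPV\w*?)_nolen(\w*)\(\s*\$arg\s*\)")


class TemplateError(Exception):
    """Raised when an INPUT template cannot be adapted for length(NAME)."""


def rewrite_template(template: str) -> str:
    """Return a template that also sets the length variable, or raise."""
    # Rule 1: the template already mentions the length variable, so the
    # XS author is assumed to handle it; use the template unmodified.
    if LENGTH_VAR in template:
        return template

    # Rule 2: strip "_nolen" from the macro name and pass the length
    # variable as an extra argument.
    rewritten, count = NOLEN_CALL.subn(
        lambda m: f"{m.group(1)}{m.group(2)}($arg, {LENGTH_VAR})",
        template,
    )
    if count:
        return rewritten

    # Rule 3: nothing recognisable to modify (message text is invented here).
    raise TemplateError(
        f"cannot adapt template for length(NAME): {template!r}"
    )
```

Under these assumptions, `rewrite_template("$var = ($type)SvPV_nolen($arg)")` returns `$var = ($type)SvPV($arg, STRLEN_length_of_$var)`, a template that already contains `STRLEN_length_of_$var` comes back unchanged, and a template built on `SvIV($arg)` raises `TemplateError`.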
iabyn added a commit that referenced this pull request Jan 9, 2026
@iabyn
Contributor

iabyn commented Jan 9, 2026

Note that my PR #24062 (currently awaiting review and merging) contains an implementation of my last suggestion above from 24 Sep 2025 to fix this issue.
