-
Notifications
You must be signed in to change notification settings - Fork 605
Explicitly reject length(NAME) with typemaps other than T_PV #23479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This one caused my a bunch of grief related to #23150 before I figured out what was going on. |
|
It needs test(s) for the new error message. Other than that, I approve. |
Currently, in dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS/Node.pm's Could we write tests for both conditions in this p.r.? |
bdf5a3f to
e214089
Compare
Added that |
|
On Sun, Jul 27, 2025 at 03:31:09AM -0700, James E Keenan wrote:
jkeenan left a comment (Perl/perl5#23479)
Currently, in dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS/Node.pm's `lookup_input_typemap()`, there already exists one error condition at the point where the p.r. proposes to insert a new `die`. That existing `die` itself appears to be unexercised in the test suite.
```
That's because that condition isn't (I think) ever reached. A similar
check+die is done earlier during parsing, and there *is* a test for that
error. I'll add a note to self to remove the redundant check at some
point.
…--
Modern art:
"That's easy, I could have done that!"
"Ah, but you didn't!"
|
e214089 to
e1d0d2e
Compare
Previously, the length operator on typemaps other than T_PV would lead to that length value not being initialized, leading to segfaults and worse. Worse yet, parsexs would silently emit this erroneous code. For now it will at least give a clear error, in the future we should perhaps consider eliminating this limitation altogether.
e1d0d2e to
cd1bf15
Compare
|
Apparently this broke HarfBuzz-Shaper. It has its own typemap working around the uninitialized issue. I guess this is another argument in favor of utf8/byte markers on string arguments. |
@Leont could you please open a new BBC ticket so that we can track this? Thanks. |
|
On Wed, Sep 10, 2025 at 10:04:37AM -0700, Leon Timmermans wrote:
Leont left a comment (Perl/perl5#23479)
Apparently this broke [HarfBuzz-Shaper](sciurius/perl-HarfBuzz-Shaper#10). It has its own [typemap](https://github.com/sciurius/perl-HarfBuzz-Shaper/blob/3b95bc6854fef50d6d9082ca3679022b00bbf998/typemap#L17) working around the uninitialized issue.
I guess this is another argument in favor of utf8/byte markers on string arguments.
The length(s) declaration prior to your T_PV change had two main effects:
1. Declare STRLEN_length_of_s, XSauto_length_of_s,
assign the value of STRLEN_length_of_s to XSauto_length_of_s
(thus bypassing any typcasing issues)
and pass XSauto_length_of_s as an argument to the wrapped function.
Note that STRLEN_length_of_s doesn't get assigned a value.
2. In addition, if the type of s mapped to T_PV, then:
a) override the typemap for s, and use
($type)SvPV($arg, STRLEN_length_of_$var);
instead (so STRLEN_length_of_$var gets assigned to)
b) complain if s has a default value
After your change, s mapping to other than T_PV became an error.
Regardless of whether it should have become an error, I think the code was
wrong: the 'default' test should die any time length(s) is used, not just
when its T_PV.
I think HarfBuzz-Shaper is doing a lot of unwarranted chumminess with the
XS compiler, given that both STRLEN_length_of_$var and
XSauto_length_of_$var are undocumented implementation details. (NB: in my
current perlxs.pod draft, I document XSauto_length_of_$var but not
STRLEN_length_of_$var.)
Perhaps we should refine the code to;
if (has_length) {
unless ($typemap =~ /STRLEN_length_of_\$var/) {
die unless $xstype eq 'T_PV';
$typemap = '($type)SvPV($arg, STRLEN_length_of_$var);'
}
die if $default;
}
Which boils down to to:
either supply the code which initialises STRLEN_length_of_s,
or we'll supply it for you - but we only know how to do that with
T_PV.
|
I'm not following you there.
Yeah that sounds sensible. |
|
On Thu, Sep 11, 2025 at 07:37:05AM -0700, Leon Timmermans wrote:
> Regardless of whether it should have become an error, I think the code
> was wrong: the 'default' test should die any time length(s) is used,
> not just when its T_PV.
I'm not following you there.
Sorry, I meant that even if your 'die unless T_PV' patch was backed out,
the existing code should be modified so that the 'die if $default' line
is executed *even* if not T_PV. So rather than
if (has_length and $xstype eq 'T_PV') {
die "can't have default value\n" if defined $default;
... override typemap ...
}
it should become:
if (has_length) {
die "can't have default value\n" if defined $default;
if ($xstype eq 'T_PV') {
... override typemap ...
}
}
|
|
On Thu, Sep 11, 2025 at 04:24:01AM -0700, David Mitchell wrote:
Which boils down to to:
either supply the code which initialises STRLEN_length_of_s,
or we'll supply it for you - but we only know how to do that with
T_PV.
Thinking further. We shouldn't rely on testing the XS type to be T_PV at
all: after all, the typemap template for T_PV may have been overridden.
Instead we should rely on what the template *looks like*. So the code
should (in principle) look like:
if (template contains STRLEN_length_of_$var) {
declare and initialise $var using template;
(we assume the template will have initialised STRLEN_length_of_$var)
}
else if (template looks like SvPV_nolen($arg)) {
replace it with SvPV($arg, STRLEN_length_of_$var);
declare and initialise $var using template;
}
else {
declare and initialise $var using template;
emit: STRLEN_length_of_$var = SvCUR($arg);
}
Or to put it another way:
In general, we initialise the STRLEN_length_of_s variable using SvCUR();
but if possible, we optimise that by setting it using a modified template
at the same time as getting the value of s (i.e. what the old T_PV test
used to do), or if the template already seems to set STRLEN_length_of_s
itself, don't bother.
|
|
|
On Sat, Sep 13, 2025 at 04:11:18PM -0700, Leon Timmermans wrote:
Leont left a comment (Perl/perl5#23479)
> In general, we initialise the STRLEN_length_of_s variable using SvCUR()
`SvCUR` is only defined if the value in `SvPOK`, so I guess it'd have to be `sv_len`, but otherwise that sounds fine to me.
Ok.
So how about we revert your 'croak on T_PV' change immediately to fix the
CPAN breakage, then I'll add this thread to my list of things to change in
XS for 5.44 ?
|
Are we tracking CPAN breakage from that change? My mind is drawing a blank here. If so, which ticket? Thanks. |
|
On Fri, Sep 12, 2025 at 02:04:31AM -0700, David Mitchell wrote:
Thinking further. We shouldn't rely on testing the XS type to be T_PV at
all: after all, the typemap template for T_PV may have been overridden.
Instead we should rely on what the template *looks like*. So the code
should (in principle) look like:
if (template contains STRLEN_length_of_$var) {
declare and initialise $var using template;
(we assume the template will have initialised STRLEN_length_of_$var)
}
else if (template looks like SvPV_nolen($arg)) {
replace it with SvPV($arg, STRLEN_length_of_$var);
declare and initialise $var using template;
}
else {
declare and initialise $var using template;
emit: STRLEN_length_of_$var = SvCUR($arg);
}
and Leon then pointed out that the SvCUR() should be sv_len().
But thinking further: the string pointer and length must be determined at
the *same* time; otherwise there are a few edge cases - overload springs
to mind - where the string buffer is set by a first call to "" overloading
and the length from a second call, which may not return the same buffer
nor length each time.
So I now propose:
if (template contains STRLEN_length_of_$var) {
declare and initialise $var using template;
(we assume the template will have initialised STRLEN_length_of_$var)
}
else if (template looks like SvPV_nolen($arg)) {
replace it with SvPV($arg, STRLEN_length_of_$var);
declare and initialise $var using template;
}
else {
croak("Can't determine length from template $foo");
}
That new croak is serving the same purpose as the recently reverted
"croak unless T_PV", but provides more flexibility.
|
In something like
int
foo(char *s, int length(s))
the XS parser has been sort of assuming that the type of s, e.g.
'char *', always maps to T_PV.
If this is the case, then the typemap entry which would normally be
used, i.e.
$var = ($type)SvPV_nolen($arg)
is discarded, and a hard-coded entry is used instead:
($type)SvPV($arg, STRLEN_length_of_$var);
(with the fields being populated directly rather than via the standard
typemap template expansion route).
This goes horribly wrong if the type of s doesn't map to T_PV. Before
this commit, the parser just silently used the standard template. This
meant that STRLEN_length_of_s didn't get initialised, and SEGVs ensued.
It also didn't work well if the XS code tried to override the standard
T_PV INPUT template.
Following this commit, the parser doesn't care what T_FOO the string
variable's type maps to; instead it just tries to modify the current
typemap template to be suitable for setting the string length too. The
new rules are:
* If the template already contains 'STRLEN_length_of_$var', use it
unmodified; the assumption is that some XS author has been playing fast
and loose with the implementation and knows what they are doing.
* If the template looks like
... SvPV..._nolen...($arg) ...
then modify it to the following (i.e. strip out the _nolen and add an
arg):
... SvPV......($arg, STRLEN_length_of_$var) ...
and allow the normal template processing and expansion to proceed.
I.e. modify anything which looks like an SvPV_nolen() variant,
including SvPVutf8_nolen(), SvPV_nolen_const() etc.
* Otherwise die, with a long hint message explaining why the template
couldn't be modified.
The original issue, with a rejected fix and a discussion which
ultimately led to this commit, can be found in PR #23479.
In something like
int
foo(char *s, int length(s))
the XS parser has been sort of assuming that the type of s, e.g.
'char *', always maps to T_PV.
If this is the case, then the typemap entry which would normally be
used, i.e.
$var = ($type)SvPV_nolen($arg)
is discarded, and a hard-coded entry is used instead:
($type)SvPV($arg, STRLEN_length_of_$var);
(with the fields being populated directly rather than via the standard
typemap template expansion route).
This goes horribly wrong if the type of s doesn't map to T_PV. Before
this commit, the parser just silently used the standard template. This
meant that STRLEN_length_of_s didn't get initialised, and SEGVs ensued.
It also didn't work well if the XS code tried to override the standard
T_PV INPUT template.
Following this commit, the parser doesn't care what T_FOO the string
variable's type maps to; instead it just tries to modify the current
typemap template to be suitable for setting the string length too. The
new rules are:
* If the template already contains 'STRLEN_length_of_$var', use it
unmodified; the assumption is that some XS author has been playing fast
and loose with the implementation and knows what they are doing.
* If the template looks like
... SvPV..._nolen...($arg) ...
then modify it to the following (i.e. strip out the _nolen and add an
arg):
... SvPV......($arg, STRLEN_length_of_$var) ...
and allow the normal template processing and expansion to proceed.
I.e. modify anything which looks like an SvPV_nolen() variant,
including SvPVutf8_nolen(), SvPV_nolen_const() etc.
* Otherwise die, with a long hint message explaining why the template
couldn't be modified.
The original issue, with a rejected fix and a discussion which
ultimately led to this commit, can be found in PR #23479.
|
Note that my PR #24062 (currently awaiting review and merging) contains an implementation of my last suggestion above from 24 Sep 2025 to fix this issue. |
Previously, the length operator on typemaps other than T_PV would lead to that length value not being initialized, leading to segfaults and worse. Worse yet, ExtUtils::ParseXS would silently emit this erroneous code.
For now it will at least give a clear error, in the future we should perhaps consider eliminating this limitation altogether.