Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle multi-line sub attributes #189

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

bigpresh
Copy link

@bigpresh bigpresh commented Dec 7, 2023

For use with Catalyst::Plugin::CheckFileUploadTypes, I needed to provide a fairly long list of acceptable MIME types. This means that my handler code would be e.g.:

sub index_POST: ExpectUploads(image/png image/jpeg application/pdf) {
    ...
}

... which is fine, but the list of types to support grew longer and
longer, not helped by some very long MIME types such as
application/vnd.openxmlformats-officedocument.wordprocessingml.document

So, I wanted to make it much more readable, for e.g.:

sub index_POST: ExpectUploads(
    image/jpeg  image/png  image/bmp
    application/pdf
    application/vnd.openxmlformats-officedocument.wordprocessingml.document
    application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
) {
    ...
}

That looks like it should be fine, but failed, because the code inCatalyst::Controller::_parse_attrs() which parse subroutine attributes expected it to be all on line line.

This change makes it work correctly for me, both for single-line attributes with and without a value and for multi-lined ones as per the example above too - and makes the parsing code a little more readable too, I think.

For use with Catalyst::Plugin::CheckFileUploadTypes, I needed to provide
a fairly long list of acceptable MIME types.  This means that my handler
code would be e.g.:

```perl
  sub index_POST: ExpectUploads(image/png image/jpeg application/pdf) {
      ...
  }
```

... which is fine, but the list of types to support grew longer and
longer, not helped by some very long MIME types such as
`application/vnd.openxmlformats-officedocument.wordprocessingml.document`

So, I wanted to make it much more readable, for e.g.:

```perl
  sub index_POST: ExpectUploads(
      image/jpeg  image/png  image/bmp
      application/pdf
      application/vnd.openxmlformats-officedocument.wordprocessingml.document
      application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  ) {
      ...
  }
```

That looks like it should be fine, but failed, because the code in
`Catalyst::Controller::_parse_attrs()` which parse subroutine attributes
expected it to be all on line line.

This change makes it work correctly for me, both for single-line
attributes with and without a value and for multi-lined ones as per the
example above too - and makes the parsing code a little more readable
too, I think.
@bigpresh bigpresh force-pushed the bigpresh/handle_multiline_attributes branch from f97fe44 to d5e6d99 Compare December 7, 2023 11:33
@jjn1056
Copy link
Member

jjn1056 commented Dec 7, 2023

I love the idea but I worry a bit about what this might break in the wild since it's a pretty big change to the Regexp. At the very least we should I think have an application flag that enables the old behavior. And I'd love to see some test cases. Generally I'd prefer to merge code with at least one test case.

@bigpresh
Copy link
Author

bigpresh commented Dec 7, 2023

I've tested carefully on a large, real-world app with a variety of attributes in various places and not found any problems, but yeah, I'll add a proper test for it. I'm not sure a flag to enable the old behaviour is particularly useful, as it still parses just the same for normal one-line attributes (both with & without values, e.g. Local or Path('/foo') etc).

@bigpresh
Copy link
Author

bigpresh commented Dec 7, 2023

I've just pushed the addition of a test which proves that the parsing of multi-line attributes works, and that parsing normal ones still works as expected too. Also added stripping of the indentation from values, so that if you say e.g.:

sub index : MultiLineAttr(
    one
    two
    three
) {
    ...
}

then the resulting value is one\ntwo\nthree, not one\n two\n three.

(It's up to the calling code to decide what to do with an attribute like that - they will probably want to split it on \n and maybe whitespace to get an array of values, but I don't think Catalyst should decide that for them.)

@perl-catalyst-sync
Copy link

perl-catalyst-sync commented Dec 9, 2023 via email

@bigpresh
Copy link
Author

I'm going to need to ponder this a bit, especially how the whitespace
collapse works and what it looks like for people that are using quotes (I
see a lot of code in the wild that does like :Attr("foo").

That should be fine, as it only strips leading whitespace from the start of each line of the attribute's value, so should be unchanged. I have tested this change in a large codebase with a range of different attribute types.

So I'm not sure I like the whitespace collapse but agree with you that
Catalyst should not be splitting on newlines. My gut feeling is that this
should be 'preserve the value as is but just allow multi-line'. That way
it doesn't mess with any fancy attributes that contain JSON or YAML (which
is indentation sensitive) that people might be (or might in the future)
want. Thoughts?

Yeah, that sounds reasonable - I'm happy to remove the leading whitespace stripping if you're happy with the rest.

Or, one possibly sensible heuristic would be to record what whitespace was at the start of the first line of the value, and remove that from the start of every other line, so that you could safely say e.g.:

sub index : YAMLAttribute("
    some_yaml: here
    and:
        more:
         - yaml
 ") {
    ...
}

and know that the end result seen in the attribute will be the same YAML structure, just with the leading whitespace used for readability in the code removed, e.g.:

some_yaml: here
and:
    more:
      - yaml

... rather than accidentally breaking it e.g.

some_yaml:here
and:
more:
- yaml

as the current naive "strip all leading whitespace" implementation would do, which I agree is not desirable.

@bigpresh
Copy link
Author

Random thought in passing - I wonder if the whitespace-stripping could work in the same kind of way as perl 5.26's leading whitespace stripping from heredocs does (which is the same kind of logic Filter::HereDocIndent used too)...

So you'd have to specifically enable it by adding a character - e.g. the ~ that perl 5.26+ uses, i.e. <<~ instead of <<...

Maybe e.g.

sub index : MultiLineAttr(~"
    some_yaml: here
    and:
        more: here
    ") {

... where the presence of the leading ~ tells us you want leading whitespace stripped, and if it wasn't there, we don't attempt to strip anything?

Still may be more hassle & complexity than it's worth, and it may well be that "no, don't even try to strip whitespace, leave the consumer to do that, and just manage to support multi-line attribute values" is the right decision here.

@perl-catalyst-sync
Copy link

perl-catalyst-sync commented Dec 12, 2023 via email

Test the changes to allow multi-line sub attributes on handlers, and
that normal ones (with and without a value) continue to work as
expected.
@bigpresh bigpresh force-pushed the bigpresh/handle_multiline_attributes branch from b6771cd to f6257a5 Compare December 12, 2023 19:19
@bigpresh
Copy link
Author

bigpresh commented Dec 12, 2023

the more I think about it the more I think we shouldn't do any sort of auto trim.

Yeah, fair enough. It's potentially a risk of causing unpleasant surprises, and it's not hard for users who're using multi-line attributes and need them to not be indented as they were in the code to fix that themselves (e.g. sling it through Text::Outdent or similar)

I've rebased away the unindention stuff, leaving just the support for multi-line attributes and tests. All tests pass, and as I mentioned earlier I've tested it with a large real-world codebase where we make use of a variety of attributes, and all seems fine.

@perl-catalyst-sync
Copy link

perl-catalyst-sync commented Dec 20, 2023 via email

@bigpresh
Copy link
Author

No need to apologise - I know the feeling... and "move fast and break things" may work for some things, but not a well-relied-upon framework, taking time and thinking things through when time permits is absolutely the right way to go!

Many thanks and have a lovely Christmas John.

@perl-catalyst-sync
Copy link

perl-catalyst-sync commented Apr 25, 2024 via email

@bigpresh
Copy link
Author

Sorry to hear that John, Covid's a pig.

I think I addressed your feedback on this and it's at a state where it's ready for you to merge it if you're happy with it, I don't think there's anything outstanding, but do let me know if you think there's any further improvements to be made!

@perl-catalyst-sync
Copy link

perl-catalyst-sync commented Apr 26, 2024 via email

@jjn1056
Copy link
Member

jjn1056 commented May 9, 2024

@bigpresh hey one thing, I wasn't sure the impact of this change

( $value =~ s/^'(.*)'$/$1/s ) || ( $value =~ s/^"(.*)"/$1/s );

would that strip single and double quotes on each line? My regex Foo is so so. I think for historical reason we need to string the first and last of those but not any inbetween

So for example

sub test :Action :Test("
'aaa'
'bbb'
") { ... }

should be something like

$action->attributes->{Test} = " 'aaa'\n. 'bbb'\n. ";

rather than have those inner single quotes removed. Can we add a test case around what the goal of tweaking that regex is?

@bigpresh
Copy link
Author

bigpresh commented May 9, 2024

@bigpresh hey one thing, I wasn't sure the impact of this change

( $value =~ s/^'(.*)'$/$1/s ) || ( $value =~ s/^"(.*)"/$1/s );

would that strip single and double quotes on each line?

No - only at the beginning and end of the whole value. The /s regex modifier changes the behaviour of . to match any character, even a newline. Unless you also use /m, the ^ and $ will only match the beginning and end of the entire string.

For e.g.:

$ my $value;

$ $value = "' \'one\' \n \'two\' \n \'three\' '";
' 'one' 
 'two' 
 'three' '
$ ( $value =~ s/^'(.*)'$/$1/s ) || ( $value =~ s/^"(.*)"/$1/s );
1
$ $value
 'one' 
 'two' 
 'three' 

However, there is a subtle buglet present - for the double-quote case, it's missing the $ to anchor it to the end of the string, which I'll fix shortly. EDIT: looking at the diff, that was a pre-existing bug, not new here, I just added a comment explaining what the regexes are doing, and added the /s modifiers.

I think, for code readability and maintenance at least, it would be better if it was one regex using a backref to match the character it matched at the beginning, e.g. s/(["'])(.*)\1$/$2/sr; - in other words, match a ' or " at the start of the string, anything in between then the same character ' or " at the end, and replace the whole thing with what was in between - I can't say without benchmarking whether that's better or worse performance-wise than doing two separate regexes, but I doubt the difference would be too much.

Replace the two regexes with one.

This also fixes a subtle already-present bug in the double-quote
unquoting where it's missing the anchor to the end of string - I think
that's not actually bitten anyone because the `.*` capture is greedy, so
it'll capture as much as it can, so the `"` at the end will match the
last double quote, not the first one it sees.
As discussed on the PR[1], there was concern around the stripping of
quoting from multi-line attrs.

Added tests to show that when unquoting a multi-line attribute, inner
quotes are left alone.

For e.g.

```perl
sub do_foo MultiLineAttr("
    'foo',
    'bar'
") {
    ...
}
```
... doesn't have the `'foo'` / `'bar'` unquoted, but the outer quotes
go.

[1]: perl-catalyst#189 (comment)
@bigpresh
Copy link
Author

bigpresh commented May 9, 2024

I've replaced the two regexes with one, and also added another test for a quoted multi-line attribute with quotes within, showing they're not removed.

@jjn1056
Copy link
Member

jjn1056 commented May 9, 2024

@bigpresh you're awesome

I feel pretty good, I sent the PR to the catalystdev channel for comments and pinged mst on it. If I don't get any feedback I'll ship it early next week. Probably Tuesday since I have a big $work thing happening on Monday

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants