
remove multi-line block/content #17

Open
vrody opened this issue Apr 27, 2015 · 4 comments

Comments

@vrody

vrody commented Apr 27, 2015

Hi,
can subs_filter be used to remove a block that contains specified text?

any content
any content any content, "Hello world", other any content

e.g. remove the DIV block that contains the text "Hello world"?

If the block is on a single line, it is easy to remove:

any content, Hello World, other any content
subs_filter '
(.*)Hello World(.*)
' '' r;

But if the block spans several lines, with blank space between them, I cannot remove the entire block.
Any help would be appreciated.
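To show what a regex for this has to do, here is a minimal Python sketch, with Python's `re` standing in for PCRE. It assumes the regex engine sees the whole block at once (the later comments suggest subs_filter does not). The names `BODY`, `HELLO_DIV` and `remove_hello_div` are hypothetical, and the pattern assumes the target `<div>` has no nested `<div>` inside it.

```python
import re

# Hypothetical response body: the target <div> spans several lines.
BODY = (
    "<p>keep me</p>\n"
    '<div class="ad">\n'
    "  any content\n"
    '  any content any content, "Hello world", other any content\n'
    "</div>\n"
    "<div>also keep me</div>\n"
)

# (?s) lets "." match newlines too. The tempered token (?:(?!</div>).) refuses
# to step over a closing </div>, so one match never spans two blocks.
# Assumption: the target <div> contains no nested <div> elements.
HELLO_DIV = re.compile(
    r"(?s)<div\b[^>]*>(?:(?!</div>).)*?Hello world(?:(?!</div>).)*?</div>\s*"
)


def remove_hello_div(body: str) -> str:
    """Drop every <div>...</div> block whose contents include "Hello world"."""
    return HELLO_DIV.sub("", body)


if __name__ == "__main__":
    print(remove_hello_div(BODY))
```

The tempered token is used instead of a plain `.*?` so that a match starting at an unrelated `<div>` cannot run on through its `</div>` into the next block.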

@siochs

siochs commented Sep 15, 2015

I have the exact same problem. So far I have tried <form.+<\/form>, (*CRLF)<form.*<\/form>, <form[\s\S]+<\/form>, <form(\n|\r|\r\n|\R|\v|\s|\pZs|.)+\/form>, (?s)<form.*?<\/form>, ...
Nothing matches.
Do you have any ideas?
Thanks!
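One way to check whether the patterns themselves are at fault is to run them against a whole body held in one string. The sketch below uses Python's `re` as a stand-in for PCRE (so the PCRE-only `(*CRLF)`, `\R` and `\pZs` are left out); the body and the helper `strip_form` are hypothetical. Two of the three patterns remove the form, which suggests the problem lies in how much text the filter hands to the regex engine, not in the regexes.

```python
import re

# Hypothetical body with a <form> element spread over several lines.
BODY = '<p>before</p>\n<form action="/x">\n  <input name="q">\n</form>\n<p>after</p>\n'

# Three of the patterns from the comment, written for Python's "re".
# PCRE-only constructs such as (*CRLF), \R and \pZs are left out.
PATTERNS = [
    r"<form.+<\/form>",       # "." does not match a newline without DOTALL
    r"<form[\s\S]+<\/form>",  # the class [\s\S] includes the newline
    r"(?s)<form.*?<\/form>",  # inline DOTALL flag plus a lazy quantifier
]


def strip_form(pattern: str, body: str = BODY) -> str:
    """Remove whatever `pattern` matches when the whole body is one string."""
    return re.sub(pattern, "", body)


if __name__ == "__main__":
    for pattern in PATTERNS:
        print(pattern, "->", repr(strip_form(pattern)))
```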

@kevinquinnyo

I just ran into this issue as well. I think the module works on a line-by-line basis, much like Apache's mod_substitute. I don't know much about the internals of nginx, but I'm guessing its definition of a "line" in an HTML response body is similar to a "line" in Unix, i.e. text terminated by a '\n' newline character.
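A minimal Python model of the line-by-line behaviour described above. It assumes that the filter buffers body chunks until it sees a newline and then runs the regex on each complete line separately. `filter_line_by_line` is a hypothetical helper, not the module's actual code, and Python's `re` stands in for PCRE.

```python
import re
from typing import Iterable, Iterator


def filter_line_by_line(chunks: Iterable[str], pattern: str, repl: str) -> Iterator[str]:
    """Hypothetical model of a line-based body filter (not the module's code).

    Chunks may split a line anywhere, so text is buffered until a newline
    arrives. The regex is then applied to each complete line on its own,
    which means a pattern can never match across a line break.
    """
    regex = re.compile(pattern)
    pending = ""
    for chunk in chunks:
        pending += chunk
        lines = pending.split("\n")
        pending = lines.pop()  # text after the last newline (possibly empty)
        for line in lines:
            yield regex.sub(repl, line) + "\n"
    if pending:  # final line without a trailing newline
        yield regex.sub(repl, pending)


# The chunk boundary deliberately falls in the middle of "Hello world".
CHUNKS = ["<div>\n  Hello w", "orld\n</div>\n", "<p>Hello world</p>\n"]


def run(pattern: str) -> str:
    return "".join(filter_line_by_line(CHUNKS, pattern, ""))


if __name__ == "__main__":
    print(repr(run(r"<p>.*Hello world.*</p>")))  # single-line pattern
    print(repr(run(r"(?s)<div>.*?</div>")))      # multi-line pattern
```

In this model the single-line `<p>` pattern is removed even though the chunk boundary splits the text, while the `<div>` pattern never matches, because no single line contains both `<div>` and `</div>`.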

I think the real answer is that if you need something as complicated as parsing HTML with complex regexes (always a bad idea, to be honest) or multi-line substitutions, it should probably be handled upstream (in the application), unfortunately.
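As an illustration of handling this upstream with a parser instead of a regex, here is a hedged Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the markup is well-formed XML (XHTML-style) with a single root element. Typical real-world HTML is not, and would need a tolerant HTML parser instead. `remove_divs_containing` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def remove_divs_containing(markup: str, needle: str) -> str:
    """Remove every <div> whose text content includes `needle`.

    Assumes `markup` is well-formed XML with a single root element; typical
    real-world HTML would need a tolerant HTML parser instead.
    """
    root = ET.fromstring(markup)
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for div in list(root.iter("div")):
        parent = parent_of.get(div)
        if parent is None or needle not in "".join(div.itertext()):
            continue  # the root itself, or a div without the needle
        if div.tail:
            # Keep the text that followed the removed element.
            siblings = list(parent)
            index = siblings.index(div)
            if index > 0:
                previous = siblings[index - 1]
                previous.tail = (previous.tail or "") + div.tail
            else:
                parent.text = (parent.text or "") + div.tail
        parent.remove(div)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = '<body>\n<div>\n  any content\n  "Hello world"\n</div>\n<div>keep</div>\n</body>'
    print(remove_divs_containing(doc, "Hello world"))
```

Because the parser works on elements rather than lines, it does not matter how many lines the `<div>` spans.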

Correct me if I'm wrong on this @yaoweibin

@jochenwezel

It would be great if multi-line expressions could be supported,
e.g.

subs_filter 'content in firstline.*content in a following line' 'replacement' rm

where "rm" (or another value such as "m") would indicate that the regular expression should be matched with multi-line support, instead of the standard engine configured for line-by-line matching.
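One detail worth keeping in mind for such a flag: in PCRE (and in Python's `re`, used here as a stand-in), "multi-line" mode (`m`) only changes how `^` and `$` behave. The flag that lets `.` match a newline is "dotall" (`s`). The sketch below shows the difference for the example pattern from the proposal. It says nothing about how subs_filter itself would implement an `rm` flag, which is hypothetical.

```python
import re

TEXT = "content in first line\ncontent in a following line"
PATTERN = r"content in first line.*content in a following line"

# Without flags "." stops at the newline, so the pattern cannot span two lines.
plain = re.search(PATTERN, TEXT)
# MULTILINE ("m") only lets ^ and $ match at every line boundary.
multiline = re.search(PATTERN, TEXT, re.MULTILINE)
# DOTALL ("s") is the flag that lets "." match the newline.
dotall = re.search(PATTERN, TEXT, re.DOTALL)

# What MULTILINE does change: anchors.
starts_plain = re.findall(r"^content", TEXT)                    # start of string only
starts_multiline = re.findall(r"^content", TEXT, re.MULTILINE)  # start of each line

if __name__ == "__main__":
    print(plain, multiline, dotall, starts_plain, starts_multiline, sep="\n")
```

Even a dotall pattern can only match across lines if the filter passes more than one line to the engine at a time.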

@simeonackermann

I successfully removed an HTML tag, including all of its content, with:

subs_filter '<div(.|\n)*</div>' '' rg;
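A note on the quantifier in that pattern. `(.|\n)*` is greedy, so when the engine sees several `<div>` blocks at once it matches from the first `<div` to the last `</div>`, taking everything in between with it. Below is a small Python sketch of regex semantics only, with `re` standing in for PCRE and a hypothetical body. It does not show how much of the body subs_filter passes to the engine at once.

```python
import re

# Hypothetical body: two <div> blocks with content that must survive in between.
BODY = "<div>ad\nblock</div>\n<p>article text</p>\n<div>footer</div>\n"

# Greedy: (.|\n)* runs to the LAST </div>, swallowing the <p> in between.
greedy = re.sub(r"<div(.|\n)*</div>", "", BODY)
# Lazy: (.|\n)*? stops at the FIRST </div> after each <div>.
lazy = re.sub(r"<div(.|\n)*?</div>", "", BODY)
# Equivalent lazy form using a character class instead of an alternation.
lazy_class = re.sub(r"<div[\s\S]*?</div>", "", BODY)

if __name__ == "__main__":
    print(repr(greedy))
    print(repr(lazy))
```

The `[\s\S]*?` form matches the same text without a capturing group per character, which is a common way to write "anything, including newlines".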


5 participants