Should $clean_tags_re contain img? #402

Open
edent opened this issue Aug 27, 2024 · 1 comment

edent commented Aug 27, 2024

If I have this HTML:

<img src="example.png" alt="Alt text

With

Multiple newlines." >

It is transformed into:

<p>&lt;img src="example.png" alt="Alt text</p>

<p>With</p>

<p>Multiple newlines." ></p>

Changing this line:

protected string $clean_tags_re = 'script|style|math|svg';

to

protected string $clean_tags_re = 'script|style|math|svg|img';

Fixes the issue.

I can't think of anything within an <img> element which should be altered by Markdown. Alt text can't contain HTML elements, src shouldn't be altered, and because it's a self-closing element it won't have any contents.

Are there any downsides to adding img to this regex?

michelf (Owner) commented Aug 27, 2024

Note that you can add no-break spaces on those empty lines if you want to fix things without fussing with the code.
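
As a rough sketch of that workaround, the helper below puts a no-break space (U+00A0) on every empty line, so no run of two or more newlines is left for a paragraph splitter to break on. The function name `fill_blank_lines` is hypothetical, and the `preg_split('/\n{2,}/', ...)` call is only a simplified stand-in for paragraph splitting, not the library's actual code path.

```php
<?php
// Hypothetical helper, not part of PHP Markdown: insert a no-break space
// on every empty line so the text no longer contains a blank line.
function fill_blank_lines(string $html): string
{
    $nbsp = "\u{00A0}";
    // A newline immediately followed by another newline marks an empty line.
    return preg_replace('/\n(?=\n)/', "\n" . $nbsp, $html);
}

$img = "<img src=\"example.png\" alt=\"Alt text\n\nWith\n\nMultiple newlines.\" >";

// Simplified stand-in for paragraph splitting (an assumption for this demo).
$before = preg_split('/\n{2,}/', $img);
$after  = preg_split('/\n{2,}/', fill_blank_lines($img));

echo count($before), " chunk(s) before, ", count($after), " chunk(s) after\n";
```

The no-break spaces stay in the alt text, so this only suits cases where that extra whitespace is acceptable.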

Also, we have the same problem with other tags too:

<span title="A

multiline

title">text with title</span>

I think the basic issue is that the HTML block parser ignores span-level tags. Those are parsed at a later stage in parseSpan, but that stage runs after the text has been split into paragraphs.

I suppose changing the regex here to accept all tag names would work.

(?> # Tag name.
' . $this->block_tags_re . ' |
' . $this->context_block_tags_re . ' |
' . $this->clean_tags_re . ' |
(?!\s)'.$enclosing_tag_re . '
)

Things to watch for:

  • It's possible the following code dealing with the matched tag won't know what to do with those new tag names.
  • It's also possible parseSpan will need adjustments for dealing with those hashed tags.
  • There could be some interactions with code spans. Currently the HTML block parser attempts to ignore them by detecting code span markers, but code spans are in reality parsed later in parseSpan. Disagreements between the two algorithms could cause some behaviors to change when it comes to tags in code spans.

Honestly, I'm not sure it's worth solving.
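
To make the ordering problem concrete, here is a minimal, self-contained sketch of the general idea: swap every tag, whatever its name, for an opaque placeholder before splitting into paragraphs, then restore the tags afterwards. This is not PHP Markdown's implementation. The function names, the simplified tag regex, and the placeholder format are all assumptions made for illustration, and the sketch deliberately ignores code spans, which is one of the concerns listed above.

```php
<?php
// Illustrative sketch only: NOT PHP Markdown's parser. All names are hypothetical.

/** Replace each HTML tag with a placeholder key and remember the original. */
function hash_tags(string $text, array &$store): string
{
    // Simplified tag pattern: a tag name, then any mix of quoted attribute
    // values (which may span several lines) and other characters, up to '>'.
    $tag_re = '/<\/?[a-zA-Z][a-zA-Z0-9]*(?:"[^"]*"|\'[^\']*\'|[^\'">])*>/';
    return preg_replace_callback($tag_re, function (array $m) use (&$store): string {
        $key = "\x1A" . count($store) . "\x1A";
        $store[$key] = $m[0];
        return $key;
    }, $text);
}

/** Simplified stand-in for paragraph splitting: break on blank lines. */
function split_paragraphs(string $text): array
{
    return preg_split('/\n{2,}/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Put the original tags back in place of their placeholders. */
function unhash_tags(string $text, array $store): string
{
    return strtr($text, $store);
}

$html = "<span title=\"A\n\nmultiline\n\ntitle\">text with title</span>";

// Splitting first cuts the tag apart at the blank lines inside the attribute.
$naive = split_paragraphs($html);

// Hashing first keeps the whole tag inside a single paragraph.
$store = [];
$protected = split_paragraphs(hash_tags($html, $store));
$restored = unhash_tags($protected[0], $store);

echo count($naive), " paragraph(s) when splitting first\n";
echo count($protected), " paragraph(s) when hashing tags first\n";
```

Placeholders make the splitter blind to anything inside a tag, which is why later stages would need to know how to handle the hashed tags, as noted in the list above.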
