Can we have filter or remove rules that match via regexp or wildcard?
E.g.:
1.
Zero width space and/or Non-breaking space:
<a href="https://bla-bla-bla">​​</a>text-text-text produces:
[](https://bla-bla-bla)text-text-text
Is there any way to filter out (remove) HTML with zero visual content?
Something like:
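// Proposed API: `regexFilter` does not exist in turndown today.
// [[:space:]] is POSIX notation, standing in for any of the spaces listed below.
turndownService.addRule('al_spaces', {
  regexFilter: '<[^<>]+?>[[:space:]]<\/[^<>]+?>',
  replacement: function (content) {
    return ''
  }
})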
List of spaces for reference:
| Code point | Character name |
| --- | --- |
| \u0020 | space |
| \u00A0 | no-break space |
| \u1680 | Ogham space mark |
| \u180E | Mongolian vowel separator |
| \u2000 | en quad |
| \u2001 | em quad |
| \u2002 | en space (nut) |
| \u2003 | em space (mutton) |
| \u2004 | three-per-em space (thick space) |
| \u2005 | four-per-em space (mid space) |
| \u2006 | six-per-em space |
| \u2007 | figure space |
| \u2008 | punctuation space |
| \u2009 | thin space |
| \u200A | hair space |
| \u200B | zero width space |
| \u202F | narrow no-break space |
| \u205F | medium mathematical space |
| \u3000 | ideographic space |
| \uFEFF | zero width no-break space |
| \uFFFC | object replacement character |
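Until something like `regexFilter` exists, a rough workaround might be a function filter, which `addRule` already accepts. The sketch below is only an illustration: `isInvisibleAnchor` and the rule key `invisibleAnchor` are names made up here, and the check looks at `textContent` only, so an anchor that wraps an image without any text would also be dropped.

```javascript
// Matches a string that is empty or made up only of the code points listed
// above. JavaScript's \s is not enough here: it does not match \u200B,
// \u180E or \uFFFC.
const INVISIBLE = /^[\u0020\u00A0\u1680\u180E\u2000-\u200B\u202F\u205F\u3000\uFEFF\uFFFC]*$/;

// Hypothetical filter for a turndown rule: true for an anchor whose text is
// empty or invisible. It reads only `nodeName` and `textContent`, so a plain
// object is enough to try it without a DOM.
function isInvisibleAnchor(node) {
  return node.nodeName === 'A' && INVISIBLE.test(node.textContent);
}

// Assumed wiring, given an existing TurndownService instance:
// turndownService.addRule('invisibleAnchor', {
//   filter: isInvisibleAnchor,
//   replacement: function () { return '' }
// })

console.log(isInvisibleAnchor({ nodeName: 'A', textContent: '\u200B\u200B' })); // true
console.log(isInvisibleAnchor({ nodeName: 'A', textContent: 'text' }));         // false
```

This only covers anchors; the same test could be applied to other inline elements by widening the `nodeName` check.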
2.
Line break which breaks markdown's markup:
<strong>bla-bla-bla<br></strong> <br>text-text-text produces:
**bla-bla-bla
**
text-text-text
Is there any way to filter out (remove) all line breaks that precede a closing tag?
Something like:
turndownService.removeAllBefore('<br>', '</*>')
Here are some regex examples:
Remove the anchor that contains only zero-width spaces (written here as \u200B escapes, because the raw characters are invisible until you paste them into the dev console):
selectedHTML = '<i>bla</i><b><a href="https://bla-bla-bla">\u200B\u200B</a>text-text-text</b><i>bla</i>'
selectedHTML.replace(/<[^<>]+?>[\u00A0\u1680\u180E\u2000-\u200B\u202F\u205F\u3000\uFEFF\u0020\uFFFC]+<\/[^<>]+?>/gm, '')
Remove the line break that precedes the closing tag:
selectedHTML='<i>bla</i><strong>bla-bla-bla<br></strong> <br>text-text-text<i>bla</i>'
selectedHTML.replace(/(<br ?\/?>)+(<\/[^<>]+?>)/gi, '$2')
Swap the line break that precedes the closing tag with the closing tag itself:
selectedHTML='<i>bla</i><strong>bla-bla-bla<br></strong> <br>text-text-text<i>bla</i>'
selectedHTML.replace(/((<br ?\/?>)+)(<\/[^<>]+?>)/gi, '$3$1')
It would be nice if the regex filter skipped the content of code and pre tags.
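To illustrate what skipping code and pre could look like when pre-processing the HTML string, here is a minimal sketch. `replaceOutsideCode` and `dropTrailingBreaks` are hypothetical names, not part of turndown, and the approach is regex-based, so it assumes well-formed markup, does not handle a `<pre>` nested inside another `<pre>`, and will not apply a pattern that spans across a protected block.

```javascript
// Hypothetical helper: apply `transform` only to the parts of `html` that lie
// outside <pre>...</pre> and <code>...</code> elements.
function replaceOutsideCode(html, transform) {
  // \1 refers back to the tag name, so <pre> is closed by </pre> and
  // <code> by </code>; the lazy [\s\S]*? stops at the first closing tag.
  const protectedBlock = /<(pre|code)\b[^>]*>[\s\S]*?<\/\1>/gi;
  let result = '';
  let last = 0;
  for (const match of html.matchAll(protectedBlock)) {
    // Transform the text before the protected block, then copy the block as-is.
    result += transform(html.slice(last, match.index)) + match[0];
    last = match.index + match[0].length;
  }
  // Transform whatever follows the last protected block.
  return result + transform(html.slice(last));
}

// The "remove the line break that precedes the closing tag" regex from above.
const dropTrailingBreaks = (s) => s.replace(/(<br ?\/?>)+(<\/[^<>]+?>)/gi, '$2');

const html = '<strong>bold<br></strong><code>a<br></code>';
console.log(replaceOutsideCode(html, dropTrailingBreaks));
// <strong>bold</strong><code>a<br></code>
```

The result could then be passed to `turndownService.turndown()` as usual.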
P.S.
And also:
// Drop anchor HTML tags which contain only dots or commas
selectedHTML = '<a href="#">,</a>'
selectedHTML.replace(/<a [^<>]+?>[.,]+<\/a>/gim, '')
And
// Drop emoji images, keep emoji unicode (from alt attr)
selectedHTML = '<img src="img-apple-64/1f914.png" class="emoji" alt="🤔">'
selectedHTML.replace(/<img [^<>]+?alt=['"]([\p{Emoji}\u200d]+)['"][^<>]*?\/?>/gimu, '$1')