Better Footnote title attributes? #387

edent · 2022-11-15T06:53:23Z

Footnote links' titles can only be set once by the library. This change would make the attribute display the content of the footnote.

How

At the moment, if a user hovers over a footnote, the title attribute displays a tool tip.

For example, if

php-markdown/Michelf/MarkdownExtra.php

Line 28 in eb176f1

public string $fn_link_title = "";

is set to "Read the footnote." then all footnotes have this pop-up:

I think it would be nice if it showed users the full text of the footnote. For example:

This is controlled by :

php-markdown/Michelf/MarkdownExtra.php

Line 1764 in eb176f1

$title = $this->fn_link_title;

The change is relatively straightforward:

if ($this->fn_link_title != "") {
	$title = trim( strip_tags( $this->footnotes[$node_id] ) );
	$title = $this->encodeAttribute( $title );
	$attr .= " title=\"$title\"";
}

That takes the text of the footnote, sanitises it, and adds it as the title element.

Would you be interested in a PR for this?

(I have also raised this with WordPress's Jetpack which is running an older version of your library - Automattic/jetpack#27387)

The text was updated successfully, but these errors were encountered:

michelf · 2022-11-15T12:09:12Z

How do you propose to handle footnotes like this one¹.

Footnotes are often one line of text, but this line of text can contain formatting such as code, emphasis, links, and images . Actually, it wouldn't surprise me to find that links are pretty common since footnotes are often used for references.

They can also go quite long with paragraphs and other block-level elements such as:

tables tables

rows rows

blockquotes
```
code blocks
```
- lists
- and more lists
So we need a strategy to deal with this. I don't think strip_tags is good enough. ↩

edent · 2022-11-15T14:34:11Z

A very reasonable question!

To me, it comes down to user needs. I want a tool-tip so I can see whether the footnote says "(Smith, 1996)" or "A funny story about that, back in 1996 I saw ...".

That way, I can decide if I want to visit the footnote.

Links are handled well, I think. echo strip_tags('<a href="https://example.com/">(Smith, 1996)</a>'); spits out (Smith, 1996)

The HTML spec for title doesn't define a maximum length but, anecdotally, some browsers will truncate them.

So I would use strip_tags and then mb_substr() to knock it down to, say, a max of 100±10 characters.

What do you think to that idea?

michelf · 2022-11-15T18:06:28Z

I think truncating it if too long makes this idea reasonable. It should probably truncate at the first end of paragraph </p> and some other tags too (<table>, <blockquote>…). It'd be nice to replace images with the alt text. And spacing is interpreted differently inside title="" than in HTML, so it should probably be somehow normalized (collapsing double-spaces and newlines, then replacing <br/> with actual newlines).

Sorry if this seems a lot of work. Maybe it's not all needed.

I'm not sure if this tooltip should be the default, but at least it should be possible to enable or disable it.

edent · 2022-11-16T21:22:23Z

Things like alt text will be tricky without either parsing the DOM or using a library like https://github.com/mtibben/html2text

Similarly, stopping at specific tag or replacing <br> with \n might involve a fair bit of parsing.

So, I guess the options are:

Cheat. Strip the tags, Grab the first N characters, and give users a somewhat meaningful preview.
Fully parse the DOM of the generated HTML to grab the appropriate text to a suitable boundary.
Generate a bespoke title from the Markdown directly.

I'm lazy, so happy to contribute a patch for (1). But if you're looking for something more comprehensive, I'll have to bow out.

michelf · 2022-11-16T22:00:39Z

Personally, I think we could cheat a bit more. Something like this:

list($title, $tail) = preg_split('{</p\s*>\s*}i', $title, 2); // split at </p>
$truncated = $tail != '';
$title = preg_replace('{<img\b[^<]\balt\s*=\s*(?:"([^"]*")|(\'([^\']*\')|(\S*)\b)[^<]*>}i', '\1\2\3', $title); // img -> alt text
$title = preg_replace('{\s+}', ' ', $title); // normalize whitespace
$title = trim($title); // remove leading & trailing whitespace
$title = preg_replace('{<br\s*/?>}i', '\n', $title); // br -> \n
$title = strip_tags($title);
$before_truncation = $title;
$title = preg_replace('{^.{0,100}+?\b}', '', $title); // cut after 100 characters, but continue for last word.
if ($truncated || $before_truncation != $title) {
   $title .= '…'; // add ellipsis after truncation
}

There's some situations where the img will be parsed incorrectly (if you include unescaped > in quoted attributes (other than the alt itself). But otherwise it should work... in theory. I didn't test it at all, feel free if you want to.

edent · 2023-01-03T21:31:40Z

Here's a slightly less regex-y way of doing things using PHP's DOMDocument

$doc = new DOMDocument();
$doc->loadHTML( mb_convert_encoding("<p>fugiat <em>consequuntur</em>. 蘵 <h2>Dolores deleniti</h2> <a href='/home'>quia voluptas</a> unde dolor <img alt='esse perspiciatis voluptas' src='img.jpg' />. <p>Autqui 🇬🇧<strong> corporisrerumfuga</strong> quo a. Fuga voluptatem rerum autem. Beatae dolorem et porro ut culpa. Est ut deserunt in quod non omnis suscipit veritatis.</p>", 'HTML', 'UTF-8' ));

$xp = new DOMXPath( $doc );

//	Replace each element with the text inside it
foreach ( $xp->query('//*/text()') as $node ) {
	$p_text = $doc->createTextNode( $node->wholeText . "\n");
	$node->parentNode->replaceChild( $p_text, $node );
}

//	Replace each <img> with its alt text
foreach ( $xp->query('//img') as $node ) {
	$alt_text = $doc->createTextNode($node->getAttribute('alt') . "\n");
	$node->parentNode->replaceChild($alt_text, $node);
}

//	Replace each <br> with a newline
foreach ( $xp->query('//br') as $node ) {
	$newline = $doc->createTextNode( "\n" );
	$node->parentNode->replaceChild( $newline, $node );
}

//	Get a plaintext representation
$title_text = html_entity_decode( strip_tags( $doc->saveHTML() ) );

//	Trim any whitespace
$title_text = trim( $title_text );

The only problem is splitting the text. Not all multibyte languages (like Chinese) have word boundaries like \b. I'm still having a think about how to solve that.

michelf · 2023-01-04T04:01:33Z

I suppose a solution could be to have two limits: 100 characters + characters until word break, but if no word break is found between the 100th and 120th character you simply cut at 100 (or any other arbitrary length).

// cut after 100 characters, but continue for last word (if less than 20 characters).
$title = preg_replace('{^.{0,100}({1,20)?\b|)}u', '', $title);

I think the u pcre flag will make it consider UTF-8 correctly (not cut in the middle of a code point). It would still be susceptible of breaking things in the middle of grapheme clusters (like for multi-code-point emojis, Korean syllables, combining characters, etc.). Fixing this is going to require something more much more complex, like inspecting character classes I suppose. Cutting only at word breaks keeps things much simpler in comparison.

edent · 2023-01-04T21:04:58Z

I couldn't get that regex to work - and I'm always a little paranoid about how readable and maintainable they are.

This should accomplish the same thing:

//	Split by space
$parts = explode( " ", $title_text );

//	Add each part to a new string until it is 100 characters long
$title = "";

foreach ( $parts as $part) {
	//	Always add the first part
	if ( mb_strlen( $title ) == 0 ) {
		$title .= $part . " ";
		//	If the first part is a very long string, reduce it to 100 characters
		if ( mb_strlen( $title ) > 100 ) {
			$title = mb_substr( $title, 0, 100 );
			break;
		}
	} else if ( ( mb_strlen( $title ) + mb_strlen( $part ) ) < 100 ) {
		//	Add the next whole word which doesn't take the total length over 100 characters
		$title .= $part . " ";
	} else {
		break;
	}
}

//	Trim any whitespace
$title = trim( $title );

//	If it has been truncated, add an ellipsis
if ( mb_strlen( $title ) < ( mb_strlen( $title_text ) ) ) {
	$title .= "…";
}

I've tested this with several long strings and it seems to work. I've also made some encoding changes to the above function so it copes better with multibyte characters.

Let me know if you'd like this as a proper PR.

edent · 2023-03-21T21:07:38Z

One last bug :-)

$this->footnotes[$node_id] returns hashed values for some elements, as defined in:

php-markdown/Michelf/MarkdownExtra.php

Line 408 in eb176f1

protected function _hashHTMLBlocks_inMarkdown($text, $indent = 0,

So, for example, a footnote which is a table is returned as B2B

Is there any way to unhash $this->footnotes[$node_id]?

michelf · 2023-03-21T21:19:59Z

$this->unhash($text) (inherited from the Markdown class)

edent · 2023-03-26T09:09:42Z

I ended up using formParagraphs() so that markdown inside a title was properly decoded first.

jeherve mentioned this issue Dec 7, 2022

Dynamic title attribute for markdown footnotes Automattic/jetpack#27388

Open

2 tasks

edent mentioned this issue Mar 26, 2023

Titles for footnotes #390

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better Footnote title attributes? #387

Better Footnote title attributes? #387

edent commented Nov 15, 2022

michelf commented Nov 15, 2022

edent commented Nov 15, 2022

michelf commented Nov 15, 2022

edent commented Nov 16, 2022

michelf commented Nov 16, 2022 •

edited

Loading

edent commented Jan 3, 2023 •

edited

Loading

michelf commented Jan 4, 2023

edent commented Jan 4, 2023

edent commented Mar 21, 2023

michelf commented Mar 21, 2023

edent commented Mar 26, 2023

Better Footnote title attributes? #387

Better Footnote title attributes? #387

Comments

edent commented Nov 15, 2022

How

michelf commented Nov 15, 2022

Footnotes

edent commented Nov 15, 2022

michelf commented Nov 15, 2022

edent commented Nov 16, 2022

michelf commented Nov 16, 2022 • edited Loading

edent commented Jan 3, 2023 • edited Loading

michelf commented Jan 4, 2023

edent commented Jan 4, 2023

edent commented Mar 21, 2023

michelf commented Mar 21, 2023

edent commented Mar 26, 2023

michelf commented Nov 16, 2022 •

edited

Loading

edent commented Jan 3, 2023 •

edited

Loading