Title and Excerpt line breaks #46

Closed
Truncated opened this issue Jun 8, 2024 · 8 comments
Labels
parsing issue Problems related to Readability or Obsidian's Turndown implementation

Comments

@Truncated

I'm pulling some README files from GitHub, and the excerpt and title seem to be saved in the YAML with line breaks and extra spaces. These don't show in the Properties view, but they cause problems when I try to rearrange property order using Linter.

Is this part of the known bug experiences?

@inhumantsar
Owner

could you post a link to the readme and a copy of the resulting output?

inhumantsar added the "parsing issue" label (Problems related to Readability or Obsidian's Turndown implementation) on Jun 8, 2024

@Truncated
Author

Of course! This one may also be a twofer in weirdness; when targeting the Readme file in GitHub repositories, I would sometimes get the content but sometimes I would get the GitHub wrapper instead. I'll try to structure some proper repeatable tests because my quick and dirty approach was to just retry "until it worked".

@Truncated
Author

Example
excerpt and title have this issue when they are longer than a single line. Whether the entry is wrapped in quotes or not does not appear to have an impact.
Target: https://github.com/mouse0270/module-credits/blob/master/README.md
Frontmatter in Source Mode:

---
Source: https://github.com/mouse0270/module-credits/blob/master/README.md
site: GitHub
excerpt: Lists the authors of projects on the Manage Modules Window. If a url is
  provided in the module.json file, it will make the version tag link to the
  module url. - mouse0270/module-credits
twitter: https://twitter.com/@github
slurped: 2024-06-08T14:31:41.227Z
title: module-credits/README.md at master · mouse0270/module-credits
---

Properties view wrapping does not indicate that the extra spaces or line breaks are in excerpt:
[two screenshots of the Properties view]

Recent Log


##### 1717851529468 | DEBUG | onValidate called, no changes detected
- Caller: `eval (plugin:settings-search:131:29)`

{
"hash": 2735659794
}


##### 1717853123021 | DEBUG | onValidate called, no changes detected
- Caller: `HTMLDivElement.<anonymous> (app://obsidian.md/app.js:1:2989173)`

{
"hash": 2735659794
}


##### 1717853230582 | DEBUG | onValidate called, no changes detected
- Caller: `HTMLSpanElement.<anonymous> (app://obsidian.md/app.js:1:2058099)`

{
"hash": 2735659794
}


##### 1717857050965 | DEBUG | onValidate called, no changes detected
- Caller: `HTMLDivElement.<anonymous> (app://obsidian.md/app.js:1:2989173)`

{
"hash": 2735659794
}


##### 1717857101216 | DEBUG | attempting to parse prop metadata
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

{
"enabled": true,
"custom": false,
"_key": "Source",
"_idx": 0,
"id": "link",
"metaFields": [
"url",
"og:url",
"parsely-link",
"twitter:url"
],
"defaultIdx": 0,
"defaultKey": "link",
"description": "Page URL provided or a permalink discovered in metadata."
}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"url"
"meta[name="url"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"og:url"
"meta[name="og:url"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"parsely-link"
"meta[name="parsely-link"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"twitter:url"
"meta[name="twitter:url"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | attempting to parse prop metadata
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

{
"enabled": true,
"custom": false,
"_key": "byline",
"_idx": 1,
"id": "byline",
"metaFields": [
"author",
"article:author",
"parsely-author",
"cXenseParse:author"
],
"defaultIdx": 1,
"defaultKey": "byline",
"description": "Name of the primary author or the first author detected."
}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"author"
"meta[name="author"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"article:author"
"meta[name="article:author"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"parsely-author"
"meta[name="parsely-author"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"cXenseParse:author"
"meta[name="cXenseParse:author"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | attempting to parse prop metadata
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

{
"enabled": true,
"custom": false,
"_key": "site",
"_idx": 2,
"id": "siteName",
"metaFields": [
"og:site_name",
"page.content.source",
"application-name",
"apple-mobile-web-app-title",
"twitter:site"
],
"defaultIdx": 2,
"defaultKey": "site",
"description": "Website or publication name."
}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"og:site_name"
"meta[name="og:site_name"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"page.content.source"
"meta[name="page.content.source"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"application-name"
"meta[name="application-name"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"apple-mobile-web-app-title"
"meta[name="apple-mobile-web-app-title"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"twitter:site"
"meta[name="twitter:site"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{
"0": {}
}


##### 1717857101216 | DEBUG | adding metadata
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

{
"prop": {
"enabled": true,
"custom": false,
"_key": "site",
"_idx": 2,
"id": "siteName",
"metaFields": [
"og:site_name",
"page.content.source",
"application-name",
"apple-mobile-web-app-title",
"twitter:site"
],
"defaultIdx": 2,
"defaultKey": "site",
"description": "Website or publication name."
},
"elements": {
"0": {}
},
"metaFields": {},
"querySelector": "meta[name="twitter:site"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
}


##### 1717857101216 | DEBUG | attempting to parse prop metadata
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

{
"enabled": true,
"custom": false,
"_key": "date",
"_idx": 3,
"_format": "d|YYYY-MM-DDTHH:mm",
"id": "publishedTime",
"metaFields": [
"article:published_time",
"parsely-pub-date",
"datePublished",
"article.published"
],
"defaultIdx": 3,
"defaultKey": "date",
"description": "Date/time that the page was initially published.",
"defaultFormat": "d|YYYY-MM-DDTHH:mm"
}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"article:published_time"
"meta[name="article:published_time"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"parsely-pub-date"
"meta[name="parsely-pub-date"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


##### 1717857101216 | DEBUG | found prop elements
- Caller: `SlurpPlugin.slurp (plugin:slurp:12508:30)`

"datePublished"
"meta[name="datePublished"], meta[property="{s}"], meta[itemprop="{s}"], meta[http-equiv="{s}"]"
{}


@Truncated
Author

Truncated commented Jun 8, 2024

When I target:
https://github.com/reyzor1991/foundry-vtt-pf2e-notification/blob/master/README.md
https://github.com/reyzor1991/foundry-vtt-pf2e-reaction/blob/main/README.md
I get the GitHub wrapper around the page, which I don't want.

When I target:
https://github.com/reonZ/pf2e-perception/blob/master/README.md
I get the expected README content, which I do want.

Any GitHub README seems to hit this at random, and sometimes simply retrying works. Whenever it does "work", however, the line breaks / extra spaces issue appears if the excerpt or title is more than one line long.

@inhumantsar
Owner

the github wrapper thing is a bit weird. all three of those links work for me. i'll try it a couple more times to see if i can narrow it down but tbh, i think rather than troubleshooting this it would make more sense to handle this as a preprocessing hook (see #37). ie: if github link and ends with .md, use the raw md file content rather than the parsed output.
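
A minimal sketch of the URL rewrite such a preprocessing hook might do, assuming the usual `/<owner>/<repo>/blob/<ref>/<path>` link shape and the `raw.githubusercontent.com` host for raw file content. It is written in Python purely for illustration; the function name `github_blob_to_raw` is hypothetical and is not part of Slurp.

```python
from typing import Optional
from urllib.parse import urlsplit


def github_blob_to_raw(url: str) -> Optional[str]:
    """Map a github.com blob link to a .md file onto its raw-content URL.

    Returns None when the URL is not a github.com blob link ending in .md,
    so the caller can fall back to the normal parsing path.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() != "github.com":
        return None
    # Assumed path shape: /<owner>/<repo>/blob/<ref>/<path...>
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        return None
    if not segments[-1].lower().endswith(".md"):
        return None
    owner, repo, _blob, *ref_and_path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *ref_and_path])


print(github_blob_to_raw(
    "https://github.com/mouse0270/module-credits/blob/master/README.md"
))
```

Returning `None` for anything that is not a Markdown blob link keeps the hook opt-in: every other page would still go through the regular parser.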

regarding the newlines, it looks like the yaml lib i'm using is adding those newlines without adding the yaml syntax for a multiline string. can you try replacing the excerpt property directly in source mode with the following and see if that fixes your issue?

excerpt: >
  Lists the authors of projects on the Manage Modules Window. If a url is
  provided in the module.json file, it will make the version tag link to the
  module url. - mouse0270/module-credits

@Truncated
Author

Truncated commented Jun 8, 2024

That doesn't help; any line breaks are parsed by other apps as separate frontmatter values.
My understanding is that unless it's a list, each frontmatter property needs to be on a single line as line breaks are used to parse between them, especially if someone is using RegEx.

That said, it's a loose standard at best. I do think this is the least impactful / most compatible end result to have, so that's what I'm presenting, but I'm also going to poke around to see if I can reference more definitive guidance on this point.

Another tangent is whether maintaining formatting of the excerpt is more important/useful than it possibly clashing with Obsidian Add-Ons. I can see reasons for and against that - it's not obvious to me which answer is the best one at this moment.

@inhumantsar
Owner

inhumantsar commented Jun 8, 2024

> That said, it's a loose standard at best.

I disagree completely. Multi-line strings are well-supported by YAML in multiple ways. Obsidian specifically states that note properties are YAML-formatted and they have nothing to say on multi-line strings.

> That doesn't help; any line breaks are parsed by other apps as separate frontmatter values. My understanding is that unless it's a list, each frontmatter property needs to be on a single line as line breaks are used to parse between them, especially if someone is using RegEx.

> Another tangent is whether maintaining formatting of the excerpt is more important/useful than it possibly clashing with Obsidian Add-Ons. I can see reasons for and against that - it's not obvious to me which answer is the best one at this moment.

Lots of websites publish multiple paragraphs in their excerpt (eg: academic journals that put a paper's abstract into the excerpt metadata), so it's important for slurp to ensure that the YAML lib respects existing line breaks.

In this case, the YAML lib is adding line breaks and that might be a behaviour I can change.
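
To illustrate how a serializer's line-width setting can produce exactly these breaks, here is a rough Python sketch. It uses the standard `textwrap` module as a stand-in for a YAML library's folding; the 80-column width is an assumption inferred from the output posted above, and `emit_plain_scalar` is a hypothetical helper, not Slurp's code or any YAML library's API.

```python
import textwrap

EXCERPT = (
    "Lists the authors of projects on the Manage Modules Window. If a url is "
    "provided in the module.json file, it will make the version tag link to the "
    "module url. - mouse0270/module-credits"
)


def emit_plain_scalar(key: str, value: str, width: int) -> str:
    """Write `key: value`, wrapping at `width` columns.

    Continuation lines are indented by two spaces, mimicking how a
    serializer might fold a long plain scalar.
    """
    return textwrap.fill(
        value,
        width=width,
        initial_indent=f"{key}: ",
        subsequent_indent="  ",
        break_long_words=False,
        break_on_hyphens=False,
    )


# Assumed default width: the value is folded over three lines.
wrapped = emit_plain_scalar("excerpt", EXCERPT, width=80)
# A very large width keeps the same value on a single line.
single = emit_plain_scalar("excerpt", EXCERPT, width=10_000)

print(wrapped)
print(single)
```

Both outputs describe the same string to a YAML parser; only the on-disk layout differs, which is why raising or disabling the width limit is a plausible fix on the writing side.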

Beyond that, Slurp's goal is to capture information, so it's important for Slurp to stay as true to the original content as possible.

If other plugins aren't handling well-formatted YAML when processing note properties, then I'd recommend requesting proper YAML support from them. YAML is a complex specification, relying solely on regex to parse it will be a lot of work to maintain.
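
As a rough illustration of that last point, the Python sketch below reads the frontmatter from this issue twice: once with a naive one-regex-per-line reader, and once with a reader that folds indented continuation lines into the previous value, the way YAML folds line breaks inside a plain scalar. Both functions are hypothetical and deliberately simplistic; neither is a YAML parser, and neither reflects how Linter or any other plugin is actually implemented.

```python
import re

FRONTMATTER = """\
Source: https://github.com/mouse0270/module-credits/blob/master/README.md
site: GitHub
excerpt: Lists the authors of projects on the Manage Modules Window. If a url is
  provided in the module.json file, it will make the version tag link to the
  module url. - mouse0270/module-credits
twitter: https://twitter.com/@github
"""

# Matches "key: value" only when the key starts in the first column.
KEY_VALUE = re.compile(r"^([A-Za-z_][\w-]*):\s+(.*)$")


def parse_line_by_line(text):
    """Naive reader: one regex match per line; continuation lines are lost."""
    result = {}
    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            result[match.group(1)] = match.group(2)
    return result


def parse_with_continuations(text):
    """Fold indented lines into the previous key's value with a single space."""
    result = {}
    current = None
    for line in text.splitlines():
        if line.startswith(" ") and current is not None:
            result[current] += " " + line.strip()
            continue
        match = KEY_VALUE.match(line)
        if match:
            current = match.group(1)
            result[current] = match.group(2)
    return result


naive = parse_line_by_line(FRONTMATTER)
folded = parse_with_continuations(FRONTMATTER)
print(naive["excerpt"])   # stops at the first line break
print(folded["excerpt"])  # the whole excerpt as one string
```

Even the second reader covers only this one case. Quoted strings, block scalars (`>` and `|`), lists, and nested mappings would each need more special handling, which is the maintenance burden a real YAML parser avoids.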

@Truncated
Author

> > That said, it's a loose standard at best.

> I disagree completely. Multi-line strings are well-supported by YAML in multiple ways. Obsidian specifically states that note properties are YAML-formatted and they have nothing to say on multi-line strings.

I was probably unfair in the negative. I was referring to things like the criticism section at https://en.wikipedia.org/wiki/YAML#Criticism (StrictYAML and similar endeavours came in response to it).

All that said, I'm happy to agree with you and obviously Obsidian targets YAML as a standard.

> > That doesn't help; any line breaks are parsed by other apps as separate frontmatter values. My understanding is that unless it's a list, each frontmatter property needs to be on a single line as line breaks are used to parse between them, especially if someone is using RegEx.

> > Another tangent is whether maintaining formatting of the excerpt is more important/useful than it possibly clashing with Obsidian Add-Ons. I can see reasons for and against that - it's not obvious to me which answer is the best one at this moment.

> Lots of websites publish multiple paragraphs in their excerpt (eg: academic journals that put a paper's abstract into the excerpt metadata), so it's important for slurp to ensure that the YAML lib respects existing line breaks.

> In this case, the YAML lib is adding line breaks and that might be a behaviour I can change.

> Beyond that, Slurp's goal is to capture information, so it's important for Slurp to stay as true to the original content as possible.

> If other plugins aren't handling well-formatted YAML when processing note properties, then I'd recommend requesting proper YAML support from them. YAML is a complex specification, relying solely on regex to parse it will be a lot of work to maintain.

When this ticket started, I was not clear on exactly what "correct" was supposed to look like in the first place, so much of this was thinking out loud to some degree. I agree with you completely now. I went digging for some clarity over discord and the web and essentially ended where you are.

At this point, I've tested in multiple YAML validators to confirm what "correct" is supposed to look like, and your output is consistent with that, so the behavior issue is clearly with Linter in this case.

Thank you for working through that with me!


2 participants