Replies: 2 comments
-
Duplicate of https://github.com/orgs/remarkjs/discussions/1067, https://github.com/orgs/remarkjs/discussions/1194, and micromark/micromark#59
AST is short for Abstract Syntax Tree. The "abstract" part means the tree focuses on structure and intentionally glosses over or normalizes stylistic details of the language, such as spacing or which list marker is used, that don't change how the document is structured or displayed (https://en.wikipedia.org/wiki/Abstract_syntax_tree). What you are describing is a Concrete Syntax Tree (https://en.wikipedia.org/wiki/Parse_tree), which could be built on top of micromark (syntax-tree/mdast#36 (comment)), but it would have a completely different structure and way of working from remark or rehype.
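To make the distinction concrete, here is a toy sketch. The node shapes `CstItem` and `AstItem` and the three helper functions are hypothetical and far simpler than anything in mdast or micromark. It only illustrates that a concrete tree keeps the marker and spacing, while the abstract tree drops them.

```typescript
// Hypothetical toy node types: one bullet item per line, nothing else.
interface CstItem { marker: "*" | "-" | "+"; spacing: string; text: string }
interface AstItem { type: "listItem"; text: string }

// Concrete parse: keeps every character of the input.
function parseConcrete(line: string): CstItem {
  const match = /^([*+-])(\s+)(.*)$/.exec(line);
  if (!match) throw new Error(`not a bullet item: ${line}`);
  return { marker: match[1] as CstItem["marker"], spacing: match[2], text: match[3] };
}

// Abstraction step: the marker and spacing are deliberately thrown away.
function toAbstract(item: CstItem): AstItem {
  return { type: "listItem", text: item.text };
}

// Printing the concrete tree reproduces the input exactly.
function printConcrete(item: CstItem): string {
  return item.marker + item.spacing + item.text;
}
```

With these definitions, `* a` and `-   a` produce the same abstract node, so a serializer working from the abstract tree has to pick a marker on its own; only the concrete tree can round-trip the original line.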
-
You could use the positional info on each node, which records where in the original file that node came from. You can then change only the text at those particular positions yourself.
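A minimal sketch of that idea, assuming each node you want to change carries `position.start.offset` and `position.end.offset` (as unist positions can). The `Edit` shape and the `applyEdits` helper are made up for illustration and are not part of remark; in practice the positions would come from the parsed tree rather than from `indexOf`.

```typescript
// Minimal stand-ins for the unist position types.
interface Point { line: number; column: number; offset?: number }
interface Position { start: Point; end: Point }

// Hypothetical shape of one planned change: replace the source text that a
// node's position covers with `replacement`.
interface Edit { position: Position; replacement: string }

// Apply edits to the original source, touching only the covered ranges.
function applyEdits(source: string, edits: Edit[]): string {
  const ranges = edits.map((edit) => {
    const start = edit.position.start.offset;
    const end = edit.position.end.offset;
    if (start === undefined || end === undefined) {
      throw new Error("edit is missing offset information");
    }
    return { start, end, replacement: edit.replacement };
  });
  // Work from the end of the file backwards so earlier offsets stay valid.
  ranges.sort((a, b) => b.start - a.start);
  let result = source;
  let previousStart = source.length;
  for (const range of ranges) {
    if (range.end > previousStart) throw new Error("edits overlap");
    result = result.slice(0, range.start) + range.replacement + result.slice(range.end);
    previousStart = range.start;
  }
  return result;
}

// Example: rewrite one link, leave the bullet and thematic break untouched.
const source = "* [a](http://example.com)\n\n* * *\n";
const start = source.indexOf("[a]");
const end = source.indexOf(")") + 1;
const edited = applyEdits(source, [{
  position: {
    start: { line: 1, column: start + 1, offset: start },
    end: { line: 1, column: end + 1, offset: end },
  },
  replacement: "[a](https://example.com)",
}]);
```

Because the untouched ranges are copied byte for byte, the diff contains only the replaced spans. The trade-off is that you have to produce the replacement Markdown for each edited node yourself.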
-
Intro
I'd like to use remark (or the underlying mdast-util-from-markdown and mdast-util-to-markdown) to selectively edit thousands of Markdown files while keeping the bulk of the original Markdown sources unmodified. In other words, my code should only modify the parts of the Markdown that I actually want to change (e.g. the contents of link nodes) while keeping the rest of the text as-is.
In my experiments (see below for an example), I found that mdast-util-to-markdown "normalizes" the Markdown it outputs, e.g. it uses a consistent style for bullets, emphasis, strong, thematic breaks, etc. I know I can configure which of these styles to use via options, but that still wouldn't do what I want. I'd like to keep the bullet/emphasis/strong etc. styles of the original Markdown sources, even if that means I have to live with inconsistencies. I understand that normalizing the output is often a good idea, but in my case it makes my life harder, if only because it creates noisy diffs that make it hard to distinguish cosmetic changes from the material edits I do want to make (and which I want to check by looking through the diffs). Again, we're talking thousands of files, so keeping the diffs free of noise really matters.
Example
In the following example (written in TypeScript for Deno), my input is a Markdown source text with several incongruities and inconsistencies (multiple strong and emphasis styles, a somewhat unusual format for thematic breaks etc.):
I want my code to selectively edit the "link" nodes (by changing "http" to "https") and then write the modified tree back as Markdown. (My actual problem uses different AST transformations, e.g. adding a field to frontmatter. But the argument is the same.)
Actual vs. desired result
This prints (actual result):
I would like it to print (desired result):
I.e. preserving all idiosyncrasies in the original source except for the link nodes I edited; the only change compared to the input is "http" to "https". As you can see, the actual output is normalized Markdown, e.g. it uses a consistent style for bullets/strong/emphasis etc., which is not what I want.
Possible solutions
One idea I had was to install custom toMarkdown handlers for all nodes that I don't want to change. These handlers would then use the node's position information to copy the text directly from the input text into the output. For example, if we want to preserve the original text of "thematicBreak" nodes in the output, we can change the code:
Now, the original format of the thematic break is preserved in the rendered output. Great!
This approach seems relatively trivial (if a little tedious to write) for leaf nodes of the tree. But I don't know how to achieve the same for nodes that can have children. E.g. in my example, the "listItem" nodes should generally render all their children (and themselves) as-is, unless one of their children is a "link" node.
I feel this approach could potentially work, but I don't understand the mdast APIs well enough to make it work. Any ideas?
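A rough sketch of the parent-node idea, deliberately independent of the mdast APIs: `SketchNode`, its `changed` flag, and the `render` callback are all hypothetical stand-ins, not mdast-util-to-markdown handlers. It assumes children are listed in source order and lie within their parent's range. Unchanged nodes are copied from the source; for parents, the text between children (markers, whitespace, delimiters) is copied verbatim and only changed descendants are re-rendered.

```typescript
// Hypothetical, simplified node: offsets stand in for position.start/end.offset.
interface SketchNode {
  type: string;
  start: number; // offset of the node's first character in the source
  end: number; // offset just past the node's last character
  children?: SketchNode[];
  changed?: boolean; // set by whatever transform edited this node
}

function stitch(
  node: SketchNode,
  source: string,
  render: (node: SketchNode) => string,
): string {
  // Edited nodes are the only ones that get serialized from scratch.
  if (node.changed) return render(node);
  // Leaves (and childless parents) are copied straight from the source.
  if (!node.children || node.children.length === 0) {
    return source.slice(node.start, node.end);
  }
  let output = "";
  let cursor = node.start;
  for (const child of node.children) {
    output += source.slice(cursor, child.start); // text between children, verbatim
    output += stitch(child, source, render);
    cursor = child.end;
  }
  return output + source.slice(cursor, node.end);
}

// Example: a list item whose paragraph contains a strong node and an edited link.
const input = "* __bold__ [a](http://x.org)\n";
const strongStart = input.indexOf("__bold__");
const linkStart = input.indexOf("[a]");
const lineEnd = input.length - 1;
const tree: SketchNode = {
  type: "root", start: 0, end: input.length, children: [{
    type: "listItem", start: 0, end: lineEnd, children: [{
      type: "paragraph", start: strongStart, end: lineEnd, children: [
        { type: "strong", start: strongStart, end: strongStart + 8 },
        { type: "link", start: linkStart, end: lineEnd, changed: true },
      ],
    }],
  }],
};
const stitched = stitch(tree, input, () => "[a](https://x.org)");
```

Here the `*` bullet and the `__bold__` style survive because they are never re-serialized; only the link text comes from `render`. Whether this can be wired into real toMarkdown handlers is a separate question that the sketch does not answer.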
Or are there other solutions you can think of?
Thanks for reading, and thank you to everybody who works on these packages. I love the unified/remark/micromark ecosystem!