Fenced code block as a list item #9865

Open
jsx97 opened this issue Jun 10, 2024 · 5 comments

jsx97 commented Jun 10, 2024

A bug report or maybe a request for improvement.

Sometimes it is necessary to have a fenced code block as a list item. As I have discovered, the proper syntax for this is not very intuitive.

pandoc input.md -o output.htm
- example 1
- ```
  list item two
  list item two
  ```
- list item three

In Example 1, each line inside the pre block is indented with two spaces, whereas I expected the lines not to be indented.

-   example 2
-   ```
    list item two

    list item two
    ```
-   list item three

Example 2 works fine, but only if the two lines inside the pre block are separated by an empty line. If there is no empty line between them, the markup will be <li><code>list item two list item two</code></li>.

- example 3
- ```
list item two
list item two
```
- list item three

Example 3 demonstrates syntax that works fine. Though we can use it, I would prefer the syntax from Example 1.

jsx97 added the bug label Jun 10, 2024
Owner

jgm commented Jun 10, 2024

Hm. I can reproduce this. It's definitely not intended, and you won't get that behavior with -f commonmark or -f gfm.

% pandoc -t native
- example 1
- ```
  list item two
  list item two
  ```
- list item three

[ BulletList
    [ [ Plain [ Str "example" , Space , Str "1" ] ]
    , [ CodeBlock
          ( "" , [] , [] ) "  list item two\n  list item two"
      ]
    , [ Plain
          [ Str "list" , Space , Str "item" , Space , Str "three" ]
      ]
    ]
]

A bug I would say.

Owner

jgm commented Jun 11, 2024

Even more minimal case:

% pandoc
- ```
  item
  ```
^D
<ul>
<li><pre><code>  item</code></pre></li>
</ul>

Owner

jgm commented Jun 11, 2024

The problem lies with

listLineCommon :: PandocMonad m => MarkdownParser m Text
listLineCommon = T.concat <$> manyTill
              (  many1Char (satisfy $ \c -> c `notElem` ['\n', '<', '`'])
             <|> fmap snd (withRaw code)
             <|> fmap (renderTags . (:[]) . fst) (htmlTag isCommentTag)
             <|> countChar 1 anyChar
              ) newline

Originally this function was just grabbing the first literal text line of the list item (whose raw contents would be reparsed later). But special handling was added for inline code and HTML comment tags (likely for good reasons which we can look up). Note that ``` can delimit inline code as well as code blocks. So, this function is gobbling the whole thing, instead of just the first line. And because of this, the code that would have removed the extra indentation doesn't get triggered (that's in listLine).
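The gobbling described above can be illustrated with a toy model. This is only a sketch, not pandoc's parser: `rawFirstLine` and `breakOnRun` are hypothetical names, it uses plain `String` rather than pandoc's parser monad, and it assumes that an inline code span cannot contain a blank line (which would match the behavior reported for Example 2).

```haskell
import Data.List (isInfixOf, isPrefixOf)

-- Toy model (NOT pandoc's code): read the "first raw line" of a list item,
-- but treat a backtick-delimited span as one unit, even across newlines.
rawFirstLine :: String -> String
rawFirstLine [] = []
rawFirstLine ('\n' : _) = []              -- a bare newline ends the raw line
rawFirstLine s@('`' : _) =
  let (ticks, rest) = span (== '`') s     -- opening run of backticks
  in case breakOnRun ticks rest of
       -- A closing run exists and no blank line intervenes:
       -- swallow the whole span, newlines and indentation included.
       Just (body, after)
         | not ("\n\n" `isInfixOf` body) ->
             ticks ++ body ++ ticks ++ rawFirstLine after
       -- Otherwise the backticks are kept as literal text.
       _ -> ticks ++ rawFirstLine rest
rawFirstLine (c : cs) = c : rawFirstLine cs

-- Split at the first occurrence of the delimiter, returning the text
-- before and after it. Simplification: a longer backtick run also matches.
breakOnRun :: String -> String -> Maybe (String, String)
breakOnRun _ [] = Nothing
breakOnRun delim s@(c : cs)
  | delim `isPrefixOf` s = Just ([], drop (length delim) s)
  | otherwise = fmap (\(b, a) -> (c : b, a)) (breakOnRun delim cs)
```

In this model, `rawFirstLine "```\n  item\n  ```\n- next\n"` returns the entire fenced span with its two-space indentation intact, while `rawFirstLine "plain\n  more\n"` stops at the first newline. That is the difference that keeps the indentation-stripping step from ever seeing the inner lines.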

Code is a bit of a mess here -- I need to revisit some things, but I'm recording this diagnosis here for when I have a chance to do that.

Ref #5628

Owner

jgm commented Jun 11, 2024

The story begins 15 years ago, with this commit: eb2e560

That was meant to deal with cases like the following:

- a <!--

- b

-->
- c

That is still a case pandoc handles nicely (whereas commonmark doesn't recognize the HTML comment in this kind of context).

But the cost of dealing with this case was that, in consuming raw content for the list item, we needed to gobble material inside HTML comments. Fine! For many years we did that. But then someone came up with a case like

- a `<!--`
- b `-->`

in which the special characters are quoted in inline code. Well, clearly our "raw line" parser needs to gobble up inline code sections, too. And that's all fine until we have a case like yours. Note that

```
abc
```

would be perfectly valid inline code (were it not parsed first as a code block). So the raw list item parser gobbles up this whole chunk, avoiding the line-by-line reading that strips leading indentation.

What a mess!

In this case we could add an additional band-aid to the current pile of band-aids, probably. But I'm tempted to think that this was all a mistake, and that the way to sanity is the approach we took with commonmark, which just makes it very clear that indicators of block structure take precedence over inline parsing, and renders the HTML comment example above as

<ul>
<li>
<p>a &lt;!--</p>
</li>
<li>
<p>b</p>
</li>
</ul>
<p>--&gt;</p>
<ul>
<li>c</li>
</ul>

So, I'm tempted to take out all the special-purpose code instead of adding something else that will probably break in some new way in the future...
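The "block structure first" idea can be sketched in the same toy style. Again this is an assumption-laden illustration, not pandoc's or commonmark's implementation: `stripItemIndent` is a hypothetical helper that takes the item's content indent and the raw text following the list marker, splits it into lines before any inline parsing happens, and removes the item's indentation from every continuation line.

```haskell
-- Toy model (NOT pandoc's or commonmark's code): settle block structure
-- line by line first. Each continuation line loses up to `indent` leading
-- spaces before any inline or code-block parser looks at it.
stripItemIndent :: Int -> String -> [String]
stripItemIndent indent raw =
  case lines raw of
    [] -> []
    (first : rest) -> first : map dropIndent rest
  where
    dropIndent l =
      let (spaces, body) = span (== ' ') l
      in drop (min indent (length spaces)) spaces ++ body
```

With this ordering, `stripItemIndent 2 "```\n  item\n  ```\n"` yields the lines `"```"`, `"item"`, `"```"`, so the fence contents arrive unindented regardless of whether the backticks could also be read as inline code.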

Owner

jgm commented Jun 11, 2024

See also #7778 for another related case.
