feat!: Rebuild text parsing. Remove (expr), add (sym)(str)(num)(nl)
Previously, (expr) contained anonymous "str", "num", and "sym" nodes. Those are
now exposed as named nodes. (sym) nodes retain the anonymous symbols, like
(sym "*"). Additionally, (sym next: "str") indicates the symbol is immediately
followed by a (str), and (sym prev: "num") indicates the symbol follows a number.

Add (nl) in multiline text:
  - (paragraph)
  - (fndef (description))
  - (contents), in drawers, blocks, dynamic blocks, and latex_envs

Add "sub" and "final" fields to (stars)
milisims committed Jun 19, 2023
1 parent 64cfbc2 commit e538c2b
Showing 13 changed files with 71,319 additions and 80,142 deletions.
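
The node split described in the commit message can be illustrated with a small standalone sketch. This is not the grammar's implementation (the grammar lives in `grammar.js`, which is not shown here); the character classes, the `Token` type, and the `tokenize` helper are all hypothetical and only approximate the described behavior: letters become `str`, digits become `num`, a newline becomes `nl`, and any other non-whitespace character becomes a single-character `sym` that may carry `next: "str"` or `prev: "num"`.

```python
import re
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    """Hypothetical stand-in for a (str), (num), (sym), or (nl) node."""
    kind: str                   # "str", "num", "sym", or "nl"
    text: str
    next: Optional[str] = None  # "str" when a str starts right after this sym
    prev: Optional[str] = None  # "num" when a num ends right before this sym


# Assumed character classes, not taken from grammar.js:
# letters -> str, digits -> num, newline -> nl, other non-whitespace -> sym.
_TOKEN = re.compile(
    r"(?P<str>[^\W\d_]+)|(?P<num>\d+)|(?P<nl>\n)|(?P<sym>\S)"
)


def tokenize(text: str) -> List[Token]:
    """Split text into tokens and annotate sym tokens with their neighbors."""
    raw = [(m.lastgroup, m.group(), m.start(), m.end())
           for m in _TOKEN.finditer(text)]
    tokens = []
    for i, (kind, value, start, end) in enumerate(raw):
        nxt = prv = None
        if kind == "sym":
            # A str counts as "next" only if it is adjacent (no whitespace).
            if i + 1 < len(raw) and raw[i + 1][0] == "str" and raw[i + 1][2] == end:
                nxt = "str"
            # A num counts as "prev" only if it is adjacent (no whitespace).
            if i > 0 and raw[i - 1][0] == "num" and raw[i - 1][3] == start:
                prv = "num"
        tokens.append(Token(kind, value, nxt, prv))
    return tokens


if __name__ == "__main__":
    for tok in tokenize("a *b* 2%\n"):
        print(tok)
```

In this sketch the first `*` in `a *b* 2%` is tagged `next="str"` because `b` follows it directly, and `%` is tagged `prev="num"` because it directly follows `2`. Whether the real grammar requires strict adjacency for `prev` is an assumption here.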
2 changes: 2 additions & 0 deletions .gitignore
@@ -5,3 +5,5 @@ build
*.log
/examples/*/
/target/
*.so
*.o
75 changes: 34 additions & 41 deletions README.md
@@ -5,22 +5,6 @@ usefully parse org files to be used in any library that uses tree-sitter
parsers. It is not meant to implement emacs' orgmode parser exactly, which is
inherently more dynamic than tree-sitter easily allows.

## Overview

This section is meant to be a quick reference, not a thorough description.
Refer to the tests in `corpus` for examples.

- Top level node: `(document)`
- Document contains: `(directive)* (body)? (section)*`
- Section contains: `(headline) (plan)? (property_drawer)? (body)?`
- headline contains: `((stars), (item)?, (tag_list)?)`
- body contains: `(element)+`
- element contains: `(directive)* choose(paragraph, drawer, comment, footnote def, list, block, dynamic block, table)` or a bare `(directive)`
- paragraph contains: `(expr)+`
- expr contains: anonymous nodes for 'str', 'num', 'sym', and any ascii symbol that is not letters or numbers. (See top of grammar.js and queries for details)

Like in many regex systems, `*/+` is read as "0/1 or more", and `?` is 0 or 1.

## Example

```org
@@ -48,20 +32,24 @@ Parses as:
(document [0, 0] - [16, 0]
body: (body [0, 0] - [4, 0]
directive: (directive [0, 0] - [1, 0]
name: (expr [0, 2] - [0, 7])
name: (expr [0, 2] - [0, 7]
(str [0, 2] - [0, 7]))
value: (value [0, 9] - [0, 16]
(expr [0, 9] - [0, 16])))
(str [0, 9] - [0, 16])))
(paragraph [2, 0] - [3, 0]
(expr [2, 0] - [2, 4])
(expr [2, 5] - [2, 12])
(expr [2, 13] - [2, 16])
(expr [2, 17] - [2, 22])))
(str [2, 0] - [2, 4])
(sym [2, 5] - [2, 6])
(str [2, 6] - [2, 12])
(str [2, 13] - [2, 15])
(sym [2, 15] - [2, 16])
(str [2, 17] - [2, 22])
(nl [2, 22] - [3, 0])))
subsection: (section [4, 0] - [16, 0]
headline: (headline [4, 0] - [5, 0]
stars: (stars [4, 0] - [4, 1])
item: (item [4, 2] - [4, 12]
(expr [4, 2] - [4, 6])
(expr [4, 7] - [4, 12])))
(str [4, 2] - [4, 6])
(str [4, 7] - [4, 12])))
plan: (plan [5, 0] - [6, 0]
(entry [5, 0] - [5, 16]
timestamp: (timestamp [5, 0] - [5, 16]
@@ -72,50 +60,55 @@ Parses as:
(listitem [7, 2] - [8, 0]
bullet: (bullet [7, 2] - [7, 3])
contents: (paragraph [7, 4] - [8, 0]
(expr [7, 4] - [7, 8])
(expr [7, 9] - [7, 10])))
(str [7, 4] - [7, 8])
(str [7, 9] - [7, 10])
(nl [7, 10] - [8, 0])))
(listitem [8, 2] - [11, 0]
bullet: (bullet [8, 2] - [8, 3])
checkbox: (checkbox [8, 4] - [8, 7]
status: (expr [8, 5] - [8, 6]))
status: (sym [8, 5] - [8, 6]))
contents: (paragraph [8, 8] - [9, 0]
(expr [8, 8] - [8, 12])
(expr [8, 13] - [8, 14]))
(str [8, 8] - [8, 12])
(str [8, 13] - [8, 14])
(nl [8, 14] - [9, 0]))
contents: (list [9, 0] - [11, 0]
(listitem [9, 4] - [10, 0]
bullet: (bullet [9, 4] - [9, 5])
checkbox: (checkbox [9, 6] - [9, 9])
contents: (paragraph [9, 10] - [10, 0]
(expr [9, 10] - [9, 14])
(expr [9, 15] - [9, 16])))
(str [9, 10] - [9, 14])
(str [9, 15] - [9, 16])
(nl [9, 16] - [10, 0])))
(listitem [10, 4] - [11, 0]
bullet: (bullet [10, 4] - [10, 5])
checkbox: (checkbox [10, 6] - [10, 9]
status: (expr [10, 7] - [10, 8]))
status: (str [10, 7] - [10, 8]))
contents: (paragraph [10, 10] - [11, 0]
(expr [10, 10] - [10, 14])
(expr [10, 15] - [10, 16])))))
(str [10, 10] - [10, 14])
(str [10, 15] - [10, 16])
(nl [10, 16] - [11, 0])))))
(listitem [11, 2] - [12, 0]
bullet: (bullet [11, 2] - [11, 3])
contents: (paragraph [11, 4] - [12, 0]
(expr [11, 4] - [11, 8])
(expr [11, 9] - [11, 10])))))
(str [11, 4] - [11, 8])
(str [11, 9] - [11, 10])
(nl [11, 10] - [12, 0])))))
subsection: (section [13, 0] - [16, 0]
headline: (headline [13, 0] - [14, 0]
stars: (stars [13, 0] - [13, 2])
item: (item [13, 3] - [13, 13]
(expr [13, 3] - [13, 13]))
(str [13, 3] - [13, 13]))
tags: (tag_list [13, 14] - [13, 19]
tag: (tag [13, 15] - [13, 18])))
tag: (tag [13, 15] - [13, 18]
(str [13, 15] - [13, 18]))))
body: (body [14, 0] - [16, 0]
(paragraph [15, 0] - [16, 0]
(expr [15, 0] - [15, 4]))))))
(str [15, 0] - [15, 4])
(nl [15, 4] - [16, 0]))))))
```

## Install

For manual install, use `make`.

For neovim, using `nvim-treesitter/nvim-treesitter`, add to your configuration:

```lua