A study in “insanity”.
This project is a converter from DokuWiki to AsciiDoc markup. It’s the most complete converter between these markups you can find.[1]
What’s especially interesting about this converter is its implementation: it’s written entirely in sed! Moreover, it uses just BRE (Basic Regular Expressions) with a few GNU extensions, plus some glue in (POSIX) shell. That’s quite insane, isn’t it? I’d say it’s like trying to dig a hole with just a teaspoon, but it works surprisingly well! Hence the subtitle: “A study in insanity”.[2] Read more in “But Why Sed?”.
- Linux system with common userland (Busybox, GNU coreutils, …)
- POSIX-sh compatible shell (e.g. Busybox ash, dash, ZSH, Bash, …)
- sed supporting GNU extensions: `\?`, `\+`, `\|`, `^` inside subexpressions, and `q` with exit code
- Python for rewriting internal links
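To see whether the installed sed meets the GNU-extension requirement above, you can probe each feature with a one-line substitution. The sketch below uses a hypothetical helper, `check_sed`, which is not part of dokuwiki2adoc. It returns non-zero as soon as a probe gives the non-GNU result.

```shell
#!/bin/sh
# Hypothetical helper, not part of dokuwiki2adoc: probe the GNU BRE
# extensions the converter relies on. Returns 1 on the first failed probe.
check_sed() {
    # \? matches zero or one of the previous item: "colou\?r" matches "color"
    [ "$(echo color | sed 's/colou\?r/X/')" = X ] || return 1

    # \+ matches one or more of the previous item
    [ "$(echo aaab | sed 's/a\+/X/')" = Xb ] || return 1

    # \| is alternation
    [ "$(echo dog | sed 's/cat\|dog/X/')" = X ] || return 1

    # ^ right after \( is an anchor, not a literal caret
    [ "$(echo ab | sed 's/\(^a\)/X/')" = Xb ] || return 1

    # q accepts an exit code (sed still prints the pattern space first)
    status=0
    echo x | sed 'q5' >/dev/null || status=$?
    [ "$status" -eq 5 ] || return 1

    return 0
}

if check_sed; then
    echo "sed supports the required GNU extensions"
else
    echo "sed lacks a required GNU extension" >&2
fi
```

Each probe is chosen so that a sed treating the escape as a literal character produces a different result from GNU sed.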
Most DokuWiki syntax constructs can be converted to AsciiDoc.
Known limitations in the conversion from DokuWiki to AsciiDoc:
- The input file is not checked for syntax errors.
- Appending a downloadable file (code snippet) to a Code block is not supported (the link to the file is silently discarded by the converter).
- Anchors (links to a location on the same page) may not work due to the different naming conventions of the two markups.
- Vertical spanning of table cells (over several rows) is not supported.
- Text decorations and “text not to be parsed” marks are only converted if the opening and closing marks are on the same line.
- Windows shares and Interwiki links are not converted.
- Embedding HTML and PHP is not supported.
- RSS/Atom feed integration is not supported.
- Text to image conversion is not supported.
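The same-line restriction on text decorations follows from sed reading its input one line at a time. The following minimal sketch uses a hypothetical `dw_inline` filter with deliberately naive patterns for bold and italic only. These are not the converter's actual rules. The filter shows that a mark pair split across two lines is simply never matched.

```shell
#!/bin/sh
# Hypothetical, deliberately naive filter (not the converter's real rules):
#   DokuWiki **bold**   -> AsciiDoc *bold*
#   DokuWiki //italic// -> AsciiDoc _italic_
# sed sees one line at a time, so marks split across lines never match.
dw_inline() {
    sed -e 's|\*\*\([^*]\+\)\*\*|*\1*|g' \
        -e 's|//\([^/]\+\)//|_\1_|g'
}

printf 'a **b** c\n' | dw_inline          # prints: a *b* c
printf 'x //y// z\n' | dw_inline          # prints: x _y_ z
printf '**open\nclose**\n' | dw_inline    # unchanged: marks on different lines
```

A real converter also has to stop the italic pattern from firing on URLs. For example, a line with two `http://` links would be mangled by this sketch.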
In addition to the basic DokuWiki syntax, the following plugins are supported:
| Plugin | Limitations | Notes |
|---|---|---|
| | All attributes (prompt, comment, continue) are ignored. | Converted to a source block with lang “sh”. |
| | No limitations. | Code title is converted to a block title. |
| | Line number is ignored, no difference between header and footer. | Header/footer is converted to a block title. |
| | If comment is not properly terminated, the converter terminates with an error. | Content between the marks is discarded from the output. |
| | Recognized only when opening and closing tags are on the same line. | Content between |
| | Latex delimiters are not recognized in a nowiki area. | |
| | No limitations. | Note is converted to a corresponding admonition block, or admonition paragraph if |
| | Only (un)ordered list item with single extra paragraph denoted by | |
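The comment-plugin row combines both behaviours: comment content is discarded, and an unterminated comment is an error. Together they show how sed's `N` loop and `q` with an exit code can cooperate. The sketch below is only an illustration under stated assumptions. It assumes C-style `/* … */` comment marks and GNU sed. The helper name `strip_comments` and the exit status 2 are hypothetical; neither comes from dokuwiki2adoc.

```shell
#!/bin/sh
# Hypothetical helper, not dokuwiki2adoc's actual code. Assumes the comment
# plugin's C-style marks and GNU sed (\+ and q with an exit code).
strip_comments() {
    sed '
        :again
        # Delete every complete comment in the pattern space; [^*] also
        # matches the newlines that N inserts, so comments may span lines.
        s|/\*[^*]*\*\+\([^/*][^*]*\*\+\)*/||g
        # An opening mark is left over: the comment continues on later lines.
        /\/\*/ {
            # No more input to append: unterminated comment, exit with status 2
            $q2
            N
            b again
        }
    '
}

printf 'a /* x */ b\n' | strip_comments                 # prints: a  b
printf 'keep\n/* one\ntwo */ end\n' | strip_comments    # prints: keep, then " end"
```

The substitution uses the classic "stars not followed by a slash" pattern, so it stops at the first `*/` instead of swallowing everything up to the last one. Because `q` keeps autoprint, the unterminated text is still printed before sed exits. GNU `Q` would suppress it.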
You may ask why I wrote this converter in sed, even though I consider it a bit insane. Well, I’m not the original author.
Back in the winter of 2017, I needed a converter from DokuWiki to AsciiDoc. The only ready-made solution I found was convtags, a bidirectional (!) converter between DokuWiki and AsciiDoc. It was originally developed for the Slackware Documentation Project that used to run on DokuWiki (see this page). I tried it on a few pages and it worked very well, even on complicated tables!
However, convtags was written for an older AsciiDoc syntax and didn’t support some features I needed. It appeared to be quicker to improve convtags, even though it’s written completely in sed, than to write a proper converter from scratch. So I started digging into it.
At first it went well, but as I tried it on more pages, more and more bugs (in the original code) started to appear. Not just minor bugs, but some quite fundamental ones. Eventually I fixed or rewrote most of convtags’ code.
Now it works very well for a wide variety of DokuWiki pages, but it took me much more time than I expected. I regret a little that I didn’t write a proper, modular converter from scratch instead. However, it was a really interesting experience (although a bit masochistic). I really like regular expressions, sed and challenges… and this was one hell of a challenge! I’ve learned a lot thanks to it.
dokuwiki2adoc is a fork of convtags created by Didier Spaier for the Slackware Linux Project. The most essential sed magic in this converter is Didier’s work. I’ve fixed many bugs and improved it a lot, yet I still don’t understand how exactly some of the tricks with hold space work. Didier is a real sed master!
This project is licensed under the BSD-1-Clause License. For the full text of the license, see the LICENSE file.