Using mistletoe

Installation

mistletoe-ebp is tested for Python 3.5 and above. Install mistletoe within a Conda Environment (recommended):

conda install -c conda-forge mistletoe-ebp

or via pip:

pip install mistletoe-ebp

Alternatively, clone the repo:

git clone https://github.com/ExecutableBookProject/mistletoe-ebp.git
cd mistletoe-ebp
pip install -e .[testing,code_style]

Usage

Basic usage

Here’s how you can use mistletoe in a Python script:

import mistletoe

with open('foo.md', 'r') as fin:
    rendered = mistletoe.markdown(fin)

mistletoe.markdown() uses mistletoe’s default settings: allowing HTML mixins and rendering to HTML. The function also accepts an additional argument renderer. To produce LaTeX output:

import mistletoe
from mistletoe.renderers.latex import LaTeXRenderer

with open('foo.md', 'r') as fin:
    rendered = mistletoe.markdown(fin, LaTeXRenderer)

Finally, here’s how you would manually specify extra tokens and a renderer for mistletoe. In the following example, we use HTMLRenderer to render the AST, which adds HTMLBlock and HTMLSpan to the normal parsing process.

from mistletoe import Document, HTMLRenderer

with open('foo.md', 'r') as fin:
    with HTMLRenderer() as renderer:
        rendered = renderer.render(Document.read(fin))

From the command-line

pip installation enables mistletoe’s command-line utility. Type the following directly into your shell:

mistletoe foo.md

This will transpile foo.md into HTML, and dump the output to stdout. To save the HTML, direct the output into a file:

mistletoe foo.md > out.html

You can pass in custom renderers by including the full path to your renderer class after a -r or --renderer flag:

mistletoe foo.md --renderer custom_renderer.CustomRenderer

Running mistletoe without specifying a file will land you in interactive mode. Like Python’s REPL, interactive mode allows you to test how your Markdown will be interpreted by mistletoe:

mistletoe [version 0.9.2] (interactive)
Type Ctrl-D to complete input, or Ctrl-C to exit.
>>> some **bold** text
... and some *italics*
...
<p>some <strong>bold</strong> text
and some <em>italics</em></p>
>>>

The interactive mode also accepts the --renderer flag:

mistletoe [version 0.9.2] (interactive)
Type Ctrl-D to complete input, or Ctrl-C to exit.
Using renderer: LaTeXRenderer
>>> some **bold** text
... and some *italics*
...
\documentclass{article}
\begin{document}

some \textbf{bold} text
and some \textit{italics}
\end{document}
>>>

Performance

mistletoe is the fastest CommonMark compliant implementation in Python. Try the benchmarks yourself by running:

$ python test/test_samples/benchmark.py  # all results in seconds
Test document: syntax.md
Test iterations: 1000
Running tests with markdown, mistune, commonmark, mistletoe...
==============================================================
markdown: 40.270715949
mistune: 11.054077996000004
commonmark: 44.426582849
mistletoe: 34.47910147500001

We notice that Mistune is the fastest Markdown parser, and by a good margin, which demands some explanation. mistletoe’s biggest performance penalty comes from stringently following the CommonMark spec, which outlines a highly context-sensitive grammar for Markdown. Mistune takes a simpler approach to the lexing and parsing process, but this means that it cannot handle more complex cases, e.g., precedence of different types of tokens, escaping rules, etc.

To see why this might be important to you, consider the following Markdown input (example 392 from the CommonMark spec):

***foo** bar*

The natural interpretation is:

<p><em><strong>foo</strong> bar</em></p>

… and it is indeed the output of Python-Markdown, Commonmark-py and mistletoe. Mistune (version 0.8.3) greedily parses the first two asterisks in the first delimiter run as a strong-emphasis opener, the second delimiter run as its closer, but does not know what to do with the remaining asterisk in between:

<p><strong>*foo</strong> bar*</p>

The implication of this runs deeper, and it is not simply a matter of dogmatically following an external spec. By adopting a more flexible parsing algorithm, mistletoe allows us to specify a precedence level to each token class, including custom ones that you might write in the future. Code spans, for example, has a higher precedence level than emphasis, so

*foo `bar* baz`

… is parsed as:

<p>*foo <code>bar* baz</code></p>

… whereas Mistune parses this as:

<p><em>foo `bar</em> baz`</p>

Of course, it is not impossible for Mistune to modify its behavior, and parse these two examples correctly, through more sophisticated regexes or some other means. It is nevertheless highly likely that, when Mistune implements all the necessary context checks, it will suffer from the same performance penalties.

Contextual analysis is why Python-Markdown is slow, and why CommonMark-py is slower. The lack thereof is the reason mistune enjoys stellar performance among similar parser implementations, as well as the limitations that come with these performance benefits.

If you want an implementation that focuses on raw speed, mistune remains a solid choice. If you need a spec-compliant and readily extensible implementation, however, mistletoe is still marginally faster than Python-Markdown, while supporting more functionality (lists in block quotes, for example), and significantly faster than CommonMark-py.

One last note: another bottleneck of mistletoe compared to mistune is the function overhead. Because, unlike mistune, mistletoe chooses to split functionality into modules, function lookups can take significantly longer than mistune. To boost the performance further, it is suggested to use PyPy with mistletoe. Benchmark results show that on PyPy, mistletoe’s performance is on par with mistune:

$ pypy3 test/benchmark.py mistune mistletoe
Test document: test/samples/syntax.md
Test iterations: 1000
Running tests with mistune, mistletoe...
========================================
mistune: 13.645681533998868
mistletoe: 15.088351159000013