Getting Started¶

Installation¶

mistletoe-ebp is tested for Python 3.5 and above. Install mistletoe within a Conda Environment (recommended):

conda install -c conda-forge mistletoe-ebp

or via pip:

pip install mistletoe-ebp

Alternatively, for code development, clone the repo:

git clone https://github.com/ExecutableBookProject/mistletoe-ebp.git
cd mistletoe-ebp
pip install -e .[testing,code_style]

Usage¶

Basic usage¶

Here’s how you can use mistletoe in a Python script:

import mistletoe

with open('foo.md', 'r') as fin:
    rendered = mistletoe.markdown(fin)

mistletoe.markdown() defaults to using the HTMLRenderer, but other renderers can be chosen, such as the ones listed in Core Renderers. To produce LaTeX output:

import mistletoe
from mistletoe.renderers.latex import LaTeXRenderer

with open('foo.md', 'r') as fin:
    rendered = mistletoe.markdown(fin, LaTeXRenderer)

From the command-line¶

pip installation enables mistletoe’s command-line utility. Type the following directly into your shell:

mistletoe foo.md

This will transpile foo.md into HTML, and dump the output to stdout. To save the HTML, direct the output into a file:

mistletoe foo.md > out.html

You can pass in custom renderers by including the full path to your renderer class after a -r or --renderer flag:

mistletoe foo.md --renderer custom_renderer.CustomRenderer

Running mistletoe without specifying a file will land you in interactive mode. Like Python’s REPL, interactive mode allows you to test how your Markdown will be interpreted by mistletoe:

mistletoe [version 0.9.2] (interactive)
Type Ctrl-D to complete input, or Ctrl-C to exit.
>>> some **bold** text
... and some *italics*
...
<p>some <strong>bold</strong> text
and some <em>italics</em></p>
>>>

The interactive mode also accepts the --renderer flag:

mistletoe [version 0.9.2] (interactive)
Type Ctrl-D to complete input, or Ctrl-C to exit.
Using renderer: LaTeXRenderer
>>> some **bold** text
... and some *italics*
...
\documentclass{article}
\begin{document}

some \textbf{bold} text
and some \textit{italics}
\end{document}
>>>

Customise the parse¶

To exert even greater control over the parsing process, renderers can be initialised with an existing ParseContext instance. This class stores global variables that are utilised during the parsing process, such as such as the block/span tokens to search for, and link/footnote definitions that have been collected. At any one time, one of these objects is set per thread; which can be changed by set_parse_context() and retrieved by get_parse_context().

In the following example, we use the HTMLRenderer to parse a file:

first parsing only tokens that are strictly CommonMark compliant (see block tokens and span tokens), then
including an extended token set (see extended tokens).

from mistletoe import Document, HTMLRenderer, ParseContext, token_sets

commonmark_context = ParseContext(
    find_blocks=token_sets.get_commonmark_block_tokens(),
    find_spans=token_sets.get_commonmark_span_tokens(),
)
extended_context = ParseContext(
    find_blocks=token_sets.get_extended_block_tokens(),
    find_spans=token_sets.get_extended_span_tokens(),
)

with open('foo.md', 'r') as fin:
    rendered1 = mistletoe.markdown(
        fin, renderer=HTMLRenderer,
        parse_context=commonmark_context
    )

    rendered2 = mistletoe.markdown(
        fin, renderer=HTMLRenderer,
        parse_context=extended_context
    )

Programmatic Use¶

To parse the text only to the mistletoe AST, the general entry point is the mistletoe.block_tokens.Document.read() method (athough actually all block tokens have a read method that can be used directly).

>> from mistletoe import Document
>> text = """
.. Here's some *text*
..
.. 1. a list
..
.. > a *quote*"""
>> doc = Document.read(text)
>> doc
Document(children=3, link_definitions=0, footnotes=0, footref_order=0, front_matter=None)

All tokens have a children attribute:

>> doc.children
[Paragraph(children=2, position=Position(lines=[2:2])),
 List(children=1, loose=False, start_at=1, position=Position(lines=[3:4])),
 Quote(children=1, position=Position(lines=[6:6]))]

or you can walk through the entire syntax tree, using the walk() method:

>> for item in doc.walk():
..     print(item)
WalkItem(node=Paragraph(children=2, position=Position(lines=[2:2])), parent=Document(children=3, link_definitions=0, footnotes=0, footref_order=0, front_matter=None), index=0, depth=1)
WalkItem(node=List(children=1, loose=False, start_at=1, position=Position(lines=[3:4])), parent=Document(children=3, link_definitions=0, footnotes=0, footref_order=0, front_matter=None), index=1, depth=1)
WalkItem(node=Quote(children=1, position=Position(lines=[6:6])), parent=Document(children=3, link_definitions=0, footnotes=0, footref_order=0, front_matter=None), index=2, depth=1)
WalkItem(node=RawText(), parent=Paragraph(children=2, position=Position(lines=[2:2])), index=0, depth=2)
WalkItem(node=Emphasis(children=1), parent=Paragraph(children=2, position=Position(lines=[2:2])), index=1, depth=2)
WalkItem(node=ListItem(children=1, loose=False, leader='1.', prepend=3, next_marker=None, position=Position(lines=[3:4])), parent=List(children=1, loose=False, start_at=1, position=Position(lines=[3:4])), index=0, depth=2)
WalkItem(node=Paragraph(children=2, position=Position(lines=[7:7])), parent=Quote(children=1, position=Position(lines=[6:6])), index=0, depth=2)
WalkItem(node=RawText(), parent=Emphasis(children=1), index=0, depth=3)
WalkItem(node=Paragraph(children=1, position=Position(lines=[4:4])), parent=ListItem(children=1, loose=False, leader='1.', prepend=3, next_marker=None, position=Position(lines=[3:4])), index=0, depth=3)
WalkItem(node=RawText(), parent=Paragraph(children=2, position=Position(lines=[7:7])), index=0, depth=3)
WalkItem(node=Emphasis(children=1), parent=Paragraph(children=2, position=Position(lines=[7:7])), index=1, depth=3)
WalkItem(node=RawText(), parent=Paragraph(children=1, position=Position(lines=[4:4])), index=0, depth=4)
WalkItem(node=RawText(), parent=Emphasis(children=1), index=0, depth=4)

You could even build your own AST programatically!

>> from mistletoe import block_tokens, span_tokens, HTMLRenderer
>> doc = block_tokens.Document(children=[
..    block_tokens.Paragraph(
..        children=[
..            span_tokens.Emphasis(
..                children=[span_tokens.RawText("hallo")]
..            )
..    ])
.. ])
>> HTMLRenderer().render(doc)
"<p><em>hallo</em></p>"

The Parse Process¶

At a lower level, the actual parsing process is split into two stages:

The full source text is read into an AST with all the span/inline level text stored as raw text in SpanContainer. This allows all link definitions and (if included) footnote definitions to be read, before references are processed.
We walk through this intermediary AST and ‘expand’ the SpanContainer to produce all the span tokens; inspecting the global context for available definitions.

This process is illustrated in the following example, using the lower level parse method, tokenize_main():

>> from mistletoe.block_tokenizer import tokenize_main, SourceLines
>> lines = SourceLines('a [text][key]\n\n[key]: link "target"', standardize_ends=True)
>> paragraph = tokenize_main(lines, expand_spans=False)[0]
>> paragraph.children
SpanContainer('a [text][key]')

>> from mistletoe.parse_context import get_parse_context
>> get_parse_context()
ParseContext(block_cls=11,span_cls=9,link_defs=1,footnotes=0)
>> get_parse_context().link_definitions
{'key': ('link', 'target')}

>> paragraph.children.expand()
[RawText(), Link(target='link', title='target')]

Important

If directly using tokenize_main(), you should ensure that the global context is reset, if you don’t want to use previously read defintions:

>> get_parse_context(reset=True)

Performance¶

mistletoe is the fastest CommonMark compliant implementation in Python. Try the benchmarks yourself by installing pip install mistletoe-ebp[benchmark] and running:

$ mistletoe-bench test/test_samples/syntax.md
Test document: syntax.md
Test iterations: 1000
Running 7 test(s) ...
=====================
markdown        (3.2.1): 31.13 s
markdown:extra  (3.2.1): 42.45 s
mistune         (0.8.4): 11.49 s
commonmark      (0.9.1): 47.94 s
mistletoe       (0.9.4): 35.58 s
mistletoe:extra (0.9.4): 40.37 s
panflute        (1.12.5): 168.06 s

notes:

markdown without extra does not parse some CommonMark syntax, like fenced code blocks (see Python-Markdown Extra)
mistletoe uses only CommonMark compliant tokens, whereas mistletoe:extra includes Extension Tokens.
panflute calls pandoc via a subprocess

We notice that Mistune is the fastest Markdown parser, and by a good margin, which demands some explanation. mistletoe’s biggest performance penalty comes from stringently following the CommonMark spec, which outlines a highly context-sensitive grammar for Markdown. Mistune takes a simpler approach to the lexing and parsing process, but this means that it cannot handle more complex cases, e.g., precedence of different types of tokens, escaping rules, etc.

To see why this might be important to you, consider the following Markdown input (example 392 from the CommonMark spec):

***foo** bar*

The natural interpretation is:

<p><em><strong>foo</strong> bar</em></p>

… and it is indeed the output of Python-Markdown, Commonmark-py and mistletoe. Mistune (version 0.8.3) greedily parses the first two asterisks in the first delimiter run as a strong-emphasis opener, the second delimiter run as its closer, but does not know what to do with the remaining asterisk in between:

<p><strong>*foo</strong> bar*</p>

The implication of this runs deeper, and it is not simply a matter of dogmatically following an external spec. By adopting a more flexible parsing algorithm, mistletoe allows us to specify a precedence level to each token class, including custom ones that you might write in the future. Code spans, for example, has a higher precedence level than emphasis, so

*foo `bar* baz`

… is parsed as:

<p>*foo <code>bar* baz</code></p>

… whereas Mistune parses this as:

<p><em>foo `bar</em> baz`</p>

Of course, it is not impossible for Mistune to modify its behavior, and parse these two examples correctly, through more sophisticated regexes or some other means. It is nevertheless highly likely that, when Mistune implements all the necessary context checks, it will suffer from the same performance penalties.

Contextual analysis is why Python-Markdown is slow, and why CommonMark-py is slower. The lack thereof is the reason mistune enjoys stellar performance among similar parser implementations, as well as the limitations that come with these performance benefits.

If you want an implementation that focuses on raw speed, mistune remains a solid choice. If you need a spec-compliant and readily extensible implementation, however, mistletoe is still marginally faster than Python-Markdown, while supporting more functionality (lists in block quotes, for example), and significantly faster than CommonMark-py.

One last note: another bottleneck of mistletoe compared to mistune is the function overhead. Because, unlike mistune, mistletoe chooses to split functionality into modules, function lookups can take significantly longer than mistune. To boost the performance further, it is suggested to use PyPy with mistletoe. Benchmark results show that on PyPy, mistletoe’s performance is on par with mistune:

$ pypy3 test/benchmark.py mistune mistletoe
Test document: test/samples/syntax.md
Test iterations: 1000
Running tests with mistune, mistletoe...
========================================
mistune: 13.645681533998868
mistletoe: 15.088351159000013

mistletoe-ebp Developer Guide