Friday, October 4, 2013

Termcat: A Hacker School Project

I currently find myself in the privileged position of attending Hacker School. For those who don't know, Hacker School is a three-month program in New York where a bunch of motivated people practice and learn about programming.

I have decided that I will spend much of my time at Hacker School on an open-source project that I've been thinking about for a while now. I call it Termcat. This was originally a codename, but it's growing on me and I might keep it.

Termcat is, or will be, a markup language that targets HTML for output. It is inspired by Markdown and LaTeX. With Markdown it shares the ideal that source code should be easy to read. With LaTeX it shares a concern for scientific writing and typography. As with LaTeX, I intend for Termcat to be fully programmable, and I want it to allow graphics to be generated from code and data (via D3.js). Support for MathML is a priority.

I've been using LaTeX quite extensively for the past five years. It's no doubt a very powerful system. LaTeX comes with tens of thousands of amazing packages for many niche—and not so niche—use cases. All the same, LaTeX has many flaws. Most aggravatingly, its syntax makes LaTeX source code difficult to read. As a programming environment LaTeX is best described as primitive and bizarre. Moreover, error messages in LaTeX are arcane and many packages are incompatible with one another.

Unfortunately, there's currently no serious alternative to LaTeX if you need to enter a lot of mathematical expressions. For instance, word processors require that you open an equation editor every time you want to enter an equation or mathematical symbol, insofar as they support mathematical notation at all. This makes word processors a non-starter for many scientists. There's MathML, but MathML is much too verbose to write by hand.

I believe it's possible to do much better. Consider the following LaTeX code:
$E = \{\langle a, n, n' \rangle \subseteq I \times N \times N \mid Pa \text{ and } n < n' \}$
This is rendered as follows:
E = {⟨a, n, n′⟩ ⊆ I × N × N | Pa and n < n′}
For Termcat I intend to allow identical output (albeit in HTML and MathML) to be generated from code like this:
E = {<a, n, n'> :subseteq I :times N :times N | Pa \and n < n'}
The idea is that Termcat would recognize that '=' is a binary relation. This means it would know that the terms to the left and right of it are mathematical expressions. It would also understand that the curly brackets delimit the start and end of the expression on the right. The lack of a space between the symbols '<' and 'a' would indicate that <a, n, n'> is a tuple, whereas the '<' in n < n' has spaces around it and would be read as a relation. Finally, 'and' would be rendered as plain text because it is escaped using a backslash.
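
To make the whitespace rule concrete, here is a rough sketch in Python. It is only an illustration of the idea, not Termcat's actual code: the function name tokenize, the token kinds, and the small operator table are all made up for this example. It treats a '<' that is immediately followed by a non-space character as the start of a tuple, a '>' that is immediately preceded by a non-space character as the end of one, and any other '<' or '>' as a relation.

```python
import re

# Hypothetical table mapping ":name" operators to Unicode symbols.
OPERATORS = {"subseteq": "\u2286", "times": "\u00d7"}

# One token per match; whitespace matches nothing and is skipped.
TOKEN = re.compile(r"""
    (?P<op>:[a-z]+)         # named operator, e.g. :times
  | (?P<text>\\[a-z]+)      # backslash-escaped plain text
  | (?P<word>[A-Za-z]+'*)   # identifier, optionally primed
  | (?P<sym>[^\sA-Za-z])    # any other single non-space character
""", re.VERBOSE)


def tokenize(src):
    """Split src into (kind, value) pairs using the assumed rules above."""
    tokens = []
    for m in TOKEN.finditer(src):
        kind, value = m.lastgroup, m.group()
        if kind == "op":
            # Unknown operators are passed through unchanged.
            tokens.append(("op", OPERATORS.get(value[1:], value)))
        elif kind == "text":
            # Drop the backslash; the word is rendered as plain text.
            tokens.append(("text", value[1:]))
        elif value == "<":
            # Tuple opener only if the next character is not a space.
            nxt = src[m.end():m.end() + 1]
            tight = nxt != "" and not nxt.isspace()
            tokens.append(("open-tuple" if tight else "rel", value))
        elif value == ">":
            # Tuple closer only if the previous character is not a space.
            prev = src[m.start() - 1:m.start()] if m.start() > 0 else ""
            tight = prev != "" and not prev.isspace()
            tokens.append(("close-tuple" if tight else "rel", value))
        else:
            tokens.append((kind, value))
    return tokens


if __name__ == "__main__":
    sample = r"E = {<a, n, n'> :subseteq I :times N :times N | Pa \and n < n'}"
    for kind, value in tokenize(sample):
        print(kind, value)
```

On the example line, this classifies the '<' in <a, n, n'> as a tuple opener and the '<' in n < n' as a relation. A real implementation would need more than a one-character lookahead, but the sketch shows why the spacing carries enough information to tell the two apart.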

I only started hacking on Termcat this week and the above doesn't work yet. So far I have a basic parser, HTML output, and support for Markdown titles and bullet lists. For the coming week I intend to focus on some foundational work and also get started on elementary support for mathematical notation.

I plan to blog once a week for the duration of my stay at Hacker School. True, I haven't exactly been a prolific blogger in the past, but this time around there are some people I owe $5 to for every week that I don't blog. :) Consequently, you can expect a series of blog posts in which I expound on the ideas behind Termcat.

For those interested, the source code can be found at https://github.com/jdevuyst/termcat.