Client-side syntax highlighting in JS: apparently no good options
=================================================================

For a project I’m working on called “handaxeweb”, and also for
improving access to kragen-hacks, I want syntax highlighting for my
code examples. The best way, especially for code examples embedded in
Markdown (where it’s inconvenient to specify classes on `<pre>` or
`<code>` tags), is probably to do it in JS when the page loads.

The current version of “handaxeweb” is in Lua, but I write software in
lots of different languages, so I’m looking for syntax highlighting
that supports Lua and many others. I also have a lot of old web pages
that include code in various languages, so I’d like something that
can autodetect languages. (In Markdown, it’s just as inconvenient to
specify what language your `<pre>` tags are in as it is to specify
that they represent source code.)
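Here is a minimal sketch of that load-time approach. It assumes nothing about which library ends up doing the work. `guess` stands in for an autodetector and `highlighters` for per-language highlighting functions; both are hypothetical. Because it only looks for `<pre>` elements, Markdown output needs no extra classes.

```javascript
// Sketch: highlight every <pre> in a document at load time, so that
// Markdown-generated pages need no class attributes at all.
// `guess(text)` returns a language name or null, and `highlighters`
// maps language names to functions from source text to (escaped) HTML;
// both are hypothetical stand-ins for whatever library gets used.
function highlightAll(doc, guess, highlighters) {
  var pres = doc.getElementsByTagName('pre');
  var count = 0;
  for (var i = 0; i < pres.length; i++) {
    var text = pres[i].textContent;
    var lang = guess(text);
    var highlight = lang && highlighters.hasOwnProperty(lang)
        ? highlighters[lang] : null;
    if (highlight) {  // leave blocks in unknown languages untouched
      pres[i].innerHTML = highlight(text);
      count++;
    }
  }
  return count;  // how many blocks were highlighted
}

// In a page, something like:
// window.onload = function () {
//   highlightAll(document, myGuess, myHighlighters);
// };
```

Leaving unrecognized blocks alone means a bad or missing guess degrades to plain text rather than to wrong colors.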

Initial candidates
------------------

* SHJS uses GNU Source-Highlight for its syntax definitions. It
  currently supports 39 languages, but not Lua, and comes with a bunch
  of “highlighting themes” too. It sounds architecturally awesome, but
  I would like something that works out of the box. GPLv3. The current
  Source-Highlight reportedly supports Lua (plus 64 other languages),
  so that could maybe work.

* What’s GitHub using? They do it on the server side, putting short
  class names on `<span>` elements: `.nc` for a class name, `.s` for a
  string, `.se` for a string escape, and so on.

* The most common one might be SyntaxHighlighter. Its old page says to
  look at a new one instead, and the new page redirects yet again. It
  claims to be widely used, including by Mozilla, Apache, and
  WordPress, and looks extremely active. It also supports themes, with
  CSS. It supports 23 languages, but not Lua. There are two third-party
  “brushes” for it that support Lua, listed along with brushes for
  another 45 languages. Someone else claims it is very slow.

* There’s also a “Highlight” program where you define new syntaxes in
  Lua, and it apparently already supports 150 languages. But it’s not
  in JS.

* There’s some guy’s JS syntax-highlighting engine, sometimes called
  DlHighlight. GPL. The main file is `hl/highlight.js`. Apparently it
  supports only JS and XML in version 0.1.

* Another one supports only 8 languages, not including Lua.

* Another supports 10 languages, not including Lua. Its main selling
  point is complete PHP support.

* Another highlights only JS.

* Highlight.js supports 32 languages, and because it’s Russian, these
  include Lua, nginx config files, and AVR assembly. It autodetects
  languages and has a server-based pruner and minifier for your
  convenience. BSD license. No source tarballs.

* Lighter.js is a MooTools plugin. It apparently depends on PHP!? Its
  documentation is better than most. It apparently requires classes on
  `<pre>` tags to declare language names, and appears to support 17
  languages, not including Lua.

* Google Code Prettify drives code.google.com (but the version on
  Google Code is six months out of date). ASL 2.0. It autodetects
  language, but requires a `class` attribute. It supports 27
  languages, including Lua. Its FAQ claims it doesn’t work on
  obfuscated code, which probably means it occasionally fails on
  non-obfuscated code too.

So, my leading candidates are Highlight.js (32 languages), Google Code
Prettify (27 languages), SyntaxHighlighter with one of the third-party
Lua brushes (46 languages), and SHJS plus a current version of GNU
Source-Highlight (64 languages).

Another thing a friend suggested later:

* jQuery.Syntax, under the GNU AGPLv3, looks a lot like GitHub’s
  highlighting by default. It supports 25 languages, including Lua.
  Its author says SyntaxHighlighter inspired him. It doesn’t seem to
  autodetect languages, but it should be easy to force it to use any
  particular language.
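GitHub’s server-side highlighting, as described in the candidate list, wraps each token in a span with a short class name and leaves the colors to a stylesheet. jQuery.Syntax resembles it by default. Below is a minimal sketch of that output step, assuming a tokenizer has already produced `[type, text]` pairs. The token-type names are my own invention; only the `.nc`, `.s`, and `.se` classes come from GitHub’s pages.

```javascript
// Hypothetical token types mapped to the short class names described
// above; only 'nc', 's' and 'se' come from GitHub's markup.
var SHORT_CLASS = {
  className: 'nc',
  string: 's',
  stringEscape: 'se'
};

// Escape the characters that are special in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;')
             .replace(/</g, '&lt;')
             .replace(/>/g, '&gt;');
}

// tokens: array of [type, text] pairs; a type with no class mapping
// (including null) is emitted as plain escaped text.
function tokensToHtml(tokens) {
  return tokens.map(function (tok) {
    var cls = SHORT_CLASS.hasOwnProperty(tok[0]) ? SHORT_CLASS[tok[0]] : null;
    var body = escapeHtml(tok[1]);
    return cls ? '<span class="' + cls + '">' + body + '</span>' : body;
  }).join('');
}
```

For example, a string containing an escape becomes three spans classed `s`, `se`, and `s`. Swapping themes then only means swapping the stylesheet, since the markup stays the same.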

How I chose to try SHJS first
-----------------------------

Of course, language counts don’t tell the whole story at all. In some
cases (especially Google Code Prettify) entire families of languages
are lumped together, and in others the count includes lots of
extremely simple configuration-file languages. To cut through that,
I’m checking which ones claim to support the languages I actually use.

Recent posts on kragen-hacks have code in x86 assembly (with both gas
and Intel syntax), BASIC-80, C, ANS Forth, elisp, Lua, Perl, Python,
and Ruby. Future posts will probably have JavaScript, Makefiles, PHP,
and sh.

Out of these 14 languages, none of my four candidates supports Forth
or BASIC-80.

Beyond that, Highlight.js lacks support for both assembly syntaxes
and for Makefiles (I'm assuming its C++ support is adequate for C);
Google Code Prettify lacks support for both assembly syntaxes;
SyntaxHighlighter lacks support for gas (possibly), elisp, and
Makefiles (!) (again, assuming its C++ support is adequate for C); and
Source-Highlight possibly lacks support for one of the assembly
syntaxes.

(Added later: jQuery.Syntax supports all but one assembly syntax,
BASIC-80, Forth, and Makefiles.)

So, I’m going to start with SHJS and Source-Highlight, particularly on
the theory that it should be easiest to add more syntaxes to them. If
that doesn’t work out, Highlight.js seems like the second-best option,
followed by Google Code Prettify and then SyntaxHighlighter.

Rejecting SHJS
--------------

1. 23:23: Downloading a source distribution of SHJS and a source
   distribution of Source-Highlight from
   ftp://ftp.gnu.org/gnu/src-highlite/.

        kragen@VOSTRO9:~/pkgs$ wget \
            ftp://ftp.gnu.org/gnu/src-highlite/source-highlight-3.1.4.tar.gz
        kragen@VOSTRO9:~/pkgs$ tar xzvf source-highlight-3.1.4.tar.gz 
        kragen@VOSTRO9:~/pkgs$ cd source-highlight-3.1.4/
        kragen@VOSTRO9:~/pkgs/source-highlight-3.1.4$ less src/lua.lang 

    That looks pretty good.

    Hmm, it’s a little alarming that SHJS hasn’t been updated since 2008!

        kragen@VOSTRO9:~/pkgs/source-highlight-3.1.4$ cd ..
        kragen@VOSTRO9:~/pkgs$ unzip shjs-0.6-src.zip 
        ...
        kragen@VOSTRO9:~/pkgs$ cd shjs-0.6-src/

2. 23:31: Trying to build a Lua syntax definition file.

        kragen@VOSTRO9:~/pkgs/shjs-0.6-src$ ./sh2js.pl \
            ../source-highlight-3.1.4/src/lua.lang 

    Hmm, looks like it needs Parse::RecDescent. Somehow
    `README-SRC.txt` neglected to mention that.

        kragen@VOSTRO9:~/pkgs/shjs-0.6-src$ sudo aptitude install \
            libparse-recdescent-perl 
        [sudo] password for kragen: 
        ...
        kragen@VOSTRO9:~/pkgs/shjs-0.6-src$ ./sh2js.pl \
            ../source-highlight-3.1.4/src/lua.lang 

              ERROR (line 11): Invalid Language: Was expecting End but found
                               "environment comment delim `--\[(=*)\[` "]" + @{1} +
                               "]" multiline nested begin" instead
        Invalid input

    My best guess here is that Source-Highlight now supports some
    language features that SHJS doesn’t yet, although perusing the
    grammar in `SourceHighlight/DOM.pm` suggests that it's *supposed*
    to support the construct it’s failing to parse, and the parsing
    error is just misleading.

3. 23:41: giving up for now. It’s likely I could debug the problem in
    half an hour or so, but then I’d have a bug fix for an apparently
    unmaintained project, and unless I want to become the maintainer,
    the bug fix would probably be ignored in a patch queue somewhere.
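The construct SHJS failed on in step 2 is Lua’s long comment. It consists of `--[`, some number of `=` signs, and `[`. It then runs to a closing `]`, the *same* number of `=` signs, and a final `]`. In `lua.lang` that is the `(=*)` capture reused as `@{1}`. For illustration, here is how a JavaScript regular expression can express the same rule with a backreference. This is my own sketch, not code from SHJS or Source-Highlight, and it ignores `--[[` appearing inside a string literal.

```javascript
// Lua long comment: --[, zero or more '=' (captured as group 1), [,
// then the shortest run of any characters (including newlines) up to
// ], the same '=' sequence via the \1 backreference, and ].
var LUA_LONG_COMMENT = /--\[(=*)\[[\s\S]*?\]\1\]/;

// Returns the first long comment in `src`, or null if there is none.
function findLongComment(src) {
  var m = LUA_LONG_COMMENT.exec(src);
  return m ? m[0] : null;
}
```

In `--[==[ a ]] b ]==]`, the inner `]]` doesn’t end the comment, because it has no `=` signs. That level-matching is exactly what a highlighter without backreferences can’t express.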

Rejecting Highlight.js
----------------------

1. 23:51: No tarballs. Installing bzr.

        kragen@VOSTRO9:~/devel$ sudo aptitude install bzr
        ...

2. 23:55: Aptitude finally finished, so I can grab the source:

        kragen@VOSTRO9:~/devel$ bzr branch \
            http://bazaar.launchpad.net/~isagalaev/+junk/highlight
        bzr: ERROR: Unknown repository format: 'Bazaar repository format 2a (needs bzr 1.16 or later)\n'

    Oh, that's very unfortunate. But I'm using a bzr version from just
    last year! Well, okay.

3. 23:58: Installing a newer bzr version. The version in Debian Lenny
    is “1.5-1.1”; I'm guessing that’s older than the 1.13.1 I have
    installed already, although I'm not totally sure. A Debian source
    package of version 2.1.1 is available, so I downloaded it and am
    now building it:

        kragen@VOSTRO9:~/devel$ pushd ~/pkgs
        ~/pkgs ~/devel ~/devel/peg-bootstrap
        kragen@VOSTRO9:~/pkgs$ tar tzvf bzr_2.1.1-1.debian.tar.gz 
        ...
        kragen@VOSTRO9:~/pkgs$ tar xzf bzr_2.1.1.orig.tar.gz 
        kragen@VOSTRO9:~/pkgs$ cd bzr-2.1.1/
        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ tar xzf ../bzr_2.1.1-1.debian.tar.gz 
        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ ./debian/rules build
        ./debian/rules:5: /usr/share/cdbs/1/rules/debhelper.mk: No such file or directory
        ./debian/rules:6: /usr/share/cdbs/1/class/python-distutils.mk: No such file or directory
        make: *** No rule to make target `/usr/share/cdbs/1/class/python-distutils.mk'.  Stop.

    Argh. I don’t have the basic Debian package building utilities
    installed.

        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ sudo aptitude install debhelper
        (doesn’t do anything)
        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ sudo apt-get build-dep bzr
        ...
        After this operation, 43.5MB of additional disk space will be used.
        Do you want to continue [Y/n]? 

    Sigh. Yes, but I need to delete some stuff. This netbook comes with
    only 8GiB of space in total. I didn’t really need those home
    movies, or that amateur movie about contact improv I downloaded
    from YouTube, or the trailer for the DJ Hero video game.

4. 00:14: apt-get has finally finished.

        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ ./debian/rules build
        ...

    Now I get to wait for GCC to compile massive amounts of bzr from
    source. I thought bzr was written in Python!

5. 00:38: at some point in the last few minutes, the build finished;
    now to build packages:

        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ fakeroot ./debian/rules binary
        ...

6. 00:50: looks like that finished at 00:45. Now to install the
    freshly-built packages.

        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ sudo dpkg -i \
                ../bzr-doc_2.1.1-1_all.deb ../bzr_2.1.1-1_i386.deb 
        [sudo] password for kragen: 
        ...
         bzr-doc conflicts with bzr (<< 2.0.1)
        ...
        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ sudo aptitude remove bzr
        ...
        Need to get 0B of archives. After unpacking 17.9MB will be freed.
        Do you want to continue? [Y/n/?] 
        ...
        Current status: 0 broken [-2].
        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ sudo dpkg -i \
            ../bzr-doc_2.1.1-1_all.deb ../bzr_2.1.1-1_i386.deb 
        ...
         bzr depends on python-configobj; however:
          Package python-configobj is not installed.
        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ sudo aptitude install python-configobj
        ...
        Setting up bzr (2.1.1-1) ...
        ...

7. 00:59: Okay, that finally finished. Now trying again to pull their
    source:

        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ popd
        ~/devel ~/devel/peg-bootstrap
        kragen@VOSTRO9:~/devel$ bzr branch \
            http://bazaar.launchpad.net/~isagalaev/+junk/highlight
        Branched 379 revision(s).
        kragen@VOSTRO9:~/devel$ cd highlight/
        kragen@VOSTRO9:~/devel/highlight$ 

    I have it at last!

8. 01:05: Now what? How do I use it? `src/readme.eng.txt` is the
    documentation, and it suggests a few lines of JS:

    > Downloaded package includes file "highlight.pack.js" ...

    >     

    Naturally, though, the source checkout doesn’t contain that file,
    even though the downloaded package does. The build script seems to
    be `tools/build.py`...

9. 01:12: Trying to build the packed JS file:

        kragen@VOSTRO9:~/devel/highlight$ tools/build.py
        bash: tools/build.py: Permission denied
        kragen@VOSTRO9:~/devel/highlight$ python !!
        python tools/build.py
        ...
        IOError: [Errno 2] No such file or directory: '/home/kragen/devel/highlight/packed/profile.js'
        kragen@VOSTRO9:~/devel/highlight$ python tools/pack.py
        Building highlight.js
        Building profile.js

    That looks promising. Takes a while, though.

    `top` reveals that I have five or six runaway `gtk-gnash` processes
    in my browser. That explains why everything’s been going so slowly
    for the last couple of hours! `killall -9 gtk-gnash` makes things
    go a bit faster. I’m an idiot.

        ...
        Building lua.js
        Building python.js
        kragen@VOSTRO9:~/devel/highlight$ python tools/build.py
        kragen@VOSTRO9:~/devel/highlight$ find -name '*.pack.js'
        ./src/highlight.pack.js
        kragen@VOSTRO9:~/devel/highlight$ 

10. 01:18: Trying it out on a test page.

    This appears to replace the entire body of my document with the
    first `<pre>` block, which Highlight.js incorrectly guessed was
    Ruby (it was awk, which it doesn’t claim to support). It also
    seems to be making a three-deep tree of `<code>` tags,
    i.e. `code>code>code`. And it doesn’t seem to come with an
    existing stylesheet.

    Probably what’s going on is that it’s replacing the parent node
    of the specified node on the assumption that it's a `pre>code` kind
    of case. And indeed wrapping each `<pre>` in an extra enclosing
    element solves the problem. It’s a little slow at startup:
    highlighting around 320 lines of code takes about 15 seconds on
    this 1.6GHz Atom netbook.

    But there’s still no actual highlighting, because Highlight.js
    merely applies some CSS classes. It doesn’t add a stylesheet; or
    rather, it supplies 13 of them but doesn’t link to one by default.
    The documentation does mention this, just not in the “Installation
    and usage” section.

11. 01:34: Adding a link to a stylesheet.

    Oh my. That looks pretty nice. And even the misidentified “Ruby”
    works reasonably well, although a few of the other
    misidentifications don’t. But yeah, about 15 seconds for 320 lines
    of code.

    In practice the autodetect works very poorly on Lua. It commonly
    misidentifies it as JS, TeX, or Ruby; the Ruby highlighting works
    reasonably well, but the others don’t. My build script in sh,
    beginning with `#!/bin/sh`, got misidentified as AVR assembly.

12. 01:48: My conclusions on Highlight.js:

    * Its language autodetect doesn’t work that well.
    * It’s slow: about 20 lines of code per second in my old FF3.0 on a
      1.6GHz Atom.
    * The latest development version is somewhat buggy (at least, it
      behaved in an unexpected way that made my entire document fail
      to render), although I was able to work around this.
    * It does work and produces good results when it finally finishes
      running, if it guesses the right language.

Trying Google Code Prettify
---------------------------

1. 01:52: Downloading `prettify-3-Dec-2009.zip`:

        kragen@VOSTRO9:~/pkgs$ unzip -v prettify-3-Dec-2009.zip
        ...
        5118 Defl:N 1897 63% 12-03-09 11:04 785f9c6d CHANGES.html
        ...
        kragen@VOSTRO9:~/pkgs$ mkdir prettify
        kragen@VOSTRO9:~/pkgs$ cd prettify/
        kragen@VOSTRO9:~/pkgs/prettify$ unzip ../prettify-3-Dec-2009.zip

2. Adding the suggested code from `README.html` to my document.

3. 01:58: That doesn’t work because I don’t have `class="prettyprint"`
   on my `<pre>` tags. It must be possible to work around that
   somehow, but there doesn’t seem to be documentation in the source
   package. I guess I can do an analogous thing to what I did with
   Highlight.js:

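        (function() {
          // Prettify only highlights elements with class "prettyprint",
          // so add that class to every <pre> on the page first.
          var pres = document.getElementsByTagName('pre');
          for (var ii = 0; ii < pres.length; ii++) {
            // Append rather than overwrite, keeping any existing classes.
            pres[ii].className += ' prettyprint';
          }
          prettyPrint();  // defined by prettify.js, which must be loaded first
        })();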

4. 02:06: Well, that did make it add classes, but I screwed up the URL
   on the stylesheet link, so no highlighting. Fixed the link.

5. 02:09: Got highlighting. Looks like the autodetect is failing
    miserably. I can’t tell what it thinks all this Lua is, but
    whatever it is, it’s not highlighting the Lua comments uniformly as
    comments.  And it highlights the word “new” in my shell script as
    if it were a keyword. Changing to this doesn’t help:

          pres[ii].setAttribute('class', 'prettyprint lang-lua');

    One big advantage this has over Highlight.js is that it highlights
    asynchronously instead of blocking page load. It also seems to be
    about 5 times faster, finishing about 3 seconds after page load
    instead of 15.

6. 02:18: My conclusions with Google Code Prettify:

    * Fast.
    * Easy to use.
    * Produces poor results.
    * Only one theme.
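The asynchronous behaviour noted in step 5 keeps the page usable while highlighting runs. One common way to get that effect is to do a bounded batch of work per timer tick and yield with `setTimeout` between batches. This sketch rests on my own assumptions rather than Prettify’s source, and `highlightOne` is a hypothetical per-block function.

```javascript
// Process `items` a batch at a time, yielding to the browser between
// batches with setTimeout so that rendering and user input aren't
// blocked. `highlightOne` is a hypothetical per-item function; `done`
// (optional) is called once everything has been processed.
function highlightInBatches(items, highlightOne, batchSize, done) {
  var size = Math.max(1, batchSize | 0);  // always make progress
  var i = 0;
  function step() {
    var end = Math.min(i + size, items.length);
    for (; i < end; i++) {
      highlightOne(items[i]);
    }
    if (i < items.length) {
      setTimeout(step, 0);  // yield, then continue with the next batch
    } else if (done) {
      done();
    }
  }
  setTimeout(step, 0);  // start after the current script finishes
}
```

The total work is the same as a blocking loop; it is only spread out. Each `setTimeout` adds some scheduling delay, so very small batches make the whole job finish later.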

Still to check out
------------------

I still need to check out SyntaxHighlighter and jQuery.Syntax; they
both sound reasonably promising despite the lack of autodetection.
(Maybe I’ll have to write an autodetection library to use with
whatever syntax-highlighting library I end up using.) But I’m not
going to do it tonight.
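If I do end up writing an autodetection layer, a cheap first pass could catch cases like the `#!/bin/sh` script that Highlight.js guessed was AVR assembly. It would then fall back to a library’s own guess. This is a rough sketch; the rules and names are my own hypothetical choices. The Lua rule in particular will misfire on some non-Lua code, since other languages also use `--` comments and these keywords.

```javascript
// Hypothetical first-pass language detector: trust a few strong, cheap
// signals and return null otherwise, so a library's own autodetection
// (or no highlighting at all) can handle the rest.
var SHEBANG_RULES = [
  [/\b(ba|da|k)?sh\b/, 'sh'],
  [/\bperl[0-9.]*\b/, 'perl'],
  [/\bpython[0-9.]*\b/, 'python'],
  [/\bruby[0-9.]*\b/, 'ruby'],
  [/\blua[0-9.]*\b/, 'lua']
];

function guessLanguage(text) {
  var firstLine = text.split('\n', 1)[0];
  if (firstLine.slice(0, 2) === '#!') {
    for (var i = 0; i < SHEBANG_RULES.length; i++) {
      if (SHEBANG_RULES[i][0].test(firstLine)) {
        return SHEBANG_RULES[i][1];
      }
    }
  }
  // Lua: a line starting with a "--" comment plus a common Lua keyword.
  // Rough: other languages use "--" comments and these words too.
  if (/^\s*--/m.test(text) && /\b(local|elseif)\b/.test(text)) {
    return 'lua';
  }
  return null;  // no strong signal
}
```

A page would call `guessLanguage` first and only consult the library’s detector when it returns null.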

Fixing SHJS is appearing more tempting, too, after spending two-plus
hours struggling with Highlight.js.

----

2010-07-09...10, Kragen Javier Sitaker, Buenos Aires.