Client-side syntax highlighting in JS: apparently no good options
=================================================================

For a project I’m working on called “handaxeweb”, and also for
improving access to kragen-hacks, I want to have syntax highlighting
for my code examples. The best way, especially for code examples
embedded in Markdown (where it’s inconvenient to specify classes on
`<pre>` or `<code>` tags), is probably to do it in JS when the page
loads.

The current version of “handaxeweb” is in Lua, but I write software in
lots of different languages, so I’m looking for syntax highlighting
that supports Lua and many other languages.
Also, I have a lot of old web pages that
include software written in lots of different languages. So I’d like
something that can autodetect languages. (Also, in Markdown, it’s just
as inconvenient to specify what language your code blocks are in as
it is to specify that they represent source code.)
Initial candidates
------------------
* SHJS uses GNU Source-Highlight for its syntax definitions.
  Currently supports 39 languages but not Lua. Comes with a bunch of
  “highlighting themes” too. Sounds architecturally awesome, but I
  would like something that works out of the box. GPLv3.
  Source-Highlight itself reportedly supports Lua now (plus 64 other
  languages), so that could maybe work.
* What’s GitHub using? They do it on the server side, using span.nc
for “new class”, .s for “string”, .se for “string escape”, and so
on.
* The most common one might be SyntaxHighlighter. Its old page says
  to look at a newer page instead, which in turn redirects elsewhere.
  Claims to be widely used, including by Mozilla, Apache, and
  WordPress. Looks extremely active. Also supports themes, with CSS.
  Supports 23 languages but not Lua. There are two third-party
  “brushes” for it that support Lua, listed alongside brushes for
  another 45 languages. Reportedly very slow.
* There is a “Highlight” program where you define new syntaxes in
  Lua, which apparently already supports 150 languages. But it’s not
  in JS.
* Some guy’s JS syntax-highlighting engine, sometimes called
  DlHighlight. GPL. The main file is `hl/highlight.js`. Apparently
  supports only JS and XML in version 0.1.
* Another library supports only 8 languages, not including Lua.
* Another supports 10 languages, not including Lua. Main selling
  point is complete PHP support.
* Another highlights only JS.
* Highlight.js. It
supports 32 languages, and because it’s Russian, these include Lua,
nginx config files, and AVR assembly. Autodetects languages. Has a
server-based pruner and minifier for your convenience. BSD
license. No source tarballs.
* Lighter.js, a MooTools
plugin. Apparently depends on PHP!? Better documentation than most;
apparently requires classes on `` tags to declare language
names. Appears to support 17 languages, not including Lua.
* Google Code
Prettify, which drives code.google.com (but the version on Google
Code is six months out of date). ASL 2.0. Autodetects language, but
requires @class. Supports 27 languages, including Lua. FAQ claims it
doesn’t work on obfuscated code, which probably means it
occasionally fails on non-obfuscated code too.
So, my leading candidates are Highlight.js (32 languages), Google Code
Prettify (27 languages), SyntaxHighlighter with one of the third-party
Lua brushes (46 languages), and SHJS plus a current version of GNU
Source-Highlight (64 languages).
Another thing a friend suggested later:
* jQuery.Syntax.
GNU AGPLv3. It looks a lot like GitHub’s highlighting by
default. Supports 25 languages, including Lua. Author claims
SyntaxHighlighter inspired him. Doesn’t seem to autodetect
languages, but should be easy to force to use any particular
language.
How I chose to try SHJS first
-----------------------------
Of course, language counts don’t tell the whole story at all. In some
cases (Google Code Prettify, especially) entire families of languages
are lumped together, for example, and in other cases there are lots of
extremely simple configuration-file languages that are supported. To
try to clean that up, I’m checking which ones claim to support the
languages I actually use.
Recent posts on kragen-hacks have code in x86 assembly (with both gas
and Intel syntax), BASIC-80, C, ANS Forth, elisp, Lua, Perl, Python,
and Ruby. Future posts will probably have JavaScript, Makefiles, PHP,
and sh.
Out of these 14 languages, none of my four candidates supports Forth
or BASIC-80.
Additionally, Highlight.js lacks support for both assembly
languages and Makefiles (I'm assuming its C++ support is adequate for
C); Google Code Prettify lacks support for both assembly languages;
SyntaxHighlighter lacks support for gas (possibly), elisp and
Makefiles (!) (again, assuming its C++ support is adequate for C);
Source-Highlight lacks support for possibly one of the assembly
syntaxes.
(Added later: jQuery.Syntax supports all but one assembly syntax,
BASIC-80, Forth, and Makefiles.)
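
The comparison above can be tallied mechanically. The following is only a sketch: the language labels and the function names `coverage` and `rankByCoverage` are made up for illustration, gaps the text marks as “possibly” are counted as missing, and since the text doesn’t say which assembly syntax Source-Highlight might lack, gas is picked arbitrarily just to get a count.

```javascript
// Languages from recent and planned kragen-hacks posts, as listed above.
const wanted = [
  'x86 asm (gas)', 'x86 asm (Intel)', 'BASIC-80', 'C', 'ANS Forth', 'elisp',
  'Lua', 'Perl', 'Python', 'Ruby', 'JavaScript', 'Makefiles', 'PHP', 'sh',
];

// Gaps per candidate, transcribed from the prose. "Possibly missing" is
// treated as missing (a pessimistic assumption). For Source-Highlight the
// text does not name the assembly syntax; gas is an arbitrary stand-in.
const missing = {
  'Highlight.js':
    ['ANS Forth', 'BASIC-80', 'x86 asm (gas)', 'x86 asm (Intel)', 'Makefiles'],
  'Google Code Prettify':
    ['ANS Forth', 'BASIC-80', 'x86 asm (gas)', 'x86 asm (Intel)'],
  'SyntaxHighlighter':
    ['ANS Forth', 'BASIC-80', 'x86 asm (gas)', 'elisp', 'Makefiles'],
  'SHJS + Source-Highlight':
    ['ANS Forth', 'BASIC-80', 'x86 asm (gas)'],
};

// Number of wanted languages that are not in the candidate's gap list.
function coverage(wantedLangs, missingLangs) {
  return wantedLangs.filter((lang) => !missingLangs.includes(lang)).length;
}

// Candidates sorted by how many wanted languages they cover, best first.
function rankByCoverage(wantedLangs, gaps) {
  return Object.entries(gaps)
    .map(([name, miss]) => ({ name, supported: coverage(wantedLangs, miss) }))
    .sort((a, b) => b.supported - a.supported);
}

for (const { name, supported } of rankByCoverage(wanted, missing)) {
  console.log(`${name}: ${supported}/${wanted.length}`);
}
```

Under these assumptions SHJS plus Source-Highlight comes out ahead with 11 of the 14. The raw count is only a rough guide, though, for the same reason that total language counts don’t tell the whole story.
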
So, I’m going to start with SHJS and Source-Highlight, particularly on
the theory that it should be easiest to add more syntaxes to them. If
that doesn’t work out, Highlight.js seems like the second-best option,
followed by Google Code Prettify and then SyntaxHighlighter.
Rejecting SHJS
--------------
1. 23:23: Downloading a source distribution of SHJS and a source
    distribution of Source-Highlight from
    ftp://ftp.gnu.org/gnu/src-highlite/.

        kragen@VOSTRO9:~/pkgs$ wget \
            ftp://ftp.gnu.org/gnu/src-highlite/source-highlight-3.1.4.tar.gz
        kragen@VOSTRO9:~/pkgs$ tar xzvf source-highlight-3.1.4.tar.gz
        kragen@VOSTRO9:~/pkgs$ cd source-highlight-3.1.4/
        kragen@VOSTRO9:~/pkgs/source-highlight-3.1.4$ less src/lua.lang

    That looks pretty good.

    Hmm, it’s a little alarming that SHJS hasn’t been updated since 2008!

        kragen@VOSTRO9:~/pkgs/source-highlight-3.1.4$ cd ..
        kragen@VOSTRO9:~/pkgs$ unzip shjs-0.6-src.zip
        ...
        kragen@VOSTRO9:~/pkgs$ cd shjs-0.6-src/

2. 23:31: Trying to build a Lua syntax definition file.

        kragen@VOSTRO9:~/pkgs/shjs-0.6-src$ ./sh2js.pl \
            ../source-highlight-3.1.4/src/lua.lang

    Hmm, looks like it needs Parse::RecDescent. Somehow
    `README-SRC.txt` neglected to mention that.

        kragen@VOSTRO9:~/pkgs/shjs-0.6-src$ sudo aptitude install \
            libparse-recdescent-perl
        [sudo] password for kragen:
        ...
        kragen@VOSTRO9:~/pkgs/shjs-0.6-src$ ./sh2js.pl \
            ../source-highlight-3.1.4/src/lua.lang
        ERROR (line 11): Invalid Language: Was expecting End but found
        "environment comment delim `--\[(=*)\[` "]" + @{1} +
        "]" multiline nested begin" instead
        Invalid input

    My best guess here is that Source-Highlight now supports some
    language features that SHJS doesn’t yet, although perusing the
    grammar in `SourceHighlight/DOM.pm` suggests that it's *supposed*
    to support the construct it’s failing to parse, and the parsing
    error is just misleading.

3. 23:41: giving up for now. It’s likely I could debug the problem in
half an hour or so, but then I’d have a bug fix for an apparently
unmaintained project, and unless I want to become the maintainer,
the bug fix would probably be ignored in a patch queue somewhere.
Rejecting Highlight.js
----------------------
1. 23:51: No tarballs. Installing bzr.

        kragen@VOSTRO9:~/devel$ sudo aptitude install bzr
        ...

2. 23:55: Aptitude finally finished, so I can grab the source:

        kragen@VOSTRO9:~/devel$ bzr branch \
            http://bazaar.launchpad.net/~isagalaev/+junk/highlight
        bzr: ERROR: Unknown repository format: 'Bazaar repository format 2a (needs bzr 1.16 or later)\n'

    Oh, that's very unfortunate. But I'm using a bzr version from just
    last year! Well, okay.

3. 23:58: Installing a newer bzr version. The version in Debian Lenny
    is “1.5-1.1”; I'm guessing that’s older than the 1.13.1 I have
    installed already, although I'm not totally sure.
    A newer Debian source package has version 2.1.1.
    Downloaded it and now building:

        kragen@VOSTRO9:~/devel$ pushd ~/pkgs
        ~/pkgs ~/devel ~/devel/peg-bootstrap
        kragen@VOSTRO9:~/pkgs$ tar tzvf bzr_2.1.1-1.debian.tar.gz
        ...
        kragen@VOSTRO9:~/pkgs$ tar xzf bzr_2.1.1.orig.tar.gz
        kragen@VOSTRO9:~/pkgs$ cd bzr-2.1.1/
        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ tar xzf ../bzr_2.1.1-1.debian.tar.gz
        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ ./debian/rules build
        ./debian/rules:5: /usr/share/cdbs/1/rules/debhelper.mk: No such file or directory
        ./debian/rules:6: /usr/share/cdbs/1/class/python-distutils.mk: No such file or directory
        make: *** No rule to make target `/usr/share/cdbs/1/class/python-distutils.mk'. Stop.

    Argh. I don’t have the basic Debian package building utilities
    installed.

        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ sudo aptitude install debhelper

    (doesn’t do anything)

        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ sudo apt-get build-dep bzr
        ...
        After this operation, 43.5MB of additional disk space will be used.
        Do you want to continue [Y/n]?

    Sigh. Yes, but I need to delete some stuff. This netbook comes with
    only 8GiB of space in total. I didn’t really need those home
    movies, or that amateur movie about contact improv I downloaded
    from YouTube, or the trailer for the DJ Hero video game.

4. 00:14: apt-get has finally finished.

        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ ./debian/rules build
        ...

    Now I get to wait for GCC to compile massive amounts of bzr from
    source. I thought bzr was written in Python!

5. 00:38: at some point in the last few minutes, the build finished;
    now to build packages:

        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ fakeroot ./debian/rules binary
        ...

6. 00:50: looks like that finished at 00:45. Now to install the
    freshly-built packages.

        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ sudo dpkg -i \
            ../bzr-doc_2.1.1-1_all.deb ../bzr_2.1.1-1_i386.deb
        [sudo] password for kragen:
        ...
        bzr-doc conflicts with bzr (<< 2.0.1)
        ...
        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ sudo aptitude remove bzr
        ...
        Need to get 0B of archives. After unpacking 17.9MB will be freed.
        Do you want to continue? [Y/n/?]
        ...
        Current status: 0 broken [-2].
        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ sudo dpkg -i \
            ../bzr-doc_2.1.1-1_all.deb ../bzr_2.1.1-1_i386.deb
        ...
        bzr depends on python-configobj; however:
        Package python-configobj is not installed.
        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ sudo aptitude install python-configobj
        ...
        Setting up bzr (2.1.1-1) ...
        ...

7. 00:59: Okay, that finally finished. Now trying again to pull their
    source:

        kragen@VOSTRO9:~/pkgs/bzr-2.1.1$ popd
        ~/devel ~/devel/peg-bootstrap
        kragen@VOSTRO9:~/devel$ bzr branch \
            http://bazaar.launchpad.net/~isagalaev/+junk/highlight
        Branched 379 revision(s).
        kragen@VOSTRO9:~/devel$ cd highlight/
        kragen@VOSTRO9:~/devel/highlight$

    I have it at last!

8. 01:05: Now what? How do I use it? `src/readme.eng.txt` is the
documentation, and it suggests a few lines of JS:
> Downloaded package includes file "highlight.pack.js" ...
>
Naturally, though, the source checkout doesn’t contain that file,
even though the downloaded package does. The build script seems to
be `tools/build.py`...
9. 01:12: Trying to build the packed JS file:

        kragen@VOSTRO9:~/devel/highlight$ tools/build.py
        bash: tools/build.py: Permission denied
        kragen@VOSTRO9:~/devel/highlight$ python !!
        python tools/build.py
        ...
        IOError: [Errno 2] No such file or directory: '/home/kragen/devel/highlight/packed/profile.js'
        kragen@VOSTRO9:~/devel/highlight$ python tools/pack.py
        Building highlight.js
        Building profile.js

    That looks promising. Takes a while, though.

    `top` reveals that I have five or six runaway `gtk-gnash` processes
    in my browser. That explains why everything’s been going so slowly
    for the last couple of hours! `killall -9 gtk-gnash` makes things
    go a bit faster. I’m an idiot.

        ...
        Building lua.js
        Building python.js
        kragen@VOSTRO9:~/devel/highlight$ python tools/build.py
        kragen@VOSTRO9:~/devel/highlight$ find -name '*.pack.js'
        ./src/highlight.pack.js
        kragen@VOSTRO9:~/devel/highlight$

10. 01:18: Trying it out with this source:
This appears to replace the entire body of my document with the
first `` block, which Highlight.js incorrectly guessed was
Ruby (it was awk, which it doesn’t claim to support). It also
seems to be making a three-deep tree of `<code>` tags,
i.e. `code>code>code`. And it doesn’t seem to come with an
existing stylesheet.
Probably what’s going on is that it’s replacing the parent node
of the specified node on the assumption that it's a `pre>code` kind
of case. And indeed wrapping the `` tag in an extra ``
solves the problem:
It’s a little slow at startup; highlighting around 320 lines of
code takes about 15 seconds on this 1.6GHz Atom netbook.
But there's still no actual highlighting because Highlight.js
merely applied some CSS classes; it didn’t supply a stylesheet. Or
rather, it supplies 13 of them, but doesn’t add a link to one by
default. The documentation does mention this, but not in the
“Installation and usage” section.
11. 01:34: Adding a link to a stylesheet.
Oh my. That looks pretty nice. And even the misidentified “Ruby”
works reasonably well, although a few of the other
misidentifications don’t. But yeah, about 15 seconds for 320 lines
of code.
In practice the autodetect works very poorly on Lua. It commonly
misidentifies it as JS, TeX, or Ruby; the Ruby highlighting works
reasonably well, but the others don’t. My build script in sh,
beginning with a `#!/bin/sh`, got misidentified as AVR assembly.
12. 01:48: My conclusions on Highlight.js:
* Its language autodetect doesn’t work that well.
* It’s slow: about 20 lines of code per second in my old FF3.0 on a
1.6GHz Atom.
* The latest development version is somewhat buggy (at least, it
behaved in an unexpected way that made my entire document fail
to render), although I was able to work around this.
* It does work and produces good results when it finally finishes
running, if it guesses the right language.
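
As a sanity check on the “about 20 lines per second” figure, here is the arithmetic from the timings reported above (around 320 lines in about 15 seconds). The helper names `linesPerSecond` and `estimatedSeconds` are hypothetical, and the extrapolation to other page sizes assumes highlighting time grows linearly with line count, which these notes did not measure.

```javascript
// Throughput from an observed run: lines highlighted per second.
function linesPerSecond(lines, seconds) {
  if (seconds <= 0) throw new RangeError('seconds must be positive');
  return lines / seconds;
}

// Estimated time for a page of pageLines lines at the given rate.
// Assumes linear scaling, which is an assumption and not a measurement.
function estimatedSeconds(pageLines, rate) {
  if (rate <= 0) throw new RangeError('rate must be positive');
  return pageLines / rate;
}

// Figures reported above: ~320 lines in ~15 s (FF3.0, 1.6GHz Atom).
const observedRate = linesPerSecond(320, 15);
console.log(observedRate.toFixed(1) + ' lines/s');
console.log(estimatedSeconds(1000, observedRate).toFixed(0) + ' s for 1000 lines');
```

If the scaling really is linear, a 1000-line page would take the better part of a minute on this machine, which is why the slowness counts against Highlight.js here.
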
Trying Google Code Prettify
---------------------------
1. 01:52: Downloading: the project page has a link to
    `prettify-3-Dec-2009.zip`.

        kragen@VOSTRO9:~/pkgs$ unzip -v prettify-3-Dec-2009.zip
        ...
        5118 Defl:N 1897 63% 12-03-09 11:04 785f9c6d CHANGES.html
        ...
        kragen@VOSTRO9:~/pkgs$ mkdir prettify
        kragen@VOSTRO9:~/pkgs$ cd prettify/
        kragen@VOSTRO9:~/pkgs/prettify$ unzip ../prettify-3-Dec-2009.zip

2. Adding this code from `README.html` to my document:
3. 01:58: That doesn’t work because I don’t have `class="prettyprint"`
on my `<pre>` tags. It must be possible to work around that
somehow, but there doesn’t seem to be documentation in the source
package. I guess I can do an analogous thing to what I did with
Highlight.js:
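
        (function() {
            // Mark every <pre> as "prettyprint" so Prettify will process it.
            var pres = document.getElementsByTagName('pre');
            for (var ii = 0; ii != pres.length; ii++) {
                pres[ii].setAttribute('class', 'prettyprint');
            }
            // Run the highlighter over everything marked above.
            prettyPrint();
        })();
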
4. 02:06: Well, that did make it add classes, but I screwed up the URL
on the stylesheet link, so no highlighting. Fixed:
5. 02:09: Got highlighting. Looks like the autodetect is failing
miserably. I can’t tell what it thinks all this Lua is, but
whatever it is, it’s not highlighting the Lua comments uniformly as
comments. And it highlights the word “new” in my shell script as
if it were a keyword. Changing to this doesn’t help:

        pres[ii].setAttribute('class', 'prettyprint lang-lua');

One big advantage this has over Highlight.js is that it highlights
asynchronously instead of blocking page load. It also seems to be
about 5 times faster, finishing about 3 seconds after page load
instead of 15.
6. 02:18: My conclusions with Google Code Prettify:
* Fast.
* Easy to use.
* Produces poor results.
* Only one theme.
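
One wrinkle in the workaround above: `setAttribute('class', ...)` replaces whatever classes a `<pre>` already has. A minimal sketch of a gentler variant follows. The helper name `prettifyClass` is hypothetical; the `prettyprint` and `lang-lua` class names and the `prettyPrint()` call are the ones used in the notes above. Keep in mind that in the test above the `lang-lua` hint did not visibly improve the results.

```javascript
// Merge the "prettyprint" marker (and an optional "lang-..." hint) into an
// existing class string instead of overwriting it. Pure string handling,
// so it can be exercised outside a browser.
function prettifyClass(existing, lang) {
  const classes = (existing || '').split(/\s+/).filter(Boolean);
  const toAdd = ['prettyprint'];
  if (lang) toAdd.push('lang-' + lang);
  for (const name of toAdd) {
    if (!classes.includes(name)) classes.push(name);
  }
  return classes.join(' ');
}

// Browser-only usage; skipped when there is no DOM or Prettify is not loaded.
if (typeof document !== 'undefined' && typeof prettyPrint === 'function') {
  const pres = document.getElementsByTagName('pre');
  for (let i = 0; i < pres.length; i++) {
    pres[i].className = prettifyClass(pres[i].className, 'lua');
  }
  prettyPrint();
}
```

Keeping the class logic in a plain function means it can be checked without a browser, and it leaves any styling hooks already on the page intact.
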
Still to check out
------------------
I still need to check out SyntaxHighlighter and jQuery.Syntax; they
both sound reasonably promising despite the lack of autodetection.
(Maybe I’ll have to write an autodetection library to use with
whatever syntax-highlighting library I end up using.) But I’m not
going to do it tonight.
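
For what such an autodetection library might start from, here is a very rough sketch: check the `#!` line first (which would have caught the `#!/bin/sh` script that got misidentified above), then fall back to counting keywords. Everything in it is hypothetical, including the names `languageFromShebang` and `guessLanguage` and the keyword lists, which are illustrative guesses and not taken from any of the libraries discussed.

```javascript
// Interpreter names that may appear on a "#!" line, mapped to languages.
const INTERPRETERS = new Map([
  ['sh', 'sh'], ['bash', 'sh'], ['lua', 'lua'],
  ['python', 'python'], ['perl', 'perl'], ['ruby', 'ruby'],
]);

// Illustrative keyword lists; a real detector would need far better ones.
const KEYWORDS = {
  lua: ['local', 'function', 'end', 'then', 'elseif', 'nil'],
  python: ['def', 'import', 'elif', 'None', 'self', 'lambda'],
  ruby: ['def', 'end', 'elsif', 'puts', 'nil', 'require'],
  javascript: ['function', 'var', 'typeof', 'null', 'undefined', 'new'],
};

// Returns a language name from the shebang line, or null if there is none.
function languageFromShebang(source) {
  const match = /^#![ \t]*(\S+)(?:[ \t]+(\S+))?/.exec(source);
  if (!match) return null;
  let name = match[1].split('/').pop();
  if (name === 'env' && match[2]) name = match[2]; // "#!/usr/bin/env python"
  name = name.replace(/[\d.]+$/, '');              // "python3" -> "python"
  return INTERPRETERS.get(name) || null;
}

// Shebang first; otherwise the language whose keywords occur most often.
// Returns null when nothing matches at all.
function guessLanguage(source) {
  const fromShebang = languageFromShebang(source);
  if (fromShebang) return fromShebang;
  const words = source.match(/[A-Za-z_]\w*/g) || [];
  let best = null;
  let bestScore = 0;
  for (const [lang, keywords] of Object.entries(KEYWORDS)) {
    const score = words.filter((w) => keywords.includes(w)).length;
    if (score > bestScore) {
      best = lang;
      bestScore = score;
    }
  }
  return best;
}

console.log(guessLanguage('#!/bin/sh\nnew=1\necho $new\n'));
console.log(guessLanguage('local function f(x)\n  if x == nil then return 0 end\nend\n'));
```

Keyword counting like this is exactly the kind of heuristic that confuses Lua with Ruby (they share `end` and `nil`), so it is a starting point at best.
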
Fixing SHJS is looking more tempting, too, after spending two-plus
hours struggling with Highlight.js.
----
2010-07-09...10, Kragen Javier Sitaker, Buenos Aires.