Smolhershey: scalable font support in 400 bytes using Hershey fonts

Smolhershey is a 2-clause BSD-licensed C library for vector font rendering. It is suitable for small microcontrollers; it does no heap allocation and is under 400 bytes compiled, and the fonts are only a few kilobytes each. You can download it from smolhershey-1.0.tar.gz (57KB).

It can render adequate scalable vector fonts for English and Russian text and mathematics at a cost about 2–3 orders of magnitude lower than FreeType. It’s even smaller than Kamal Mostafa’s excellent Hershey font library, which is itself very small. It’s suitable for even relatively small microcontrollers, though not the smallest (>4KiB of ROM, >256 bytes of RAM, von Neumann architecture unless you want to hack it). However, so far I’ve only run it on my laptop (sometimes emulating an ARM or RISC-V).

it is very smol

Normally, to render a scalable font, we use outline fonts interpreted with FreeType 2 (785K text size) and a TrueType font such as lmroman12-regular.otf (110K) or Noto Serif (15869K). This exposes a lot of complexity. The FreeType 2 API is 4470 lines of code in 54 header files; libfreetype.so exports 217 entry points. The FreeType 2.12.1+dfsg source base is 170 kloc. It has had 109 security holes discovered since 02006 (or 92 according to SecurityScorecard), though only one in the last year. These staggering costs rule it out for many computing applications; many of them shamble along with ugly bitmap fonts as a result.

The text size of Smolhershey itself is under 1K, rather than 785K, and compiled with size optimization for Cortex-M4, it’s under 200 bytes. The Hershey fonts it uses average 7K per font, rather than 110K or 16 megs, and the Optima-like font is under 4K. Its API is 20 lines of code in one header file. It’s 57 lines of code instead of 170000 lines of code. It might have some security holes in it, but probably not more than two or three.

Here’s the code size compiled for different architectures with -Os, with GCC 12.2.0 unless otherwise specified:

Bytes  Instructions  CPU
  188            74  Cortex-M4
  250            79  RV64G (with -msave-restore)
  296            74  Cortex-A53 (without -mthumb)
  381            85  AMD64
  474           237  AVR (with avr-gcc 5.4.0)

There are smaller TrueType (and other) outline font engines than FreeType; Vidar Hokstad’s Skrift is only 600 lines of Ruby, and Thomas Oltmann’s libschrift, which Skrift is based on, is only 1500 lines of C. But Smolhershey is still an order of magnitude cheaper even than these.

Since I wrote Smolhershey the other morning, I have used it to produce ASCII art, HP-GL plotter art, and PostScript, and to visualize a vector font from the IMLAC PDS-1D from the 01970s (a font which is 2.5K in Hershey form). So far I’m impressed with how good the results have been with how little effort on my part, though there are some problems.

lines  words  bytes  sloc  filename
   82    521   3100    20  smolhershey.h, the API
   65    414   2350    37  smolhershey.c, the implementation
  141    460   3193    95  smolhersheyexample.c, the ASCII art example
  150    636   4470    92  smolhersheyhpgl.c, the HP-GL plotter art example
   25    127    874    17  smolhersheymin.c, the super-minimal PostScript example
  120    373   2849    85  smolhersheyspecimen.c, a utility to produce font specimens in PostScript

Background on Hershey fonts

The public-domain Hershey fonts were created by Dr. Allen Vincent Hershey in 01967, after some eight years of work using some of the most powerful computers in the world, such as the NORC and Stretch, in order to make his math papers look better. Chris Lott at Hackaday claims that he was originally plotting them with the period (.) character of the Stromberg-Carlson SC4020 Charactron microfilm optical printer installed on the NORC, but Lott may not be a reliable source; he claims Hershey was using James Hurt’s file format (see below).

There are several books about Hershey fonts, including Wolcott and Hilsenrath’s “Contribution to Computer Typesetting Techniques” in 01976, Patrick Michael Doyle’s master’s thesis in 01977, and David MacMillan’s “Exploring Dr. Hershey’s Typography” in 02003–02006. For years, the US National Technical Information Service provided copies of the fonts on demand for a nominal copying cost, but requested no further redistribution in the same format.

In 01986, an ad-hoc group known as the Usenet Font Consortium republished most of the Hershey fonts in a new James Hurt Format, or JHF. Most copies of the Hershey fonts in use today derive from their work.

Since the 01970s, Hershey fonts have been largely supplanted by alternative approaches to computerized typography which can produce higher-quality results. The most powerful computers in the world in the early 01960s could only manage about a million instructions per second and only had a few hundred K of RAM and a few megs of disk, and programming was either assembly or Fortran, so Hershey spent enormous effort to construct fonts that could provide high-quality results with what we now consider minimal computing resources. This required compromises to visual quality.

Even so, Hershey fonts are still supported today in Inkscape, plotutils, R, VMD, GrADS, IDL, etc.

Stroke fonts

The Hershey fonts are stroke fonts rather than outline fonts; rather than defining an area to fill by describing its boundary, they describe how to move a pen to draw a shape. Some of them describe letterforms with thin single-stroke lines and thicker lines built up with multiple parallel pen strokes, while others do not.

Being stroke fonts gives them a real advantage in compactness, though less so for these multi-pass strokes; these four relatively elaborate glyphs are drawn with 189 coordinate pairs describing 135 lines. Each coordinate is encoded in a single byte.
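To make the relationship between coordinate pairs and lines concrete, here is a rough sketch of a decoder for one glyph record. It assumes the usual JHF layout: a 5-column glyph number, a 3-column pair count, one pair of left and right bearings, then coordinate pairs, where each coordinate is one byte stored as an offset from the letter R and the pair " R" means “lift the pen”. The function names are made up for this illustration, and this is not Smolhershey’s own code.

```c
#include <stddef.h>

/* Decode a single JHF coordinate byte into a signed offset. */
int jhf_coord(char c)
{
  return c - 'R';
}

/* Count the line segments in one glyph record of `len` bytes.
   Columns 0-7 hold the glyph number and pair count, columns 8-9 hold
   the bearings, and the coordinate pairs start at column 10. */
int jhf_count_segments(const char *record, size_t len)
{
  int segments = 0, pen_down = 0;
  for (size_t i = 10; i + 1 < len; i += 2) {
    if (record[i] == ' ' && record[i + 1] == 'R') {
      pen_down = 0;              /* pen up: the next point starts a new stroke */
    } else {
      if (pen_down) segments++;  /* a line from the previous point to this one */
      pen_down = 1;
    }
  }
  return segments;
}
```

For the sample record `  501  9I[RFJ[ RRFZ[ RMTWT`, which traces a capital A, this counts 3 lines from 8 coordinate pairs, 2 of which are pen-up markers.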

Aside from compactness of representation, the single-thickness fonts have advantages in flexibility that outline fonts lack: you can distort their geometry (for example to oblique or condense the font) without changing their stroke weight, you can adjust the stroke weight at will, and you can even vary the stroke weight by stroke, for example depending on the stroke angle.
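As a minimal sketch of that kind of distortion, the two functions below slant or condense a single point. The `point` type is a stand-in for Smolhershey’s sh_point so that the fragment compiles on its own, and the slant of 1/4 and the width of 3/4 are arbitrary choices for illustration.

```c
/* Stand-in for Smolhershey's sh_point so this sketch compiles alone. */
typedef struct { int x, y; } point;

/* Slant a point to the right by 1/4 (about 14 degrees).  Hershey y
   coordinates increase downward, so points higher on the page have
   smaller y and must move right: x' = x - y/4. */
point oblique(point p)
{
  point q = { p.x - p.y / 4, p.y };
  return q;
}

/* Condense horizontally to 3/4 width, leaving heights alone. */
point condense(point p)
{
  point q = { p.x * 3 / 4, p.y };
  return q;
}
```

Applying one of these to both endpoints of each line before drawing it changes the letter shapes, but the stroke weight stays whatever the line-drawing code makes it.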

For the above ampersands, I’ve just rerendered a single HP-GL output file several times with hp2xx with different arguments for -p, plotter pen width. We can see that the graphical output quality of hp2xx itself leaves something to be desired with respect to line joins.

A simplified example: generating PostScript output

Here’s an example of using Smolhershey in C without any error checking, omitting everything that isn’t essential to making it run; this shows that in 17 lines of code you can get high-quality graphical output from Smolhershey, if you don’t mind missing fonts being reported via segfaults.

// This will segfault if run on a system where the font is not
// installed.

#include <stdio.h>
#include "smolhershey.h"

// This is the callback that Smolhershey will invoke for each line
// that needs to be drawn.
void draw_postscript_line(sh_point start, sh_point end, void *userdata)
{
  // These PostScript commands add a new line segment to the current
  // path, which will be drawn when `stroke` is invoked at the end of
  // the file.
  printf("%d %d moveto  %d %d lineto\n", start.x, start.y, end.x, end.y);
}

int main()
{
  // Stack-allocate an 8-kilobyte buffer for the font file contents
  // and a pointer array for use in rendering the font.  Smolhershey
  // leaves all memory allocations up to the caller; it never
  // allocates anything dynamically itself.
  u8 buf[8192], *glyph_pointers[97]; // One extra for EOF

  // This is the other data structure the caller needs to allocate.
  sh_font my_font = { .lines = glyph_pointers, .n = 97 };

  // Read in the font with stdio.  Since it’s plain ASCII text, "rb"
  // would be undesirable.  `fread` returns the number of items read,
  // which tells `sh_load_font` how big the file is.
  FILE *f = fopen("/usr/share/hershey-fonts/timesr.jhf", "r");
  sh_load_font(&my_font, buf, fread(buf, 1, sizeof buf, f));

  // This graphics context specifies what font to use, what function
  // to call for each line, and what the current point is.  It’s
  // important to create it with an initializer `= { ... }` to ensure
  // that the current point is zero-initialized.
  sh_gc gc = { .font = &my_font, .draw_line = draw_postscript_line };

  // This is PostScript code to move the origin onto the page, flip
  // the Y-axis (because Hershey coordinates increase downwards), and
  // make line ends rounded and lines thick.
  puts("%!\n100 400 translate  1 -1 scale  1 setlinecap  1.5 setlinewidth");

  // As long as you don’t change gc.cp, the characters from `sh_show`
  // get displayed one after the other.
  for (char *p = "hello, world"; *p; p++) sh_show(&gc, *p - ' ');

  // Finally, emit the PostScript commands to draw the built-up path
  // and end the page.
  puts("stroke showpage");
}

If you bundle the font into your executable, as with C23 #embed (reference), you can avoid opening and reading files at runtime. In earlier C standards, there isn’t a way to include an external file at build time (though some hack was pretty much always possible), but the JHF format used by Smolhershey is text, so it’s relatively easy to copy and paste the font file into your source code as a string; you only have to encode the newlines.
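A rough sketch of the copy-and-paste approach might look like this. The two glyph records are placeholder data in JHF layout (a capital A and a capital I), the names `embedded_font` and `embedded_font_len` are invented for the example, and the array is left writable on the assumption that the loader wants a non-const buffer, as its `u8 *buf` parameter suggests.

```c
/* A font pasted into the source as a string literal, one glyph record
   per line, with each newline written as \n.  A real font would have
   one such line per glyph; these two are placeholders. */
static unsigned char embedded_font[] =
  "  501  9I[RFJ[ RRFZ[ RMTWT\n"
  "  509  3NVRFR[\n";

/* Length to hand to the loader: the array size minus the terminating
   NUL that C appends to every string literal. */
enum { embedded_font_len = sizeof embedded_font - 1 };
```

With that in place, a call along the lines of `sh_load_font(&my_font, embedded_font, embedded_font_len)` would take the place of the `fopen` and `fread` in the example above.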

API reference

Smolhershey exports two functions, sh_load_font, which loads a Hershey font, and sh_show, which renders a glyph and advances the current point. Because it doesn’t do any allocation itself, it also exports three struct types, sh_font, sh_point, and sh_gc.

sh_font

When you load a font file, Smolhershey builds a rapid-access index in memory pointed to by an sh_font; this involves reading the entire font file and putting a pointer to the line of text for each glyph into an array:

typedef struct { u8 **lines; int n; } sh_font;

The sh_font is passed by reference to sh_load_font and to sh_show.

sh_load_font

You must initialize lines to point to a writable array of pointers before passing the sh_font to sh_load_font, and you must initialize its n to say how many pointers can be safely written. The return value from sh_load_font tells you how many glyphs were found in the font:

int sh_load_font(sh_font *f, u8 *buf, int n);

sh_load_font also reduces the n in the sh_font to say how many pointers were successfully stored, which is either its original value or the return value of sh_load_font, whichever is smaller.

You can use this information in at least three ways:

  1. You can allocate the right amount of space for lines, as is done in the example code above, because you know how big the font you’re loading is. In that case the return value of sh_load_font only tells you if there’s some kind of super crazy error where invalid data was passed. This is difficult to do for fonts chosen at run time or modified after your program is built.
  2. You can set n to 0 so that no pointers will be written; in this case the return value tells you how much space you need to allocate so that you can call sh_load_font a second time and successfully load the font.
  3. You can point lines to a very large buffer and set n to its size (in pointers); in this case, when sh_load_font returns, you know how much of the buffer it needed, and you can safely allocate the rest of it to other purposes, such as loading another font.
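The second strategy can be sketched as follows. So that the fragment runs on its own, `index_lines` is a hypothetical stand-in that follows the contract described above (store pointers while there is room, return the total found, lower n if fewer were stored); it is not Smolhershey’s implementation, it simply treats every text line as one glyph, and it ignores the extra EOF slot that the earlier example reserves. `font_index` mirrors sh_font.

```c
#include <stdlib.h>

typedef unsigned char u8;
typedef struct { u8 **lines; int n; } font_index;  /* mirrors sh_font */

/* Hypothetical stand-in for the loader's contract. */
int index_lines(font_index *f, u8 *buf, int len)
{
  int found = 0;
  for (int i = 0; i < len; i++) {
    if (i == 0 || buf[i - 1] == '\n') {   /* start of a line */
      if (found < f->n) f->lines[found] = &buf[i];
      found++;
    }
  }
  if (found < f->n) f->n = found;         /* n = min(n, found) */
  return found;
}

/* Measure with n = 0, allocate, then load for real.
   Returns the number of lines indexed, or -1 if allocation fails. */
int load_two_pass(font_index *f, u8 *buf, int len)
{
  f->lines = NULL;
  f->n = 0;
  int needed = index_lines(f, buf, len);  /* writes no pointers */
  if (needed == 0) return 0;
  f->lines = malloc((size_t)needed * sizeof *f->lines);
  if (!f->lines) return -1;
  f->n = needed;
  return index_lines(f, buf, len);
}
```

The cost of this approach is scanning the font twice; on a machine with no heap, the third strategy gets the same information from a single pass.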

sh_point

sh_point is just an (x, y) pair of integers:

typedef struct { int x, y; } sh_point;

This is used for keeping track of the current point and also to provide line endpoints to your draw_line callback.

sh_show and sh_gc

The sh_show function takes an sh_gc pointer parameter:

void sh_show(sh_gc *gc, unsigned glyph_index);

This is the graphics context which specifies in what font the character will be drawn, at what current point, and how to draw the character:

// Graphics context.  Can be safely copied and mutated.
typedef struct {
  sh_font *font;
  sh_point cp;
  void (*draw_line)(sh_point start, sh_point end, void *userdata);
  void *userdata;
} sh_gc;

sh_show consults the font to find the requested glyph, invokes draw_line zero or more times to draw the glyph starting at the position cp, and updates cp to be the position after the character so that multiple successive calls to sh_show will draw a whole line of text.
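The PostScript example above computes the glyph index as `*p - ' '`, which suits a font whose first glyph is the ASCII space. A small guard like the following keeps control characters and bytes beyond the end of the font away from sh_show. The function name and the choice of -1 as the “no glyph” value are assumptions made for this sketch, and the caller is assumed to know how many glyphs the font really holds.

```c
/* Map a byte to a glyph index for a font whose glyph 0 is the ASCII
   space.  Returns -1 when the font has no glyph for the byte. */
int ascii_glyph_index(unsigned char c, int n_glyphs)
{
  if (c < ' ' || c - ' ' >= n_glyphs) return -1;
  return c - ' ';
}
```

A caller would skip the character, or substitute some fallback glyph, whenever this returns -1.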

You can point font at different sh_font objects to display text in different styles on the same line, and you can change cp to display text in different positions.

For a given glyph index, sh_show always advances cp by the same amount; it does not do, for example, any kerning.
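Since cp belongs to the caller, an application that wants kerning can nudge cp.x between calls to sh_show. The table below is only a sketch: the pairs and adjustment values are invented for illustration rather than measured from any Hershey font, and the names are hypothetical.

```c
/* One kerning pair: shift the pen by `adjust` font units (negative
   means tighter) when `right` follows `left`. */
typedef struct { char left, right; int adjust; } kern_pair;

/* Illustrative values only. */
static const kern_pair kern_table[] = {
  { 'W', 'o', -3 },
  { 'W', '&', -2 },
  { 'T', 'o', -3 },
};

/* Look up the adjustment for a pair of characters; 0 if unlisted. */
int kern_adjust(char left, char right)
{
  for (unsigned i = 0; i < sizeof kern_table / sizeof kern_table[0]; i++)
    if (kern_table[i].left == left && kern_table[i].right == right)
      return kern_table[i].adjust;
  return 0;
}
```

In a text loop this would be used as `gc.cp.x += kern_adjust(previous, current);` just before the sh_show call for the current character.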

userdata

When draw_line is invoked, its third argument is the userdata value from the graphics context, which is a workaround for C’s lack of closures. In simple cases, this is unnecessary and can be ignored, and any communication of parameters or drawing state to draw_line other than the start and end point can be accomplished with global, static, or thread-local variables; in hairier cases, you might have multiple graphics contexts active concurrently in the same thread, each of which has its own userdata.
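As one possible use of userdata, the callback below rasterizes each line into a caller-owned character grid with Bresenham’s algorithm, so that two graphics contexts could draw into two different grids. This is a sketch written for this section, not the routine in smolhersheyexample.c; the `canvas` type, its 16×8 size, and the function names are assumptions, and `point` stands in for sh_point.

```c
#include <stdlib.h>

enum { W = 16, H = 8 };
typedef struct { char cells[H][W]; } canvas;
typedef struct { int x, y; } point;   /* stand-in for sh_point */

/* Fill the canvas with spaces before drawing. */
void canvas_clear(canvas *c)
{
  for (int y = 0; y < H; y++)
    for (int x = 0; x < W; x++) c->cells[y][x] = ' ';
}

/* Set one cell, silently clipping anything outside the canvas. */
static void plot(canvas *c, int x, int y)
{
  if (x >= 0 && x < W && y >= 0 && y < H) c->cells[y][x] = '#';
}

/* Bresenham's algorithm for all octants, with the draw_line
   signature; the canvas to draw on arrives through userdata. */
void draw_ascii_line(point a, point b, void *userdata)
{
  canvas *c = userdata;
  int dx = abs(b.x - a.x), sx = a.x < b.x ? 1 : -1;
  int dy = -abs(b.y - a.y), sy = a.y < b.y ? 1 : -1;
  int err = dx + dy;
  for (;;) {
    plot(c, a.x, a.y);
    if (a.x == b.x && a.y == b.y) break;
    int e2 = 2 * err;
    if (e2 >= dy) { err += dy; a.x += sx; }
    if (e2 <= dx) { err += dx; a.y += sy; }
  }
}
```

A graphics context would then be set up with `.draw_line = draw_ascii_line` and `.userdata` pointing at the canvas it should draw on.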

Build process

This should be super simple and take under a second. There are no dependencies beyond the C standard library, and it’s all standard ANSI C99 without VLAs, so even Microsoft's crippled C compiler ought to be able to handle it. Here's what it looks like (my shell prompt ends with ;):

: Downloads; tar xf smolhershey-1.0.tar.gz
: Downloads; cd smolhershey-1.0/
: smolhershey-1.0; make
cc    -c -o smolhersheyexample.o smolhersheyexample.c
cc    -c -o smolhershey.o smolhershey.c
cc  smolhersheyexample.o smolhershey.o -o smolhersheyexample
cc    -c -o smolhersheyhpgl.o smolhersheyhpgl.c
cc  smolhersheyhpgl.o smolhershey.o -o smolhersheyhpgl -lm
cc    -c -o smolhersheymin.o smolhersheymin.c
cc  smolhersheymin.o smolhershey.o -o smolhersheymin -lm
cc    -c -o smolhersheyspecimen.o smolhersheyspecimen.c
cc  smolhersheyspecimen.o smolhershey.o -o smolhersheyspecimen -lm
: smolhershey-1.0; ./smolhersheyexample
  ########     ########                      #####      #####
     ##           ##                            ##         ##
     ##           ##                            ##         ##
     ##           ##                            ##         ##
     ##           ##                            ##         ##
     ##           ##                            ##         ##
     ##           ##                            ##         ##
     ##           ##            ######          ##         ##
     ##           ##          ###     #         ##         ##
     ##           ##         ##       ##        ##         ##
     ###############        ##         #        ##         ##
     ##           ##        ##         ##       ##         ##

(Example output is truncated vertically and horizontally.)

It won’t build with BCC (Bruce’s C Compiler) because it’s not K&R, and BCC only supports K&R C. It won’t build with SDCC 4.2.0 (Sandeep Dutta’s Small Device C Compiler) because SDCC doesn’t support passing structs by value or compound literals. (The SDCC User’s Guide says it does support passing structs by value, except on PIC14 and PIC16, but I haven’t been able to convince it.) It won’t build with cc65 because cc65 doesn’t support C99.

Where to get Hershey fonts

On Debian or related distributions you should install Mostafa’s data to get most of Hershey’s original fonts in a format Smolhershey can use in /usr/share/hershey-fonts:

sudo apt install hershey-fonts-data

I think these are derived from versions in GNU plotutils.

Also, I’ve included a fixed-pitch vector font in JHF format for ASCII in imlac-pds-1-ssvchr.22.jhf, which is 2.5K. This is smaller than any actual Hershey font. It’s the font from Scroll Saver, a program for the IMLAC PDS-1D “graphical minicomputer” (programmable vector-graphics terminal). Its font program is preserved by Tom Uban, extracted from the copy from his IMLAC, at http://www.ubanproductions.com/Imlac/ssv, and has been disassembled by the ITS preservationists. Using IMLAC documentation archived by Bitsavers, I wrote a stupid Python program to extract the vector paths from the IMLAC program (which constructs them incrementally, according to the limitations of the PDS-1 hardware) and produce this JHF file.

(font sample from PDS-1 font)

David MacMillan has put the original Usenet Font Consortium shar files of the Hershey fonts in JHF format on the web; Dener Rosa Silva’s Hershey-TTF project has another copy of the files. Paul Bourke’s Hershey fonts page also includes occidental and Japanese JHF files. All of these use non-ASCII glyph numberings, and it seems that Smolhershey is having trouble rendering the Japanese font; I get a specimen sheet of character components rather than entire characters, quite different from Bourke’s own visualization.

What’s the catch? What’s missing?

In two or three hundred bytes, there’s a limit to how much functionality is included. In Smolhershey, the desirable things omitted include the following:

Drawing lines. Hershey fonts are made of lines, but Smolhershey doesn’t contain any code to draw lines. It doesn’t even have a concept of a pixel. Instead, Smolhershey invokes your draw_line callback; your application needs to provide the code to draw a line, which you can do in whatever way you like. (In the PostScript example above, the line is drawn by outputting a PostScript command to draw it.) At this scale, this is not an insignificant omission! Optimized for size on amd64, Smolhershey is 381 bytes of code, while the Bresenham line-drawing subroutine in smolhersheyexample.c is 179 bytes. But many programs that produce graphical output already have some way to draw a line at an arbitrary angle.

Reading TrueType fonts. Unless you spend some major quality time with some graph paper or write a Hershey-font editor, you’re pretty much stuck with the 33 fonts Hershey digitized 57 years ago. You could write a program to generate optimized Hershey-font versions of TrueType fonts, but so far nobody has.

Curves. Hershey approximated curves with a number of line segments with obtuse angles between them. I’ve been thinking about trying to automatically turn these into Bézier curves, and that is definitely a thing you can do in your draw_line subroutine without changing Smolhershey itself, but that functionality doesn’t exist yet.

Stroke thickness information. It’s great that you can change the thickness of the strokes, but it would be nice if the font files contained some sort of base stroke width information about what a normal stroke width would be. Without it, people often draw the fonts as thin as they possibly can, which is rarely aesthetically beneficial, and often exacerbates the lack of curves. Of course, your draw_line callback can use whatever thickness it wants, even a variable thickness.

Combining characters, and therefore support for languages like Spanish, Albanian, French, German, Norwegian, Polish, Finnish, Danish, Romanian, Hungarian, and Portuguese, which would otherwise be well-supported. This could be added with a few lines of code, but then you’d have to add the combining characters to the fonts (which I think was done previously by c’t in Germany).

Unicode. There have been efforts to map out the relationships between Unicode code points and the Hershey glyphs, but Smolhershey does not include them. And there are only a few thousand Hershey glyphs in all.

Kerning. There is no kerning information, so sometimes you get ugly keming; the “Wo” in the introductory cursive graphic and the “W&” in the illustration of multiple parallel strokes are good examples of this. Your application code can apply kerning tables, but none are supplied, and this is not done automatically.

Colors. Emoji need colors. On the other hand, the Hershey fonts don’t include any emoji, even in stroke form.

Vertical escapement. If you’re doing CJKV work, you may want to lay out characters in columns rather than lines, but although the Hershey fonts do include some Japanese characters (all the standard kana but not even all the Joyo kanji), they do not include vertical spacing information.

RTL. For Hebrew, it would be straightforward to define Hershey glyphs with negative horizontal escapement, but they wouldn’t combine properly with LTR characters like Latin letters.

Ligatures. No ligatures, so no hope of supporting Arabic or Devanagari.

Graphical effects. Though one of the advantages I cited above for stroke fonts is that you can do things like scale them, automatically make a compressed version, or add calligraphic emphasis at some angle, Smolhershey won’t do this for you; you have to do it yourself in your draw_line function.
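Effects of this kind can be layered by chaining callbacks through userdata. The sketch below scales every endpoint by a rational factor and then hands the line to another callback that does the actual drawing. The `scaler` type and the function names are hypothetical, `point` stands in for sh_point, and `remember_line` exists only to give the example something to forward to.

```c
typedef struct { int x, y; } point;   /* stand-in for sh_point */
typedef void (*line_fn)(point start, point end, void *userdata);

typedef struct {
  int num, den;          /* scale factor num/den, in integer math */
  line_fn next;          /* callback that actually draws the line */
  void *next_userdata;   /* userdata to pass along to `next` */
} scaler;

static point scale_point(const scaler *s, point p)
{
  point q = { p.x * s->num / s->den, p.y * s->num / s->den };
  return q;
}

/* Has the draw_line signature; userdata points at a scaler. */
void scaled_line(point start, point end, void *userdata)
{
  scaler *s = userdata;
  s->next(scale_point(s, start), scale_point(s, end), s->next_userdata);
}

/* Example downstream callback: store the endpoints of the last line
   in a two-element array passed as userdata. */
void remember_line(point start, point end, void *userdata)
{
  point *out = userdata;
  out[0] = start;
  out[1] = end;
}
```

Because the endpoints passed to draw_line already include the current point, scaling them scales the character spacing along with the letterforms.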

License: 2-clause BSD

Copyright 02024 Kragen Javier Sitaker

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.