A quick-and-dirty comparison of skip lists, tries, and other ordered containers

A friend of mine asked:

Is a red-black tree still my best option for a sorted map? Or is there something simpler to implement with similar performance characteristics?

I was pretty sure that a red-black tree was not a reasonable answer to this question, so I thought I’d quickly code up a skip list in C. Then, in the middle, while I was getting hung up on the insertion algorithm, he suggested using a trie, so I coded that up too. And then I wrote a really simple version in a few lines of Python to be sure the C implementations were producing the right results.

The task I’m doing with them is the traditional count-occurrences-of-each-word task, with an eight-column field for the occurrences of each word in the output file:

LANG=C tr -s '\t \n\r' '\n' | LANG=C sort | LANG=C uniq -c | sed 's/^/ /'
Here are my results for bible-pg10.txt so far. Allocation information is from Valgrind. I’m on a quad-core “Intel(R) Pentium(R) CPU N3700 @ 1.60GHz” running "Linux 4.4.0-21-generic #37-Ubuntu", gcc version “5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)”, optimizing with -O5 -m32, using the “OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)”, and Python 2.7.12.
Data structure Lines
of code
Bytes of RAM
allocated (i386)
Blocks of RAM
allocated
Milliseconds (i386,
range of 3 runs)
Skip list (C) 115 961 312 68 118 833–846
Nybble trie (C) 90 7 468 608 109 714 331–332
Python reference
implementation
(hash
table, sorted array)
(Python) 11 28 603 178 140 691 822–827
Text file
(above shell pipeline)
(sh) 1 15 042 752
(amd64, not i386)
35 556 856–882
(amd64)
Java TreeMap
(red-black tree from library)
(Java) 26 ?? ?? 2640–2770
(amd64)

LoC figures are generated using David A. Wheeler’s ‘SLOCCount’.

I expected the trie’s profligate memory usage (and Python’s profligate interpretation and data structure overhead) to put them far out of the skip list’s league in performance, but the trie was actually faster. As expected, the red-black tree was several times slower than the trie or skip list, but much of that is likely attributable to Java’s usual performance problems.

A relevant difference that the table above submerges somewhat is that the trie, skip list, and red-black tree are actually ordered containers, while the Python and shell approaches are not; the ordered containers support arbitrary sequences of insertions and sorted traversals without losing efficiency, while the Python and shell structures do all the sorting at the end.

I wasn’t able to run OpenJDK with HotSpot under Valgrind (it segfaulted), so I couldn’t come up with allocation numbers for it. time(1) says its maximum resident set size was about 120 megabytes.

Still to do: a red-black tree itself (a monomorphic implementation in C rather than using the polymorphic OpenJDK implementation), a binary array set.