How you should set up a full-disk-encryption passphrase on a laptop

With these recommendations, a thief who steals your laptop while it is hibernated will find it infeasible to recover your credit card numbers, Facebook login cookies, photos of you or your intimate partner naked, and journalistic sources, without expending some US$1.9 trillion or persuading you to reveal the passphrase. The cost, once it's set up, is that it will take an extra five seconds or so every time you boot or unhibernate.

Basic recommendations:

Use Linux.
Use dm-crypt with LUKS.
Encrypt your swap too.
Use cryptsetup luksAddKey -i 1500 instead of the default to increase the level of security.
Use a randomly-generated passphrase with 72 bits of entropy, using 6 words chosen with true randomness from the 4096 most common English words of 5 letters or less. (A passphrase-generation script is provided.) A multiword passphrase allows you to delete a mistyped word at boot time with control-W.
Instead of using ACPI suspend and hibernate, use swsusp, as Debian does by default, so that your hibernated laptop has no unencrypted data and requires the passphrase to unhibernate.
Practice unhibernating using your new passphrase three times during the first hour and three more times during the first day so that you don't forget it.
Once your new passphrase is well memorized, remove the original one with cryptsetup luksKillSlot.

The rest of this document describes this in more detail.

Initial disk encryption setup

I have an unencrypted sda5 boot partition; the rest of the disk, sda6, is a LUKS dm-crypt container containing an LVM physical volume, which contains my root and swap partitions. Details follow.

This was kind of a bear to set up with the Debian installer, and I'm not really sure why, but here's my setup. First, the disk partitions:

$ sudo fdisk -l /dev/sda

Disk /dev/sda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders, total 312581808 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xa42d04a3

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63       80324       40131   de  Dell Utility
/dev/sda2           81918   312580095   156249089    5  Extended
/dev/sda5           81920      276479       97280   83  Linux
/dev/sda6          278528   312580095   156150784   83  Linux

$ df # abridged output
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/my-root  144085696 134301220   2465336  99% /
/dev/sda5                94195     48286     41045  55% /boot

"Dell Utility" is a 40-megabyte partition I'm afraid I might need someday; it's Dell's substitute for an install CD. sda2 is an extended partition, for no good reason, including two logical partitions, sda5 and sda6. sda5 is the boot partition, which has to be unencrypted so GRUB can read the kernel from it. Debian is using about 40MB of its 100MB. sda6 is a dm-crypt partition with LUKS; its encrypted contents are an LVM physical volume, containing swap and the root filesystem. This configuration looks like the following:

$ sudo dmsetup ls
my-swap (254:1)
my-root (254:2)
sda6_crypt  (254:0)
$ ls -l /dev/mapper
total 0
crw------T 1 root root 10, 236 Jul  3 21:56 control
lrwxrwxrwx 1 root root       7 Jul  3 21:56 sda6_crypt -> ../dm-0
lrwxrwxrwx 1 root root       7 Jul  3 21:56 my-root -> ../dm-2
lrwxrwxrwx 1 root root       7 Jul  3 21:56 my-swap -> ../dm-1
$ sudo cryptsetup status sda6_crypt
/dev/mapper/sda6_crypt is active and is in use.
  type:    LUKS1
  cipher:  aes-xts-plain64
  keysize: 512 bits
  device:  /dev/sda6
  offset:  4096 sectors
  size:    312297472 sectors
  mode:    read/write
$ sudo pvs --segments
  PV         VG   Fmt  Attr PSize   PFree Start SSize
  /dev/dm-0  vga  lvm2 a--  148.91g    0      0  2384
  /dev/dm-0  vga  lvm2 a--  148.91g    0   2384 35738
$ sudo vgs
  VG   #PV #LV #SN Attr   VSize   VFree
  my     1   2   0 wz--n- 148.91g    0 
$ sudo lvs
  LV   VG   Attr     LSize   Pool Origin Data%  Move Log Copy%  Convert
  root my   -wi-ao-- 139.60g                                           
  swap my   -wi-ao--   9.31g                                           
$ cat /etc/fstab # abridged
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
/dev/mapper/my-root /               ext4    errors=remount-ro 0       1
# /boot was on /dev/sda5 during installation
UUID=f8619cb6-bc69-4758-aab6-5d4e961b1c7c /boot           ext3    defaults        0       2
/dev/mapper/my-swap none            swap    sw              0       0

I don't know how to show that the root and swap logical volumes are stored in the two segments inside the physical volume, but they are.

Choosing and setting a new passphrase

So I'm going to change the password for my disk encryption. I'm using dm-crypt with LUKS, which is what Debian's installer gives you, what Fedora uses since Fedora 9, and probably the right thing to use in general.

How LUKS works

LUKS works by using a "master key" consisting of truly-random bits, which is then encrypted using one or more passphrases. The passphrase is used to derive a key using PBKDF2, that key is used to encrypt the master key, and the encrypted master key and other parameters are written to the disk. To try a passphrase, as I understand it, the passphrase and parameters are used to derive a key, the key is used to decrypt the encrypted master key, and the encrypted master key is checked against a stored master-key hash.

A six-short-word password is practical to memorize and plenty strong.

It is reasonable to memorize a phrase of six 5-letter-or-less words chosen from among the 4096 most common words in English. They provide 72 bits of entropy, and LUKS can provide 16 bits of key stretching, for a total of 2⁸⁸ PBKDF2 iterations to exhaust the entire keyspace. A brute-force attack then would require a computational expense currently equal to half a century of world economic output.

The needed passphrase length depends PBKDF2 iteration count.

This is adjustable.

$ sudo /sbin/cryptsetup luksDump /dev/sda6
LUKS header information for /dev/sda6

Version:        1
Cipher name:    aes
Cipher mode:    xts-plain64
Hash spec:      sha1
Payload offset: 4096
MK bits:        512
MK digest:      (omitted)
MK salt:        (omitted)
MK iterations:  8625
UUID:           982a6438-17bc-4834-a1cc-063bc41d4008

Key Slot 0: ENABLED
        Iterations:             34801
        Salt:                   (omitted)
        Key material offset:    8
        AF stripes:             4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED

So PBKDF2 was being iterated 34801 times on my disk by default.

How fast is PBKDF2? 4 billion iterations per second per GPU

http://stackoverflow.com/questions/11298184/about-how-fast-can-you-brute-force-pbkdf2 says, in early 2013:

one popular GPU based cracking program can handle with a kitted out modern desktop + 8 GPU's at 1 million tries a second against WPA2. As WPA2 is essentially PBKDF2(HMAC−SHA1, passphrase, ssid, 4096, 256), that tells us that one machine can test somewhat over 4 billion HMAC-SHA1 PBKDF2 iterations per second.

So one second on a GPU cracking machine is about 2³² iterations, and LUKS chose about 2¹⁶ iterations for my disk. It's possible LUKS was erring a bit on the side of fast booting at the expense of security.

You can probably afford about 2¹⁶ iterations, no problem.

Since my machine typically takes more than 30 seconds to boot, I thought I could probably afford 8 seconds of key hashing; let's see what that works out to:

$ sudo /sbin/cryptsetup luksAddKey -i 8000 /dev/sda6
Enter any passphrase: 
Enter new passphrase for key slot: 
Verify passphrase: 
$ sudo /sbin/cryptsetup luksDump /dev/sda6
...
Key Slot 1: ENABLED
        Iterations:             282179

Yes, it looks like it was being too fast. So that gives us about 2¹⁸ iterations.

After some months of experience with this, I conclude that -i 8000 is too long, because it slows down the process of learning the passphrase. -i 1500, taking 1½ seconds instead of 8 to decrypt your disk, loses almost 3 bits of key-stretching, but it's still more than adequate, is much more pleasant to use, and makes memorizing the passphrase much faster.

2⁸⁸ PBKDF2 hashes would require 100 years of Earth’s entire economy.

Now we have the question of how much we want cracking the disk to cost. Let’s figure that the aforementioned 8-GPU machine costs about US$2000, or US$1000 per year, and we'd like cracking the disk to require more than the current world GDP for a century. Current world GDP is about US$70 trillion/year, so a century would be US$7 quadrillion.

So that's 7 trillion machine-years, or about 2⁴³ machine-years. A year is 31 556 926 seconds, or about 2²⁵ seconds, so we want cracking to need 2⁶⁸ machine seconds, and at 2³² iterations per second, that's 2¹⁰⁰ iterations of PBKDF2. If you need 2¹⁸ iterations per password, that's 2⁸⁸ passwords, so you need 89 bits of entropy to give you that expected brute-forcing time. 16 of those can come from the key-stretching mentioned above, leaving 73 bits of entropy to come from the passphrase.

What does 89 bits of entropy look like? It looks like this:

$ ./bitwords.py 89
The 89-bit number 394960078400707194168835240 can be represented as:
In binary: 10100011010110100000011001100000010011010010011111000101110010001110110000111110010101000
In hex: 146b 40cc 09a4 f8b9 1d87 ca8
In the first half of the alphabet to avoid cusswords, like Chrome: jfmhmdfdkgehjefhecdfgmee
In lowercase letters and digits: 1dmjvjdsb475cm8aeg
In lowercase letters: nkixlgnajwgpbrekqce
In letters and digits: vQw9Dl8Xru0wonS
In letters, digits, hyphen, and underscore, like YouTube: kqQ3c2qjUKhS7OE
In printable ASCII: )ntDL4#jLsB9dW
In 12-bit words: are rooms important need politics servants judgement personally
In 12-bit 6-letter words: are boss better house pool venue asylum nicely
In 12-bit 5-letter words: are gifts young end draft doug ache tram
In 6-bit words: are they more to on and they have could one by them is who their
In minimal Japanese-like syllables: fisida dehubi nedoda fahegi barubu na
In complicated made-up syllables: bamk fims prul kawst taimt nawtth scral
In the S/Key word list: ABE YAW JAKE OS POE WAYS ONTO CADY HAWK
In ideographs: 丄訛孶诋罙澀齜

That is, it's about 8 or 9 words, or 19 lowercase letters. I want to restrict my passphrase to lowercase letters and possibly spaces, because often I boot my netbook without being able to see the keyboard very well, and punctuation and digits are hard to type accurately.

Sequences of lowercase words are the best encoding for touch-typing.

I did some experiments with 89-bits-of-entropy passphrases. The results apply equally well to the 73-bits-of-entropy passphrases I ended up recommending. Here are the notes from the experiments.

I typed the form "nkixlgnajwgpbrekqce" 8 times in 42 seconds with only one error, or about 5 seconds each; this seems reasonable, although it's harder to memorize.

The form "abe yaw jake os poe ways onto cady hawk" I typed 9 times in 62 seconds, or about 7 seconds each, again with only one error, plus another case where I corrected a word with ^W. I'm not sure I have ^W to erase a word during boot time.

The form "are gifts young end draft doug ache tram" similarly had one error out of 8 times, which took 58 seconds.

Given similar error rates for all three, there's a tradeoff between typing speed and memorizability. The S/Key word list seems to generate less-memorable passphrases:

In the S/Key word list: BALD WHEE TOWN HAIL BAIT HAAS RAVE NEAL
In the S/Key word list: LOP GOOF NESS STEM MORT STIR WICK AVER
In the S/Key word list: ABE CURE ELM CLOT ROSE AHOY RUM EVEN SOW
In the S/Key word list: HATH SAW PAY CRIB LATE OS BARK HUM
In the S/Key word list: ABE MOSS BYE BOYD RIDE AVER USER GUSH DIG
In the S/Key word list: BUNK DAB BITE RUM GANG WENT LIFE SUM
In the S/Key word list: ABE DIRT HOE SNOB FAIL SOCK RUTH SAND CLAN
In the S/Key word list: CUR WHEE NAGY JUJU HARM ACTS QUO ARGO
In the S/Key word list: ARID LIST HOST WHAM LAB HARD HOSE MUM
In the S/Key word list: HOVE SELF MAT ELAN TUSK BAT DEAR COMB

While the 4096 most common words of 5 letters or less are somewhat better, and only longer by about 3 letters on average:

In 12-bit 5-letter words: of fee fees star mice half geoff hosts
In 12-bit 5-letter words: they firm id lump cup chief days gap
In 12-bit 5-letter words: at ledge colin gaps ai ibid fungi seas
In 12-bit 5-letter words: in bum klaus swim towns rump fame mar
In 12-bit 5-letter words: they june giant andy mate scot creek third
In 12-bit 5-letter words: as fig thou tha next fancy jim sid
In 12-bit 5-letter words: they dwarf fife stair josh turks spade dame
In 12-bit 5-letter words: is jan obey lick swell danny onion exact
In 12-bit 5-letter words: that drill some tina mungo hedge noise leak
In 12-bit 5-letter words: you bean jon quiet bingo ricky allow sings

Taking extreme examples, "that drill some Tina Mungo hedge noise leak" is something I can easily visualize a cartoon of, while "Abe moss bye Boyd ride aver user gush dig" is quite a bit more difficult. So it's a tradeoff between 60% more typing time than "nkixlgnajwgpbrekqce" (a 3-second slower bootup) and more easily memorizing "are gifts young end draft doug ache tram" (Are gifts young? End draft! Doug ache — tram.)

In my book, the ease of memorization is worth the extra three seconds.

Eight 12-bit words can actually encode up to 96 bits of entropy. If you cut the password entropy down from 89 bits to 84, you can get by with 7 12-bit words; or at 88 bits, you could use 8 11-bit words:

The 88-bit number 155090174075185364222823520 can be represented as:
...
In 12-bit 5-letter words: was may deed cite sits sec after burke
In 11-bit 5-letter words: lewis born pond bruce leg keeps open wet
...
The 83-bit number 9078977035531963957140619 can be represented as:
...
In 12-bit 5-letter words: grin altar cook alley shaw sixty strap

If I order these three by memorability, I think the order is:

Lewis-born pond. Bruce! Leg keeps open, wet.
Grin altar. Cook Alley Shaw. Sixty strap!
Was. May deed cite? Sits sec after Burke.

So, at the cost of cutting the attack cost by half, we can apparently improve memorability substantially; but reducing it by a factor of 2⁵, doesn't help particularly. If we reduce to 77 bits, a security loss of 2¹² (which might still be an acceptable level of security) perhaps we can do better still on memorability:

The 75-bit number 26185477596619230170512 can be represented as:
In 11-bit 5-letter words: live hook is help doors anna adult
The 77-bit number 114258389520881705891469 can be represented as:
In 11-bit 5-letter words: fancy cat grant backs robin cable iraqi
The 77-bit number 121847319416516085707325 can be represented as:
In 11-bit 5-letter words: acres x roads hunt over valid peace

So that is how I generated my new passphrase.

$ ./bitwords.py 77
(output omitted)
$ sudo /sbin/cryptsetup luksAddKey -i 8000 /dev/sda6
Enter any passphrase: 
Enter new passphrase for key slot: 
Verify passphrase:

Later, I discovered that you can get 12 bits per word while limiting them to five letters with no real loss of memorability.

Remove old LUKS passphrases with `luksKillSlot`

Now to remove the key I added earlier when testing the number of iterations, so it doesn't slow down booting from now on:

$ sudo /sbin/cryptsetup luksKillSlot /dev/sda6 1
Enter any remaining LUKS passphrase: 
$ sudo /sbin/cryptsetup luksDump /dev/sda6
...
Key Slot 0: ENABLED
        Iterations:             34801
        Salt:                   (omitted)
        Key material offset:    8
        AF stripes:             4000
Key Slot 1: DISABLED
Key Slot 2: ENABLED
        Iterations:             285929
        Salt:                   (omitted)
        Key material offset:    1016
        AF stripes:             4000
...

Deriving and testing a practice schedule for memorizing the passphrase.

So now I have my original passphrase and the new one. I hibernated the machine and unhibernated using the new passphrase, both as a test and to practice the new passphrase. I'll keep practicing it throughout the day so as not to forget it; once I've been using it for a couple of days, I can use luksKillSlot to remove the original weak passphrase.

The practice schedule should look something like this:

10:31: added new key.
10:35: first hibernate/unhibernate.
10:37: second hibernate/unhibernate.
10:47: third hibernate/unhibernate.
11:30: fourth hibernate/unhibernate.
around 16:00: fifth hibernate/unhibernate.
Tonight: sixth hibernate/unhibernate.
Tomorrow morning: seventh hibernate/unhibernate.
The day after tomorrow: eighth hibernate/unhibernate.

The intervals here are a bit more conservative than the usual 1, 5, 25, 125, 625, 3125 intervals for practice memorization, for two reasons:

The passphrase, although I impose linguistic associations onto it, doesn't really have an underlying structure that simplifies it; so it's inherently a hard thing to memorize.
If at some point I forget the passphrase, I don't have the opportunity to recover it from some reminder medium, as is usually the case for things I'm trying to memorize. I can't just turn over a flash card. If I forget it once, and it's not still in my terminal scrollback, it's gone.

Oh, and on the third hibernate, I verified that ^W does in fact work to correct a mistyped word. This means that a multiword passphrase is much less error-prone: if you think you may have mistyped a word, you can type ^W and continue from the beginning of the word.

(And on the fourth, I misremembered the passphrase at first, suggesting that the schedule may not be conservative enough. I hope that the shorter six-word passphrase I'm recommending now will be easy enough to memorize that this practice schedule is adequate.)

Initially formatting a disk with LUKS

In the above case, the Debian installer had done the initial cryptsetup and LVM setup for me. Later I set up LUKS on a new USB disk, without LVM, like this:

$ sudo cfdisk /dev/sdc # then interactively create the partition table
$ sudo cryptsetup -i 1000 -v -y luksFormat /dev/sdc1  # prompts passphrase

Here -v is verbose, -y is to prompt for passphrase verification, and -i 1000 is to use 1 second of key hashing. After some months of experience with the 8-second-hashing setup mentioned earlier, I think it's probably a better user experience to use a shorter hashing time, like 500 or 1000 milliseconds, and if necessary a longer passphrase — although I'm still using just an 88-bit passphrase. 8 seconds is fine when your disk mounts correctly, but it's a pain when you mistype the passphrase and don't find out for 8 seconds.

Then:

$ sudo cryptsetup luksOpen /dev/sdc1 usbdisk  # prompts for passphrase
$ sudo mkfs -t ext4 /dev/mapper/usbdisk
$ sudo tune2fs -r 0 /dev/mapper/usbdisk   # eliminate root reserved space
$ sudo mkdir /usbdisk
$ sudo mount /dev/mapper/usbdisk /usbdisk

There is mysteriously 200M occupied in /usbdisk according to df, but I think that's the filesystem's journal.

cryptsetup status shows the cipher as aes-cbc-essiv:sha256 rather than the aes-xts-plain64 I got before, presumably because the cryptsetup defaults changed. I'm assuming, without investigating, that this is an okay default.

Unmounting the disk manually is done like this:

$ sudo umount /usbdisk
$ sudo cryptsetup luksClose usbdisk

It's easier to interactively mount and unmount the disk via an interactive file manager such as XFCE's Thunar, which mounts it in /media/really-long-uuid after devmapping it in /dev/mapper/udisks-luks-uuid-anotherreallylonguuid. I assume the file managers in GNOME and KDE also work.

Possible improvements include ECC passwords, checksums, and GPU dm-crypt

In 1999 I suggested using error-correction coding, ECC, in passphrases. This would still be a good idea, but 14 more years of Moore's Law have made it substantially more important. A 77-bit passphrase today provides a level of security similar to what a 67-bit passphrase provided at the time, but you need to type seven words without an error instead of six.

As an alternative that allowed people to use passwords they chose themselves, you could store some kind of checksum of the password instead, alongside the salt. A properly chosen four-bit checksum would catch at least 15 out of every 16 mistyped passwords immediately, without having to wait eight seconds, at the expense of effectively weakening your password entropy by (almost) four bits. ECC would save you from having to retype your entire passphrase, but the checksum would save you from having to wait a number of seconds to find out whether you got it right or not. (The rapid feedback would speed the process of passphrase memorization.)

Actually using the machine's GPU for the passphrase hashing would speed up the hashing process by an order of magnitude, eliminating the hardware advantage for attackers. This, however, requires having GPU drivers loaded at early boot time.

Passphrase generation program

I posted an earlier version of this program at http://lists.canonical.org/pipermail/kragen-hacks/2012-April/000538.html. The current version is at http://canonical.org/~kragen/sw/netbook-misc-devel/bitwords.py. This version has the S/Key word list removed for brevity.

#!/usr/bin/python
# -*- coding: utf-8 -*-
"Represent numbers as strings of symbols."

import codecs, itertools, math, random, string, sys

def main(argv):
    if len(argv) == 1:
        bits = 80
        number = random.SystemRandom().getrandbits(bits)
    elif len(argv) == 2:
        number = random.SystemRandom().getrandbits(int(argv[1]))
    else:
        number = int(argv[2])

    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

    print "The %d-bit number" % math.ceil(math.log(number)/math.log(2)), number,
    print "can be represented as:"

    print "In binary:",''.join(represent_with(number, '01'))
    print "In hex:",' '.join(''.join(group) for group in groups(represent_with(number, digits + 'abcdef'), 4))
    print "In the first half of the alphabet to avoid cusswords, like Chrome:",
    print ''.join(represent_with(number, 'abcdefghijklm'))
    print "In lowercase letters and digits:",
    print ''.join(represent_with(number, digits + lowercase))
    print "In lowercase letters:",
    print ''.join(represent_with(number, lowercase))
    print "In letters and digits:",
    print ''.join(represent_with(number, digits + lowercase + uppercase))
    print "In letters, digits, hyphen, and underscore, like YouTube:",
    print ''.join(represent_with(number, digits + lowercase + uppercase + '-_'))
    print "In printable ASCII:",
    printable_ascii = [chr(ii) for ii in range(33, 127)]
    print ''.join(represent_with(number, printable_ascii))
    print "In 12-bit words:", ' '.join(represent_with(number, wordlist(4096)))
    print "In 12-bit 6-letter words:", ' '.join(represent_with(number, 
                                                               mlwords(4096,6)))
    print "In 12-bit 5-letter words:", ' '.join(represent_with(number, 
                                                               mlwords(4096,5)))
    print "In 11-bit 5-letter words:", ' '.join(represent_with(number, 
                                                               mlwords(2048,5)))
    print "In 6-bit words:", ' '.join(represent_with(number, wordlist(64)))
    print "In minimal Japanese-like syllables:",
    print ' '.join(''.join(group) 
                   for group in groups(represent_with(number, syllables), 3))
    print "In complicated made-up syllables:",
    print ' '.join(represent_with(number, english))
    print "In ideographs:", ''.join(represent_with(number, ideographs))

digits = string.digits
lowercase = string.ascii_lowercase
uppercase = string.ascii_uppercase

def represent_with(number, digits):
    return [digits[d] for d in represent(number, len(digits))]

def represent(number, base):
    quotient, remainder = divmod(number, base)
    return ([] if quotient == 0 else represent(quotient, base)) + [remainder]

def wordlist(maxwords):
    return list(itertools.islice(words(), maxwords))

def mlwords(maxwords, maxlen):
    return list(itertools.islice((word for word in words() 
                                  if len(word) <= maxlen), maxwords))

def words():
    return (line.split()[1] for line in
    open('/home/user/devel/wordlist')) # http://canonical.org/~kragen/sw/wordlist

def groups(it, n):
    it = iter(it)
    while True:
        thing = list(itertools.islice(it, n))
        if not thing:
            break
        yield thing

 # Here I’m trying to avoid letters whose sounds are easily confused:
 # - no sounds differing only in voicedness and/or aspiration (b
 #   vs. p, t vs. d, k vs. g, s vs. z, f vs. v)
 # - no sounds differing only in fricative vs. stop (b vs. v)
syllables = [C + V
             for C in 'bdfghjrmns'
             for V in 'aeiou']

# Here I exploit as much of English syllable structure as I can manage.
english = [onset + v + coda
           for onset in (list('bdfgjklmnprstvwz') +
                         [P + L for P in 'bcfgp' for L in 'lrw'] +
                         ['s' + P + L for P in ['c', 'p'] for L in 'lrw'] +
                         ['tr', 'tw', 'str']
                         )
           for v in ['a', 'ah', 'ai', 'aw', 'e', 'ee', 'i', 'oa', 'oo', 'u']
           for coda in ([prefinal + final
                         for prefinal in ['m', 'n', 'ng', 'l', 's']
                         for final in 'bdkt'] +
                        [final + postfinal
                         for final in ['ng'] + list('bdfgklmnprt')
                         for postfinal in ['s', 'th', '']]
                        + ['ss', 'z', 'j', ''])
           ]

assert len(english) == len(set(english))

# This needs some work, because it includes a fair number of code
# points with no characters assigned yet, or no glyph in the fonts I
# have installed.
# <http://en.wikipedia.org/wiki/List_of_CJK_Unified_Ideographs,_part_1_of_4>
# has empty squares scattered around apparently at random. :(
ideographs = [unichr(ii) for ii in range(0x4E00, 0x9FA6)]

def ok(a, b): assert a == b, (a, b)
ok(represent(12345, 10), [1, 2, 3, 4, 5])
ok(represent(0, 10), [0])
ok(wordlist(5), ['the', 'of', 'and', 'to', 'a'])

if __name__ == '__main__':
    main(sys.argv)