With these recommendations, a thief who steals your laptop while it is hibernated will find it infeasible to recover your credit card numbers, Facebook login cookies, photos of you or your intimate partner naked, and journalistic sources, without expending some US$1.9 trillion or persuading you to reveal the passphrase. The cost, once it's set up, is that it will take an extra five seconds or so every time you boot or unhibernate.
Basic recommendations:
cryptsetup luksAddKey -i 1500
instead of the default to
increase the level of security.cryptsetup luksKillSlot
.The rest of this document describes this in more detail.
I have an unencrypted sda5
boot partition; the rest of the disk,
sda6
, is a LUKS dm-crypt container containing an LVM physical
volume, which contains my root and swap partitions. Details follow.
This was kind of a bear to set up with the Debian installer, and I'm not really sure why, but here's my setup. First, the disk partitions:
$ sudo fdisk -l /dev/sda
Disk /dev/sda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders, total 312581808 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xa42d04a3
Device Boot Start End Blocks Id System
/dev/sda1 * 63 80324 40131 de Dell Utility
/dev/sda2 81918 312580095 156249089 5 Extended
/dev/sda5 81920 276479 97280 83 Linux
/dev/sda6 278528 312580095 156150784 83 Linux
$ df # abridged output
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/my-root 144085696 134301220 2465336 99% /
/dev/sda5 94195 48286 41045 55% /boot
"Dell Utility" is a 40-megabyte partition I'm afraid I might need
someday; it's Dell's substitute for an install CD. sda2
is an
extended partition, for no good reason, including two logical
partitions, sda5
and sda6
. sda5
is the boot partition, which
has to be unencrypted so GRUB can read the kernel from it. Debian is
using about 40MB of its 100MB. sda6
is a dm-crypt partition with
LUKS; its encrypted contents are an LVM physical volume, containing
swap and the root filesystem. This configuration looks like the
following:
$ sudo dmsetup ls
my-swap (254:1)
my-root (254:2)
sda6_crypt (254:0)
$ ls -l /dev/mapper
total 0
crw------T 1 root root 10, 236 Jul 3 21:56 control
lrwxrwxrwx 1 root root 7 Jul 3 21:56 sda6_crypt -> ../dm-0
lrwxrwxrwx 1 root root 7 Jul 3 21:56 my-root -> ../dm-2
lrwxrwxrwx 1 root root 7 Jul 3 21:56 my-swap -> ../dm-1
$ sudo cryptsetup status sda6_crypt
/dev/mapper/sda6_crypt is active and is in use.
type: LUKS1
cipher: aes-xts-plain64
keysize: 512 bits
device: /dev/sda6
offset: 4096 sectors
size: 312297472 sectors
mode: read/write
$ sudo pvs --segments
PV VG Fmt Attr PSize PFree Start SSize
/dev/dm-0 vga lvm2 a-- 148.91g 0 0 2384
/dev/dm-0 vga lvm2 a-- 148.91g 0 2384 35738
$ sudo vgs
VG #PV #LV #SN Attr VSize VFree
my 1 2 0 wz--n- 148.91g 0
$ sudo lvs
LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert
root my -wi-ao-- 139.60g
swap my -wi-ao-- 9.31g
$ cat /etc/fstab # abridged
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/mapper/my-root / ext4 errors=remount-ro 0 1
# /boot was on /dev/sda5 during installation
UUID=f8619cb6-bc69-4758-aab6-5d4e961b1c7c /boot ext3 defaults 0 2
/dev/mapper/my-swap none swap sw 0 0
I don't know how to show that the root
and swap
logical volumes
are stored in the two segments inside the physical volume, but they
are.
So I'm going to change the password for my disk encryption. I'm using dm-crypt with LUKS, which is what Debian's installer gives you, what Fedora uses since Fedora 9, and probably the right thing to use in general.
LUKS works by using a "master key" consisting of truly-random bits, which is then encrypted using one or more passphrases. The passphrase is used to derive a key using PBKDF2, that key is used to encrypt the master key, and the encrypted master key and other parameters are written to the disk. To try a passphrase, as I understand it, the passphrase and parameters are used to derive a key, the key is used to decrypt the encrypted master key, and the encrypted master key is checked against a stored master-key hash.
It is reasonable to memorize a phrase of six 5-letter-or-less words chosen from among the 4096 most common words in English. They provide 72 bits of entropy, and LUKS can provide 16 bits of key stretching, for a total of 2⁸⁸ PBKDF2 iterations to exhaust the entire keyspace. A brute-force attack then would require a computational expense currently equal to half a century of world economic output.
This is adjustable.
$ sudo /sbin/cryptsetup luksDump /dev/sda6
LUKS header information for /dev/sda6
Version: 1
Cipher name: aes
Cipher mode: xts-plain64
Hash spec: sha1
Payload offset: 4096
MK bits: 512
MK digest: (omitted)
MK salt: (omitted)
MK iterations: 8625
UUID: 982a6438-17bc-4834-a1cc-063bc41d4008
Key Slot 0: ENABLED
Iterations: 34801
Salt: (omitted)
Key material offset: 8
AF stripes: 4000
Key Slot 1: DISABLED
Key Slot 2: DISABLED
Key Slot 3: DISABLED
Key Slot 4: DISABLED
Key Slot 5: DISABLED
Key Slot 6: DISABLED
Key Slot 7: DISABLED
So PBKDF2 was being iterated 34801 times on my disk by default.
http://stackoverflow.com/questions/11298184/about-how-fast-can-you-brute-force-pbkdf2 says, in early 2013:
one popular GPU based cracking program can handle with a kitted out modern desktop + 8 GPU's at 1 million tries a second against WPA2. As WPA2 is essentially PBKDF2(HMAC−SHA1, passphrase, ssid, 4096, 256), that tells us that one machine can test somewhat over 4 billion HMAC-SHA1 PBKDF2 iterations per second.
So one second on a GPU cracking machine is about 2³² iterations, and LUKS chose about 2¹⁶ iterations for my disk. It's possible LUKS was erring a bit on the side of fast booting at the expense of security.
Since my machine typically takes more than 30 seconds to boot, I thought I could probably afford 8 seconds of key hashing; let's see what that works out to:
$ sudo /sbin/cryptsetup luksAddKey -i 8000 /dev/sda6
Enter any passphrase:
Enter new passphrase for key slot:
Verify passphrase:
$ sudo /sbin/cryptsetup luksDump /dev/sda6
...
Key Slot 1: ENABLED
Iterations: 282179
Yes, it looks like it was being too fast. So that gives us about 2¹⁸ iterations.
After some months of experience with this, I conclude that -i 8000
is too
long, because it slows down the process of learning the passphrase. -i 1500
,
taking 1½ seconds instead of 8 to decrypt your disk,
loses almost 3 bits of key-stretching, but it's still more than adequate,
is much more pleasant to use, and makes memorizing the passphrase much faster.
Now we have the question of how much we want cracking the disk to cost. Let’s figure that the aforementioned 8-GPU machine costs about US$2000, or US$1000 per year, and we'd like cracking the disk to require more than the current world GDP for a century. Current world GDP is about US$70 trillion/year, so a century would be US$7 quadrillion.
So that's 7 trillion machine-years, or about 2⁴³ machine-years. A year is 31 556 926 seconds, or about 2²⁵ seconds, so we want cracking to need 2⁶⁸ machine seconds, and at 2³² iterations per second, that's 2¹⁰⁰ iterations of PBKDF2. If you need 2¹⁸ iterations per password, that's 2⁸⁸ passwords, so you need 89 bits of entropy to give you that expected brute-forcing time. 16 of those can come from the key-stretching mentioned above, leaving 73 bits of entropy to come from the passphrase.
What does 89 bits of entropy look like? It looks like this:
$ ./bitwords.py 89
The 89-bit number 394960078400707194168835240 can be represented as:
In binary: 10100011010110100000011001100000010011010010011111000101110010001110110000111110010101000
In hex: 146b 40cc 09a4 f8b9 1d87 ca8
In the first half of the alphabet to avoid cusswords, like Chrome: jfmhmdfdkgehjefhecdfgmee
In lowercase letters and digits: 1dmjvjdsb475cm8aeg
In lowercase letters: nkixlgnajwgpbrekqce
In letters and digits: vQw9Dl8Xru0wonS
In letters, digits, hyphen, and underscore, like YouTube: kqQ3c2qjUKhS7OE
In printable ASCII: )ntDL4#jLsB9dW
In 12-bit words: are rooms important need politics servants judgement personally
In 12-bit 6-letter words: are boss better house pool venue asylum nicely
In 12-bit 5-letter words: are gifts young end draft doug ache tram
In 6-bit words: are they more to on and they have could one by them is who their
In minimal Japanese-like syllables: fisida dehubi nedoda fahegi barubu na
In complicated made-up syllables: bamk fims prul kawst taimt nawtth scral
In the S/Key word list: ABE YAW JAKE OS POE WAYS ONTO CADY HAWK
In ideographs: 丄訛孶诋罙澀齜
That is, it's about 8 or 9 words, or 19 lowercase letters. I want to restrict my passphrase to lowercase letters and possibly spaces, because often I boot my netbook without being able to see the keyboard very well, and punctuation and digits are hard to type accurately.
I did some experiments with 89-bits-of-entropy passphrases. The results apply equally well to the 73-bits-of-entropy passphrases I ended up recommending. Here are the notes from the experiments.
I typed the form "nkixlgnajwgpbrekqce" 8 times in 42 seconds with only one error, or about 5 seconds each; this seems reasonable, although it's harder to memorize.
The form "abe yaw jake os poe ways onto cady hawk" I typed 9 times in 62 seconds, or about 7 seconds each, again with only one error, plus another case where I corrected a word with ^W. I'm not sure I have ^W to erase a word during boot time.
The form "are gifts young end draft doug ache tram" similarly had one error out of 8 times, which took 58 seconds.
Given similar error rates for all three, there's a tradeoff between typing speed and memorizability. The S/Key word list seems to generate less-memorable passphrases:
In the S/Key word list: BALD WHEE TOWN HAIL BAIT HAAS RAVE NEAL
In the S/Key word list: LOP GOOF NESS STEM MORT STIR WICK AVER
In the S/Key word list: ABE CURE ELM CLOT ROSE AHOY RUM EVEN SOW
In the S/Key word list: HATH SAW PAY CRIB LATE OS BARK HUM
In the S/Key word list: ABE MOSS BYE BOYD RIDE AVER USER GUSH DIG
In the S/Key word list: BUNK DAB BITE RUM GANG WENT LIFE SUM
In the S/Key word list: ABE DIRT HOE SNOB FAIL SOCK RUTH SAND CLAN
In the S/Key word list: CUR WHEE NAGY JUJU HARM ACTS QUO ARGO
In the S/Key word list: ARID LIST HOST WHAM LAB HARD HOSE MUM
In the S/Key word list: HOVE SELF MAT ELAN TUSK BAT DEAR COMB
While the 4096 most common words of 5 letters or less are somewhat better, and only longer by about 3 letters on average:
In 12-bit 5-letter words: of fee fees star mice half geoff hosts
In 12-bit 5-letter words: they firm id lump cup chief days gap
In 12-bit 5-letter words: at ledge colin gaps ai ibid fungi seas
In 12-bit 5-letter words: in bum klaus swim towns rump fame mar
In 12-bit 5-letter words: they june giant andy mate scot creek third
In 12-bit 5-letter words: as fig thou tha next fancy jim sid
In 12-bit 5-letter words: they dwarf fife stair josh turks spade dame
In 12-bit 5-letter words: is jan obey lick swell danny onion exact
In 12-bit 5-letter words: that drill some tina mungo hedge noise leak
In 12-bit 5-letter words: you bean jon quiet bingo ricky allow sings
Taking extreme examples, "that drill some Tina Mungo hedge noise leak" is something I can easily visualize a cartoon of, while "Abe moss bye Boyd ride aver user gush dig" is quite a bit more difficult. So it's a tradeoff between 60% more typing time than "nkixlgnajwgpbrekqce" (a 3-second slower bootup) and more easily memorizing "are gifts young end draft doug ache tram" (Are gifts young? End draft! Doug ache — tram.)
In my book, the ease of memorization is worth the extra three seconds.
Eight 12-bit words can actually encode up to 96 bits of entropy. If you cut the password entropy down from 89 bits to 84, you can get by with 7 12-bit words; or at 88 bits, you could use 8 11-bit words:
The 88-bit number 155090174075185364222823520 can be represented as:
...
In 12-bit 5-letter words: was may deed cite sits sec after burke
In 11-bit 5-letter words: lewis born pond bruce leg keeps open wet
...
The 83-bit number 9078977035531963957140619 can be represented as:
...
In 12-bit 5-letter words: grin altar cook alley shaw sixty strap
If I order these three by memorability, I think the order is:
So, at the cost of cutting the attack cost by half, we can apparently improve memorability substantially; but reducing it by a factor of 2⁵, doesn't help particularly. If we reduce to 77 bits, a security loss of 2¹² (which might still be an acceptable level of security) perhaps we can do better still on memorability:
The 75-bit number 26185477596619230170512 can be represented as:
In 11-bit 5-letter words: live hook is help doors anna adult
The 77-bit number 114258389520881705891469 can be represented as:
In 11-bit 5-letter words: fancy cat grant backs robin cable iraqi
The 77-bit number 121847319416516085707325 can be represented as:
In 11-bit 5-letter words: acres x roads hunt over valid peace
So that is how I generated my new passphrase.
$ ./bitwords.py 77
(output omitted)
$ sudo /sbin/cryptsetup luksAddKey -i 8000 /dev/sda6
Enter any passphrase:
Enter new passphrase for key slot:
Verify passphrase:
Later, I discovered that you can get 12 bits per word while limiting them to five letters with no real loss of memorability.
luksKillSlot
Now to remove the key I added earlier when testing the number of iterations, so it doesn't slow down booting from now on:
$ sudo /sbin/cryptsetup luksKillSlot /dev/sda6 1
Enter any remaining LUKS passphrase:
$ sudo /sbin/cryptsetup luksDump /dev/sda6
...
Key Slot 0: ENABLED
Iterations: 34801
Salt: (omitted)
Key material offset: 8
AF stripes: 4000
Key Slot 1: DISABLED
Key Slot 2: ENABLED
Iterations: 285929
Salt: (omitted)
Key material offset: 1016
AF stripes: 4000
...
So now I have my original passphrase and the new one. I hibernated
the machine and unhibernated using the new passphrase, both as a test
and to practice the new passphrase. I'll keep practicing it
throughout the day so as not to forget it; once I've been using it for
a couple of days, I can use luksKillSlot
to remove the original weak
passphrase.
The practice schedule should look something like this:
The intervals here are a bit more conservative than the usual 1, 5, 25, 125, 625, 3125 intervals for practice memorization, for two reasons:
The passphrase, although I impose linguistic associations onto it, doesn't really have an underlying structure that simplifies it; so it's inherently a hard thing to memorize.
If at some point I forget the passphrase, I don't have the opportunity to recover it from some reminder medium, as is usually the case for things I'm trying to memorize. I can't just turn over a flash card. If I forget it once, and it's not still in my terminal scrollback, it's gone.
Oh, and on the third hibernate, I verified that ^W does in fact work to correct a mistyped word. This means that a multiword passphrase is much less error-prone: if you think you may have mistyped a word, you can type ^W and continue from the beginning of the word.
(And on the fourth, I misremembered the passphrase at first, suggesting that the schedule may not be conservative enough. I hope that the shorter six-word passphrase I'm recommending now will be easy enough to memorize that this practice schedule is adequate.)
In the above case, the Debian installer had done the initial cryptsetup and LVM setup for me. Later I set up LUKS on a new USB disk, without LVM, like this:
$ sudo cfdisk /dev/sdc # then interactively create the partition table
$ sudo cryptsetup -i 1000 -v -y luksFormat /dev/sdc1 # prompts passphrase
Here -v
is verbose, -y
is to prompt for passphrase verification,
and -i 1000
is to use 1 second of key hashing. After some months of
experience with the 8-second-hashing setup mentioned earlier, I think
it's probably a better user experience to use a shorter hashing time,
like 500 or 1000 milliseconds, and if necessary a longer passphrase —
although I'm still using just an 88-bit passphrase. 8 seconds is fine
when your disk mounts correctly, but it's a pain when you mistype the
passphrase and don't find out for 8 seconds.
Then:
$ sudo cryptsetup luksOpen /dev/sdc1 usbdisk # prompts for passphrase
$ sudo mkfs -t ext4 /dev/mapper/usbdisk
$ sudo tune2fs -r 0 /dev/mapper/usbdisk # eliminate root reserved space
$ sudo mkdir /usbdisk
$ sudo mount /dev/mapper/usbdisk /usbdisk
There is mysteriously 200M occupied in /usbdisk
according to df, but
I think that's the filesystem's journal.
cryptsetup status
shows the cipher as aes-cbc-essiv:sha256
rather
than the aes-xts-plain64
I got before, presumably because the
cryptsetup defaults changed. I'm assuming, without investigating,
that this is an okay default.
Unmounting the disk manually is done like this:
$ sudo umount /usbdisk
$ sudo cryptsetup luksClose usbdisk
It's easier to interactively mount and unmount the disk via an interactive file manager such as XFCE's Thunar, which mounts it in /media/really-long-uuid after devmapping it in /dev/mapper/udisks-luks-uuid-anotherreallylonguuid. I assume the file managers in GNOME and KDE also work.
In 1999 I suggested using error-correction coding, ECC, in passphrases. This would still be a good idea, but 14 more years of Moore's Law have made it substantially more important. A 77-bit passphrase today provides a level of security similar to what a 67-bit passphrase provided at the time, but you need to type seven words without an error instead of six.
As an alternative that allowed people to use passwords they chose themselves, you could store some kind of checksum of the password instead, alongside the salt. A properly chosen four-bit checksum would catch at least 15 out of every 16 mistyped passwords immediately, without having to wait eight seconds, at the expense of effectively weakening your password entropy by (almost) four bits. ECC would save you from having to retype your entire passphrase, but the checksum would save you from having to wait a number of seconds to find out whether you got it right or not. (The rapid feedback would speed the process of passphrase memorization.)
Actually using the machine's GPU for the passphrase hashing would speed up the hashing process by an order of magnitude, eliminating the hardware advantage for attackers. This, however, requires having GPU drivers loaded at early boot time.
I posted an earlier version of this program at http://lists.canonical.org/pipermail/kragen-hacks/2012-April/000538.html. The current version is at http://canonical.org/~kragen/sw/netbook-misc-devel/bitwords.py. This version has the S/Key word list removed for brevity.
#!/usr/bin/python
# -*- coding: utf-8 -*-
"Represent numbers as strings of symbols."
import codecs, itertools, math, random, string, sys
def main(argv):
if len(argv) == 1:
bits = 80
number = random.SystemRandom().getrandbits(bits)
elif len(argv) == 2:
number = random.SystemRandom().getrandbits(int(argv[1]))
else:
number = int(argv[2])
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
print "The %d-bit number" % math.ceil(math.log(number)/math.log(2)), number,
print "can be represented as:"
print "In binary:",''.join(represent_with(number, '01'))
print "In hex:",' '.join(''.join(group) for group in groups(represent_with(number, digits + 'abcdef'), 4))
print "In the first half of the alphabet to avoid cusswords, like Chrome:",
print ''.join(represent_with(number, 'abcdefghijklm'))
print "In lowercase letters and digits:",
print ''.join(represent_with(number, digits + lowercase))
print "In lowercase letters:",
print ''.join(represent_with(number, lowercase))
print "In letters and digits:",
print ''.join(represent_with(number, digits + lowercase + uppercase))
print "In letters, digits, hyphen, and underscore, like YouTube:",
print ''.join(represent_with(number, digits + lowercase + uppercase + '-_'))
print "In printable ASCII:",
printable_ascii = [chr(ii) for ii in range(33, 127)]
print ''.join(represent_with(number, printable_ascii))
print "In 12-bit words:", ' '.join(represent_with(number, wordlist(4096)))
print "In 12-bit 6-letter words:", ' '.join(represent_with(number,
mlwords(4096,6)))
print "In 12-bit 5-letter words:", ' '.join(represent_with(number,
mlwords(4096,5)))
print "In 11-bit 5-letter words:", ' '.join(represent_with(number,
mlwords(2048,5)))
print "In 6-bit words:", ' '.join(represent_with(number, wordlist(64)))
print "In minimal Japanese-like syllables:",
print ' '.join(''.join(group)
for group in groups(represent_with(number, syllables), 3))
print "In complicated made-up syllables:",
print ' '.join(represent_with(number, english))
print "In ideographs:", ''.join(represent_with(number, ideographs))
digits = string.digits
lowercase = string.ascii_lowercase
uppercase = string.ascii_uppercase
def represent_with(number, digits):
return [digits[d] for d in represent(number, len(digits))]
def represent(number, base):
quotient, remainder = divmod(number, base)
return ([] if quotient == 0 else represent(quotient, base)) + [remainder]
def wordlist(maxwords):
return list(itertools.islice(words(), maxwords))
def mlwords(maxwords, maxlen):
return list(itertools.islice((word for word in words()
if len(word) <= maxlen), maxwords))
def words():
return (line.split()[1] for line in
open('/home/user/devel/wordlist')) # http://canonical.org/~kragen/sw/wordlist
def groups(it, n):
it = iter(it)
while True:
thing = list(itertools.islice(it, n))
if not thing:
break
yield thing
# Here I’m trying to avoid letters whose sounds are easily confused:
# - no sounds differing only in voicedness and/or aspiration (b
# vs. p, t vs. d, k vs. g, s vs. z, f vs. v)
# - no sounds differing only in fricative vs. stop (b vs. v)
syllables = [C + V
for C in 'bdfghjrmns'
for V in 'aeiou']
# Here I exploit as much of English syllable structure as I can manage.
english = [onset + v + coda
for onset in (list('bdfgjklmnprstvwz') +
[P + L for P in 'bcfgp' for L in 'lrw'] +
['s' + P + L for P in ['c', 'p'] for L in 'lrw'] +
['tr', 'tw', 'str']
)
for v in ['a', 'ah', 'ai', 'aw', 'e', 'ee', 'i', 'oa', 'oo', 'u']
for coda in ([prefinal + final
for prefinal in ['m', 'n', 'ng', 'l', 's']
for final in 'bdkt'] +
[final + postfinal
for final in ['ng'] + list('bdfgklmnprt')
for postfinal in ['s', 'th', '']]
+ ['ss', 'z', 'j', ''])
]
assert len(english) == len(set(english))
# This needs some work, because it includes a fair number of code
# points with no characters assigned yet, or no glyph in the fonts I
# have installed.
# <http://en.wikipedia.org/wiki/List_of_CJK_Unified_Ideographs,_part_1_of_4>
# has empty squares scattered around apparently at random. :(
ideographs = [unichr(ii) for ii in range(0x4E00, 0x9FA6)]
def ok(a, b): assert a == b, (a, b)
ok(represent(12345, 10), [1, 2, 3, 4, 5])
ok(represent(0, 10), [0])
ok(wordlist(5), ['the', 'of', 'and', 'to', 'a'])
if __name__ == '__main__':
main(sys.argv)