Improving my keyboard input on Debian GNU/Linux

I made all my software use UTF-8, remapped my keyboard to be more convenient, added some Compose key mappings to support nicer typography, got all my software to use them, and added some extra logic to Emacs.

Setting up UTF-8

I set my Debian system's language to en_US.UTF-8 so that everything uses UTF-8, as it obviously should. I did this by putting the following into /etc/environment:

LANG="en_US.UTF-8"

And then restarting everything.

UTF-8 allows me to have characters from many different languages all working together, everywhere (except when some dumb program thinks it should use some incompatible character encoding). Unfortunately, not everything looks at this environment variable to decide whether to use UTF-8, and perl sometimes seems to have a problem with it, like when I run David A. Wheeler’s “SLOCCount”:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "en_US"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

I don't know why perl has this problem; there is a /usr/share/locale/en_US and a /usr/share/i18n/locales/en_US. In strace -ff -o tmp/sloccount.strace sloccount pkgs/PyMeta, it looks like it's looking for LC_IDENTIFICATION:

open("/usr/lib/locale/en_US/LC_IDENTIFICATION", O_RDONLY) = -1 ENOENT
open("/usr/lib/locale/en/LC_IDENTIFICATION", O_RDONLY) = -1 ENOENT

I don't know why LC_IDENTIFICATION should exist or what it does. Maybe I should look that up.
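
One way to narrow this down is to compare the locale name in the warning with the list of compiled locales that locale -a prints. Note that the warning above reports LANG = "en_US", with no codeset. The following is a minimal sketch and not a diagnosis: locale_in_list is a hypothetical helper name, and the dpkg-reconfigure hint assumes a stock Debian locales package.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the locale named in $1 appears in the
# `locale -a`-style list read from stdin.  glibc prints "en_US.utf8"
# where LANG says "en_US.UTF-8", so both sides are lowercased and
# stripped of "-" before an exact, whole-line comparison.
locale_in_list() {
    want=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-')
    tr '[:upper:]' '[:lower:]' | tr -d '-' | grep -qxF -- "$want"
}

# Usage: check the locale this shell was actually asked to use.
if locale -a 2>/dev/null | locale_in_list "${LANG:-C}"; then
    echo "locale ${LANG:-C} is installed"
else
    echo "locale ${LANG:-C} is missing; on Debian, try: dpkg-reconfigure locales"
fi
```

With a list that contains only en_US.utf8, this reports en_US.UTF-8 as installed and plain en_US as missing, which would be consistent with the warning above.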

Emacs, contrary to what I originally thought, does use LANG to decide whether or not its files should be encoded in UTF-8 by default, by setting the default value of the variable buffer-file-coding-system. I think I used to use (prefer-coding-system 'utf-8) before I set my LANG to en_US.UTF-8.

I could also talk about configuring irssi, screen, and Apache to use UTF-8, and marking text files as being encoded in UTF-8, but I won't.

Remapping my keyboard

I remapped a few keys. In particular, I set my right Alt key to be a Compose key; by default, X has a bunch of multi-key sequences, each starting with the Compose key, that map to funny characters. They are stored in /usr/share/X11/locale/en_US.UTF-8/Compose for me, since my locale is en_US.UTF-8. This means that Emacs 22, xterm, and KDE applications (pretty much everything except GTK+ apps and Firefox) have support for multilingual text input.

I first edited that file in Emacs by telling Emacs to edit /sudo::/usr/share/X11/locale/en_US.UTF-8/Compose, which uses TRAMP and sudo to edit the file. However, now I use the kragen/xcompose GitHub project to set up the Compose key mapping.

Apparently if you ① aren't using XKB and ② put this into a file in your home directory called .Xmodmap, it will get read by some part of the default X session setup stuff:

! From http://lists.canonical.org/pipermail/kragen-hacks/2002-January/000306.html
! (plus some additions)

! Olin Shivers writes:
! Life with lower-case parens is verra nice, as former LispM hackers will tell
! you. One day I was missing that feature, so I sat down and fixed it. The
! following xmodmap file 
! Swap () and []
! [version using keycodes elided -- Kragen] 
! swaps () & [] on the keyboard on my linux laptop under X, so that parens are
! conveniently accessible -- lowercase & not on the hard-to-touch-type topmost
! row. I highly recommend this hack to Lisp & Scheme hackers.
! ...
! ...These keyboard hacks also have the nice side effect that when other people use
! ...my laptop for a little while, they separate out into two sets:
!   - the set of people who go, "Hey? What's going on? Your bracket keys
!     generate parens?!?!"
!   - and the, usually older and more select, group that go, "Hey! You've
!     got lower-case parens! That's great!" and then start laughing.
! 
! [slightly more generic version that follows; it works on keyboards with different
! keycodes, but will be a little weird if your [] keys don't have {} on them, or
! your 9 0 keys don't have () on them -- kragen]
keysym bracketleft = parenleft braceleft
keysym bracketright = parenright braceright
keysym parenleft = 9 bracketleft
keysym parenright = 0 bracketright
! That isn't idempotent like the keycode version, and it isn't self-reversing.
! Undoing it on a standard PC keyboard requires the following:
! keysym bracketleft = 9 parenleft
! keysym bracketright = 0 parenright
! keysym parenleft = bracketleft braceleft
! keysym parenright = bracketright braceright

! map caps lock to control for easier typing, as in the xmodmap man page
remove Lock = Caps_Lock
remove Control = Control_L
! keysym Control_L = Caps_Lock (why would you want to have a caps-lock key?!?)
keysym Caps_Lock = Control_L
! add Lock = Caps_Lock
add Control = Control_L

! set right Alt key to be Compose
remove Mod1 = Alt_R
keysym Alt_R = Multi_key

That stopped working reliably in recent versions of X, sometime around 2010: newly-plugged-in or just-rebooted keyboards wouldn't get the correct mapping. So I wrote xmodmapd, a short shell script which runs until the session exits, reapplying those keymappings whenever they come undone. As explained below, I'm running it from ~/.xsessionrc.
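
For illustration, here is a guess at the general shape such a watchdog could take. It is not the actual xmodmapd script; the function names, the polling interval, and the test for "mapping still applied" are all assumptions. Because the keysym-based bracket swap above is not idempotent, the sketch only reapplies ~/.Xmodmap when the swap appears to be gone.

```shell
#!/bin/sh
# Sketch only: NOT the author's xmodmapd.  All names here are hypothetical.

# Succeed if the keymap dump on stdin (as printed by `xmodmap -pke`)
# still has a key whose first two levels are "parenleft braceleft",
# i.e. the bracket/paren swap from ~/.Xmodmap is in effect.
mapping_intact() {
    grep -Eq '= *parenleft +braceleft'
}

# Reapply ~/.Xmodmap whenever the swap has disappeared.  The loop ends
# when `xset q` can no longer reach the X server, which is taken here
# to mean that the session has exited.
watch_keymap() {
    interval=${1:-5}    # seconds between checks; an arbitrary default
    while xset q >/dev/null 2>&1; do
        if ! xmodmap -pke | mapping_intact; then
            xmodmap "$HOME/.Xmodmap"
        fi
        sleep "$interval"
    done
}

# From a session startup file one might run:  watch_keymap 5 &
```

Polling is crude but has no dependencies beyond xset and xmodmap; a version that reacts to keyboard hotplug events would need something more than these two tools.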

Setting up more Compose sequences

This section is obsolete; it's been replaced by the kragen/xcompose project on GitHub.

I was chatting with “michi” on a Debian channel, and he told me about the Compose file I mentioned above. He also mentioned that he had some custom sequences added to that file. Here they are, with some additions of my own:

# michi's:
# Custom additions: Typography
<Multi_key> <period> <period> <period>  : "…" U2026       # HORIZONTAL ELLIPSIS
# These two are already present for me:
# <Multi_key> <minus> <minus> <minus>   : "—" U2014       # EM DASH
# <Multi_key> <minus> <minus> <period>  : "–" U2013       # EN DASH
<Multi_key> <minus> <minus> <space> : "– "            # EN DASH (followed by space)
<Multi_key> <backslash> <minus>     : "­"  U00AD       # SOFT HYPHEN
<Multi_key> <comma> <space>     : "‚" U201A       # SINGLE LOW-9 QUOTATION MARK
<Multi_key> <comma> <comma>     : "„" U201E       # DOUBLE LOW-9 QUOTATION MARK
<Multi_key> <apostrophe> <space>    : "’" U2019       # RIGHT SINGLE QUOTATION MARK
<Multi_key> <apostrophe> <apostrophe>   : "”" U201D       # RIGHT DOUBLE QUOTATION MARK
<Multi_key> <grave> <space>     : "‘" U2018       # LEFT SINGLE QUOTATION MARK
<Multi_key> <grave> <grave>     : "“" U201C       # LEFT DOUBLE QUOTATION MARK
<Multi_key> <less> <bar>        : "↵" U21B5       # DOWNWARDS ARROW WITH CORNER LEFTWARDS
<Multi_key> <o> <period>        : "•" U2022       # BULLET
# By default <Multi_key> <period> <period> does this, but we broke that with the ... binding.
<Multi_key> <o> <comma>         : "·"  periodcentered  # MIDDLE DOT
<Multi_key> <space> <space>     : " "  U00A0       # NO-BREAK SPACE
<Multi_key> <backslash> <comma>     : " " U2009       # THIN SPACE
<Multi_key> <minus> <less>      : "←" leftarrow   # LEFTWARDS ARROW
<Multi_key> <minus> <asciicircum>   : "↑" uparrow     # UPWARDS ARROW
<Multi_key> <minus> <greater>       : "→" rightarrow  # RIGHTWARDS ARROW
<Multi_key> <minus> <v>         : "↓" downarrow   # DOWNWARDS ARROW
<Multi_key> <less> <minus> <greater>    : "↔" U2194           # LEFT RIGHT ARROW (kragen's)

# Custom additions: Mathematical symbols
<Multi_key> <exclam> <equal>        : "≠" U2260       # NOT EQUAL TO
<Multi_key> <less> <equal>      : "≤" U2264       # LESS-THAN OR EQUAL TO
<Multi_key> <greater> <equal>       : "≥" U2265       # GREATER-THAN OR EQUAL TO
<Multi_key> <i> <n>         : "∈" U2208       # ELEMENT OF
<Multi_key> <exclam> <i> <n>        : "∉" U2209       # NOT AN ELEMENT OF
<Multi_key> <a> <p>         : "≅" U2245       # APPROXIMATELY EQUAL TO
<Multi_key> <colon> <equal>     : "≔" U2254       # COLON EQUALS
<Multi_key> <s> <q>         : "√" U221A       # SQUARE ROOT
<Multi_key> <slash> <backslash>         : "∧"  U2227           # LOGICAL AND
<Multi_key> <backslash> <slash>         : "∨"  U2228           # LOGICAL OR
<Multi_key> <o> <asterisk>              : "∘"   U2218           # RING OPERATOR (function composition)
<Multi_key> <E> <E>                     : "∃"  U2203           # THERE EXISTS
<Multi_key> <exclam> <E> <E>            : "∄"   U2204           # THERE DOES NOT EXIST
<Multi_key> <A> <A>                     : "∀"  U2200           # FOR ALL
<Multi_key> <Q> <E> <D>                 : "∎"   U220E           # END OF PROOF

# Custom additions: Greek letters.  Mapping corresponds to Emacs Greek input method.
<Multi_key> <asterisk> <a>      : "α"  U03B1       # GREEK SMALL LETTER ALPHA
<Multi_key> <asterisk> <b>      : "β"  U03B2       # GREEK SMALL LETTER BETA
<Multi_key> <asterisk> <c>      : "ψ"  U03C8       # GREEK SMALL LETTER PSI
<Multi_key> <asterisk> <d>      : "δ"  U03B4       # GREEK SMALL LETTER DELTA
<Multi_key> <asterisk> <e>      : "ε"  U03B5       # GREEK SMALL LETTER EPSILON
<Multi_key> <asterisk> <f>      : "φ"  U03C6       # GREEK SMALL LETTER PHI
<Multi_key> <asterisk> <g>      : "γ"  U03B3       # GREEK SMALL LETTER GAMMA
<Multi_key> <asterisk> <h>      : "η"  U03B7       # GREEK SMALL LETTER ETA
<Multi_key> <asterisk> <i>      : "ι"  U03B9       # GREEK SMALL LETTER IOTA
<Multi_key> <asterisk> <j>      : "ξ"  U03BE       # GREEK SMALL LETTER XI
<Multi_key> <asterisk> <k>      : "κ"  U03BA       # GREEK SMALL LETTER KAPPA
<Multi_key> <asterisk> <l>      : "λ"  U03BB       # GREEK SMALL LETTER LAMBDA
<Multi_key> <asterisk> <m>      : "μ"  U03BC       # GREEK SMALL LETTER MU
<Multi_key> <asterisk> <n>      : "ν"  U03BD       # GREEK SMALL LETTER NU
<Multi_key> <asterisk> <o>      : "ο"  U03BF       # GREEK SMALL LETTER OMICRON
<Multi_key> <asterisk> <p>      : "π"  U03C0       # GREEK SMALL LETTER PI
# no mapping for q; in Emacs that's ";"
<Multi_key> <asterisk> <r>      : "ρ"  U03C1       # GREEK SMALL LETTER RHO
<Multi_key> <asterisk> <s>      : "σ"  U03C3       # GREEK SMALL LETTER SIGMA
<Multi_key> <asterisk> <t>      : "τ"  U03C4       # GREEK SMALL LETTER TAU
<Multi_key> <asterisk> <u>      : "θ"  U03B8       # GREEK SMALL LETTER THETA
<Multi_key> <asterisk> <v>      : "ω"  U03C9       # GREEK SMALL LETTER OMEGA
<Multi_key> <asterisk> <w>      : "ς"  U03C2       # GREEK SMALL LETTER FINAL SIGMA
<Multi_key> <asterisk> <x>      : "χ"  U03C7       # GREEK SMALL LETTER CHI
<Multi_key> <asterisk> <y>      : "υ"  U03C5       # GREEK SMALL LETTER UPSILON
<Multi_key> <asterisk> <z>      : "ζ"  U03B6       # GREEK SMALL LETTER ZETA

# Capital greek letters.
<Multi_key> <asterisk> <A>      : "Α"  U0391       # GREEK CAPITAL LETTER ALPHA
<Multi_key> <asterisk> <B>      : "Β"  U0392       # GREEK CAPITAL LETTER BETA
<Multi_key> <asterisk> <C>      : "Ψ"  U03A8       # GREEK CAPITAL LETTER PSI
<Multi_key> <asterisk> <D>      : "Δ"  U0394       # GREEK CAPITAL LETTER DELTA
<Multi_key> <asterisk> <E>      : "Ε"  U0395       # GREEK CAPITAL LETTER EPSILON
<Multi_key> <asterisk> <F>      : "Φ"  U03A6       # GREEK CAPITAL LETTER PHI
<Multi_key> <asterisk> <G>      : "Γ"  U0393       # GREEK CAPITAL LETTER GAMMA
<Multi_key> <asterisk> <H>      : "Η"  U0397       # GREEK CAPITAL LETTER ETA
<Multi_key> <asterisk> <I>      : "Ι"  U0399       # GREEK CAPITAL LETTER IOTA
<Multi_key> <asterisk> <J>      : "Ξ"  U039E       # GREEK CAPITAL LETTER XI
<Multi_key> <asterisk> <K>      : "Κ"  U039A       # GREEK CAPITAL LETTER KAPPA
<Multi_key> <asterisk> <L>      : "Λ"  U039B       # GREEK CAPITAL LETTER LAMBDA
<Multi_key> <asterisk> <M>      : "Μ"  U039C       # GREEK CAPITAL LETTER MU
<Multi_key> <asterisk> <N>      : "Ν"  U039D       # GREEK CAPITAL LETTER NU
<Multi_key> <asterisk> <O>      : "Ο"  U039F       # GREEK CAPITAL LETTER OMICRON
<Multi_key> <asterisk> <P>      : "Π"  U03A0       # GREEK CAPITAL LETTER PI
# no mapping for Q; in Emacs that's ":"
<Multi_key> <asterisk> <R>      : "Ρ"  U03A1       # GREEK CAPITAL LETTER RHO
<Multi_key> <asterisk> <S>      : "Σ"  U03A3       # GREEK CAPITAL LETTER SIGMA
<Multi_key> <asterisk> <T>      : "Τ"  U03A4       # GREEK CAPITAL LETTER TAU
<Multi_key> <asterisk> <U>      : "Θ"  U0398       # GREEK CAPITAL LETTER THETA
<Multi_key> <asterisk> <V>      : "Ω"  U03A9       # GREEK CAPITAL LETTER OMEGA
# Emacs maps W to "Σ", but I think that's stupid
<Multi_key> <asterisk> <X>      : "Χ"  U03A7       # GREEK CAPITAL LETTER CHI
<Multi_key> <asterisk> <Y>      : "Υ"  U03A5       # GREEK CAPITAL LETTER UPSILON
<Multi_key> <asterisk> <Z>      : "Ζ"  U0396       # GREEK CAPITAL LETTER ZETA

# If you wanted to actually type in Greek, you would also need άίέ
# etc.  But you would probably just switch to a Greek keyboard layout.

# Custom additions: for chat (kragen)
<Multi_key> <colon> <parenright>        : "☺"   U263A           # WHITE SMILING FACE
<Multi_key> <colon> <parenleft>         : "☹"   U2639           # WHITE FROWNING FACE
<Multi_key> <exclam> <question>         : "‽"   U203D           # INTERROBANG
<Multi_key> <less> <3>                  : "♥"  U2665            # BLACK HEART SUIT
<Multi_key> <o> <slash> <asciitilde>    : "♫"   U266B           # BEAMED EIGHTH NOTES
<Multi_key> <p> <c>                     : "☮"   U262E           # PEACE SYMBOL
<Multi_key> <asterisk> <parenleft>      : "﴾"   UFD3E           # ORNATE LEFT PARENTHESIS
<Multi_key> <asterisk> <parenright>     : "﴿"   UFD3F           # ORNATE RIGHT PARENTHESIS
<Multi_key> <k> <s>                     : "ʘ"   U0298           # LATIN LETTER BILABIAL CLICK (kiss sound)
<Multi_key> <bar> <greater>             : "‣"   U2023           # TRIANGULAR BULLET
<Multi_key> <asciicircum> <minus>       : "⁻"   U207B           # SUPERSCRIPT MINUS
<Multi_key> <asciicircum> <n>           : "ⁿ"  U207F           # SUPERSCRIPT LATIN LETTER SMALL N
<Multi_key> <asciicircum> <i>           : "ⁱ"  U2071           # SUPERSCRIPT LATIN LETTER SMALL I
<Multi_key> <asciitilde> <equal>        : "≈"  U2248           # ALMOST EQUAL TO
<Multi_key> <s> <t>                     : "ﬆ"   UFB06           # LATIN SMALL LIGATURE ST
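
As the comment about the middle dot above points out, adding a longer sequence can break an existing shorter one that is its prefix. A rough way to spot such clashes before installing new sequences is sketched below; compose_prefix_conflicts is a hypothetical name, and the parsing assumes one sequence per line in the format shown above.

```shell
#!/bin/sh
# Hypothetical checker: read Compose-format lines from the named files
# (or from stdin) and report every key sequence that is a proper prefix
# of another one.
compose_prefix_conflicts() {
    awk '
        /^[ \t]*</ {                     # only lines that define a sequence
            seq = $0
            sub(/[ \t]*:.*/, "", seq)    # keep the keys, drop ": result # comment"
            gsub(/[ \t]+/, " ", seq)     # normalise spacing between keys
            sub(/^ /, "", seq)
            seqs[++n] = seq
        }
        END {
            # Quadratic comparison: fine for a few hundred lines, slow
            # for the full system Compose file.
            for (i = 1; i <= n; i++)
                for (j = 1; j <= n; j++)
                    if (i != j && index(seqs[j], seqs[i] " ") == 1)
                        print seqs[i] " is a prefix of " seqs[j]
        }
    ' "$@"
}

# Example (my-additions is a placeholder file name):
#   cat /usr/share/X11/locale/en_US.UTF-8/Compose my-additions | compose_prefix_conflicts
```

Comparing whole key names followed by a space, rather than raw string prefixes, keeps a key such as p from being reported as a prefix of parenleft.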

Fixing GTK+ (and Firefox)

GTK+ text widgets, by default, use their own GTK+ input methods, which have their own handling of the Compose key; generally they have a dropdown menu to configure GTK+ to use a different input method, and if you select "X Input Method" from the menu, the Compose key will work.
Firefox uses GTK+ text widgets by default, but it doesn't have that menu, as far as I can tell.

There's an environment variable called GTK_IM_MODULE which configures the default GTK+ input method, and if you set it to xim, things will work by default. I'm no longer sure where you set per-user environment variables so that your whole X session picks them up. It used to be that you would put that stuff in your .Xsession, but GDM doesn't run your .Xsession.

However, GTK_IM_MODULE used to be set in /etc/X11/Xsession.d/80im-switch, and the logic there was happy to load a value from $HOME/.xinput.d/en_US or $HOME/.xinput.d/all_ALL, although right now it's being set in /etc/X11/xinit/xinput.d/default, which tells me to “see im-switch(8) and /usr/share/doc/im-switch/README.Debian”, neither of which exists. So I tried this:

kragen@thrifty:~$ mkdir .xinput.d
kragen@thrifty:~$ echo GTK_IM_MODULE=xim >> .xinput.d/all_ALL

And that worked, after restarting X.

However, that, too, stopped working in recent versions of X, because /etc/X11/Xsession.d/80im-switch no longer exists, so instead, as recommended by unix.stackexchange, I put the following into ~/.xsessionrc, which is read by /etc/X11/Xsession.d/40x11-common_xsessionrc:

export GTK_IM_MODULE=xim
"$HOME/devel/inexorable-misc/xmodmapd" &

This is also the place to set things like PATH.
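
To confirm that a setting from ~/.xsessionrc actually reached a running program, one option on Linux is to read that process's initial environment from /proc. This is a small sketch under that assumption; environ_get is a hypothetical helper, and the process name in the usage comment is only an example.

```shell
#!/bin/sh
# Hypothetical helper: print the value of variable $1 from a
# NUL-separated environment block on stdin (the format of
# /proc/PID/environ on Linux).  Prints nothing if the variable is absent.
environ_get() {
    tr '\0' '\n' | sed -n "s/^$1=//p"
}

# Usage, assuming a process named firefox is running:
#   environ_get GTK_IM_MODULE < "/proc/$(pgrep -n firefox)/environ"
```

If this prints xim for a GTK+ application started from the desktop, the session startup file was read; if it prints nothing, the variable never made it into the session.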

Smart Quotes in Emacs

The keybindings for nice quote marks are noticeably worse than the keybindings for the ugly ASCII ones. So I extended Emacs with the few lines of elisp below, which let me type Alt-" to automatically reformat the most recent ASCII double quote mark into the appropriate nice one. So I can type stuff with normal quote marks, then fix it with a few keystrokes when I notice; but because it's an interactive process, there's little danger of accidentally munging quotes that need to be ASCII quotes, and little danger of getting the wrong kind of quote. Both of these happen frequently with the standard approaches to “smart quotes”.

(defun smartquote ()
  "Turn the previous '\"' character into either '“' or '”', based on context.

  This displays an arrow pointing at the changed character.
  "
  (interactive)
  (save-excursion
    (search-backward "\"")
    (cond ((bobp) (replace-match "“"))
          ((looking-at ".\\s-")         ; before whitespace, close
           (delete-char 1) (insert "”")) 
          ((looking-back "\\s-" (1- (point))) ; after whitespace, open
           (delete-char 1) (insert "“"))
          ((looking-at ".\\sw")         ; before a word, open (e.g. '("hi")')
           (delete-char 1) (insert "“")) 
          (t (delete-char 1) (insert "”"))) ; default: close
    (momentary-string-display "←----" (point))))
(global-set-key [(meta ?\")] 'smartquote)