System Effects of Not Using Lisp

On this page, when I refer to ``Lisp'', I don't just mean Lisp; I mean languages with strong typing, pointer safety, and garbage collection, such as Lisp, Scheme, Tcl, REBOL, etc. When I say ``C'', I mean languages without those properties, such as C, almost any assembly language, or Forth.

When an operating system is designed to run C programs, there are several deleterious effects evident --- effects that could be avoided if all the programs that ran on it were written in Lisp. I will refer to operating systems designed to run C programs as ``C OSes''.

Memory protection

The first effect is memory protection. A minor error in a C program can result in the program trying to write to arbitrary memory locations. Unless something stops it, a simple C program can also read all of memory, violating any security guarantees the system may otherwise provide.

In order to cope with these problems, the software running on a system is divided into ``processes'', each of which has its own memory space. No process can access memory that belongs to another process, and no process can access memory at arbitrary physical locations (for example, the video card). This way, it is possible to run many C programs on the same computer, with some assurance that a bug in one program cannot crash the other programs or the whole system. Only a bug in the operating system can cause these things to happen.
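Here is a minimal sketch of that isolation, assuming a POSIX system with fork() and waitpid(). The function name parent_value_after_child_write is made up for this illustration. A child process overwrites a global variable, and the parent's copy is untouched because the two live in separate memory spaces.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 1;   /* lives in this process's private memory space */

/* Fork a child that overwrites `counter`, wait for it to exit, and
   return the value the parent still sees.
   Returns -1 if fork() or waitpid() fails. */
int parent_value_after_child_write(void)
{
    counter = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter = 999;                 /* changes only the child's copy */
        _exit(counter == 999 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
    return counter;                    /* the parent's copy was never touched */
}
```

Calling parent_value_after_child_write() from the parent gives back the original value, not the one the child wrote.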

Most of the problems below come from the necessity of these ``processes''.

With Lisp, of course, no piece of code can access memory it doesn't have a pointer to. So no code can accidentally stomp on work belonging to unrelated code. And if you don't have a pointer to a piece of data, you can't ever read it, and the only way to get a pointer to it is for someone who does have a pointer to it to give it to you. So you don't need ``processes'' --- just multiple threads of execution.

Of course, you don't strictly need them with C, either. It's just that as you scale up to bigger and bigger systems, you have a harder and harder time debugging pointer bugs, since their manifestation is often far from their origin.
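To make ``far from their origin'' concrete, here is a small sketch. It is a simulation, not a real wild pointer: one flat array stands in for a program's memory, so the behaviour stays well defined. The names arena, set_name_buggy, set_balance, and get_balance are all invented for the example. The bug is in the code that stores a name, but the damage shows up in an unrelated balance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One flat arena standing in for a program's memory. Two unrelated
   pieces of code each own a slice of it. */
enum { NAME_OFF = 0, NAME_LEN = 8, BALANCE_OFF = 8, ARENA_LEN = 16 };
static unsigned char arena[ARENA_LEN];

void set_balance(unsigned char value) { arena[BALANCE_OFF] = value; }
unsigned char get_balance(void)       { return arena[BALANCE_OFF]; }

/* Buggy on purpose: trusts the caller's length instead of clamping it
   to NAME_LEN, so a long name spills into the balance slot. */
void set_name_buggy(const char *src, size_t len)
{
    assert(len <= ARENA_LEN - NAME_OFF);  /* keeps the demo inside the array */
    memcpy(arena + NAME_OFF, src, len);
}
```

If you call set_balance(100) and then set_name_buggy("ABCDEFGHI", 9), the ninth byte lands on the balance. Nothing fails at the point of the bad write; the wrong value turns up later, wherever the balance is next read.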

Context switch times

Switching between processes means pointing the MMU at a different set of page tables, which on many CPUs also throws away cached address translations, so it takes a while. Switching between threads of execution within one process is much quicker. This gives C programmers a perverse incentive to bundle as much functionality as possible into a single ``process'', even if it requires multiple threads of execution. But this leads back to the scaling problem that memory protection was intended to defend against.

Hardware support

Memory protection for C programs requires an MMU; this limits the hardware support for C OSes. Most DSPs don't have MMUs; many popular embedded CPUs don't have MMUs. Of course, you can run programs written in C on them, but as you scale up, you run into the same problem --- debugging gets harder and harder, and you lose more with every crash.

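On the other hand, some incremental garbage-collection techniques also rely on an MMU (they use page protection as a read or write barrier), so this argument is weaker than it first appears. Techniques that use software barriers don't need one.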

Interprocess communication

In order to allow code running in separate ``processes'' to communicate, C OSes have all sorts of elaborate mechanisms to safely circumvent memory protection. Some of these mechanisms are reasonably efficient and flexible; most are crude, slow, inefficient, and clumsy.
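As one example of such a mechanism, here is a sketch of a pipe carrying a message from a child process to its parent, assuming a POSIX system. The function name receive_from_child is hypothetical. Note that the bytes are copied out of one process, through the kernel, and into the other.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send `msg` from a child process to the parent through a pipe.
   Stores at most cap-1 bytes in `out` and NUL-terminates it.
   Returns the number of bytes received, or -1 on error. */
ssize_t receive_from_child(const char *msg, char *out, size_t cap)
{
    assert(msg != NULL && out != NULL);
    int fds[2];
    if (cap == 0 || pipe(fds) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: writer */
        close(fds[0]);
        size_t len = strlen(msg);
        ssize_t w = write(fds[1], msg, len);
        close(fds[1]);
        _exit(w == (ssize_t)len ? 0 : 1);
    }
    close(fds[1]);                       /* parent: reader */
    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(fds[0], out + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    close(fds[0]);
    out[total] = '\0';
    waitpid(pid, NULL, 0);               /* reap the child */
    return (ssize_t)total;
}
```

All of this setup, copying, and cleanup is there just to move a few bytes between two pieces of code on the same machine.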

Memory leaks

Sometimes C programs forget to notify the OS that they're no longer using some particular piece of memory. Being programs, they do this in a systematic way, and so the amount of memory ``leaked'' in this way eventually grows without bound.

In order to prevent this situation from requiring daily reboots, C OSes associate each bit of allocated memory with a process. When the process ends, the allocated memory is freed, even if the process had ``leaked'' it.

Of course, this makes it more difficult to share resources that live in memory. If you want to communicate something to another process, and you might die before they do, you have to make sure that what you communicate doesn't depend on stuff that'll be deallocated when you die. So you copy the whole thing into someplace where the other process can get access to it.

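Of course, it's tough to send something that depends on stuff in your address space anyway, since your pointers aren't valid in the other process's address space. So you end up flattening the data into a pointer-free form and copying it, and the receiver has to rebuild it. This is inefficient.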
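A sketch of what that copying looks like for a linked list. The struct and the names flatten and rebuild are invented for this illustration. The sender copies out the values and leaves the pointers behind, and the receiver rebuilds the links in its own storage.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;   /* meaningful only inside this address space */
};

/* Copy the list's values (not its pointers) into buf.
   Returns the number of ints written; stops early at cap. */
size_t flatten(const struct node *head, int *buf, size_t cap)
{
    assert(buf != NULL);
    size_t n = 0;
    for (; head != NULL && n < cap; head = head->next)
        buf[n++] = head->value;
    return n;
}

/* Rebuild a list in the receiver's own storage from the flat copy.
   `nodes` must have room for n elements.
   Returns the new head, or NULL if n is 0. */
struct node *rebuild(const int *buf, size_t n, struct node *nodes)
{
    for (size_t i = 0; i < n; i++) {
        nodes[i].value = buf[i];
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
    return n ? &nodes[0] : NULL;
}
```

The flat buffer is what would actually travel between processes. Every node is copied twice, once on the way out and once on the way in.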

And the communication facilities are an extra cognitive burden.

Worse, when processes communicate through shared storage, you have to make sure you don't forget to deallocate the shared storage when you're done. Often the OS doesn't even do reference-counting on it for you.

Multilinguality

C has a problem: it's hard to write high-level programs in it. It is emphatically not a scripting language. (Of the languages I said I would refer to as C, Forth is not like this!) So you implement the lower levels of things in C, and make up new ad-hoc languages for the higher-level stuff.

In Lisp, you don't need to do this.

Instability

Despite the heroic efforts described above, C OSes remain unstable in the presence of new kernel code, because code in the kernel runs outside the protection that ``processes'' provide. And because of the context-switch penalty, there's a strong incentive to do things in the kernel for efficiency.


Did I miss something important? Did I get something wrong? Email me.