Thoughts on Arc at 3 Weeks --- Kragen Sitaker

I really like Paul Graham's "Arc at 3 Weeks"; I think the language is extremely promising. I have some thoughts. Some of them are good ideas, some are me finding mistakes Graham has made, and some of them are wrong. I expect that Graham and I will have different ideas about which are which, so I have left my opinions on that matter out of the text below.

fn

Since setf plays such a prominent role in Arc, there should be a version of fn that allows you to define a 'setter' function as well as a getter.

Compound = Functions on Indices

I wish every language used the same form for function calls, hash fetches, and array indexing, the way Arc does. It's a brilliant idea.

In order to be able to write new compound data types in Arc, there needs to be a way to create values that can be called as functions and also implement other operations. This can be done in a straightforward way with the overriding system described in "Arc at 3 Weeks": overload apply. (Apply should probably be called "call", if we're rebuilding Lisp from the ground up.)
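For comparison, here is a rough Python sketch of a value that can be called as a function on indices and also implements other operations. Python's `__call__` plays the role of an overloaded apply; the `Grid` class and its `set` method are names I made up for illustration, not anything from "Arc at 3 Weeks".

```python
class Grid:
    """A hypothetical 2-D compound: calling it with indices fetches an element."""

    def __init__(self, rows, cols, fill=0):
        self.rows, self.cols = rows, cols
        self._cells = [fill] * (rows * cols)

    def __call__(self, r, c):
        # The analogue of overloading apply: the object acts as a
        # function on its indices.
        return self._cells[r * self.cols + c]

    def set(self, r, c, value):
        # Stand-in for what setf on (grid r c) would do.
        self._cells[r * self.cols + c] = value

    def __len__(self):
        # Another operation implemented by the same value.
        return len(self._cells)


grid = Grid(2, 3)
grid.set(1, 2, "x")
print(grid(1, 2), len(grid))
```

The calling code never needs to know whether `grid` is a function, a table, or something else, which is the property I like about Arc's compounds.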

Pronouns

'it' being bound by iteration and conditionals is also a brilliant idea. Python is working toward getting something similar, but nobody's done it yet. FORTH has sort of had it with DO and I and J for a while, but not in conditionals. (One might argue that FORTH is almost all pronouns; very little data is named.) And, of course, Perl is full of pronouns: <>, $_, the current filehandle, etc.

Shouldn't there be forms of 'each' and 'to' that bind a pronoun, too, as in Perl and FORTH?

    (each '(a b c) (pr item))
    (to 10 (= (ary i) (* i i)))

From the examples, it looks like 'keep' gives a way to write a list comprehension, and 'sum' gives a way to write foldl (+) (APL +/) over a list comprehension; except, of course, you can write "list comprehensions" that iterate over non-list-like things. Some other languages (Python, for example) are handling this by making almost anything you can iterate over look like a list.

DBs

The proposed semantics for fetching from a DB are that nonexistent keys return the current value of the global variable *fail*, which defaults to nil. This is wrong, for two reasons:

If mydb is a DB that doesn't have any false values in it, and nil is false, this code will almost always work, but it's broken:
```
      (if (mydb foo) (x it) (y))
```
because it assumes that *fail* is set to something in particular. A correct variant of this code (assuming = can set a global variable) is
```
      (do (= *fail* nil) (if (mydb foo) (x it) (y)))
```
To write correct code, you'll almost always have to set *fail* like this before a test for existence, because you must ensure it's set to something you know can't legitimately be in the DB.

Making correct code much more verbose than incorrect code that almost always works, but breaks when a completely unrelated part of the program changes, is probably a bad idea. (I know Graham is designing a language for good programmers, but I think that's going a little too far.)

There's also a much weaker reason: *fail* might actually be the value of an item in a DB --- for example, a DB that contained the program's global variables.
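The fragility can be imitated in Python. This is only a simulation under my own assumptions: the module-level `FAIL` stands in for *fail*, `None` stands in for nil, and `DB` and `describe` are hypothetical names.

```python
FAIL = None  # stand-in for the global *fail*; None plays the role of nil


class DB:
    """Hypothetical DB whose failed lookups return the current global FAIL."""

    def __init__(self, items):
        self._items = dict(items)

    def __call__(self, key):
        # FAIL is looked up at call time, so later changes to it are visible.
        return self._items.get(key, FAIL)


def describe(db, key):
    # Mirrors (if (mydb foo) (x it) (y)): it silently assumes FAIL is false.
    it = db(key)
    return "found %r" % (it,) if it else "missing"


mydb = DB({"foo": 42})
before = describe(mydb, "bar")  # FAIL is still None, so the test works

FAIL = "no such key"            # some unrelated code changes the global
after = describe(mydb, "bar")   # the same lookup now looks like a hit

print(before)
print(after)
```

Nothing in `describe` changed between the two calls; only the global did.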

I like Python's approach to this problem. Fetching a nonexistent value in the normal way will raise an exception, which is usually the right thing. So you could write the above code as

    (try (x (mydb foo)) 'KeyError (y))

But there is also an operation 'get', which returns a specified default value if the requested key doesn't exist, and is much briefer than the corresponding operation with *fail*; compare:

    (do (= *fail* "") (mydb foo))
    (get mydb foo "")

or even

    (mydb foo "")

and an operation 'has_key', which allows the operation to be written as:

    (if (has_key mydb foo) (x (mydb foo)) (y))

although this is nearly as verbose as the correct version with *fail*.
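In Python itself, the three operations look roughly like this. Note that Python 3 dropped the `has_key` method in favor of the `in` operator, so the sketch uses `in`; the variable names are just for illustration.

```python
mydb = {"foo": 1}

# Plain fetch: a missing key raises KeyError, which the caller can handle.
try:
    result = mydb["bar"]
except KeyError:
    result = "fallback"

# get: returns the supplied default instead of raising.
name = mydb.get("bar", "")

# Membership test, the modern spelling of has_key.
present = "foo" in mydb

print(result, repr(name), present)
```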

Another approach: The two-argument variant could actually be a macro (since macros are first-class objects, your DB can be a macro) which evaluates and returns its second argument only if the lookup fails. This almost works to shorten the example above to (x (mydb foo (y))), but that calls x even if the lookup fails.
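Python has no macros, but passing a function that is only called on failure gives a similar effect. The sketch below, with hypothetical names `fetch`, `x`, and `y`, shows both the benefit (the default is not evaluated on a hit) and the flaw just mentioned (x still runs on a miss).

```python
calls = []  # records which functions ran, in order


def x(value):
    calls.append(("x", value))
    return value


def y():
    calls.append(("y",))
    return "default"


def fetch(db, key, on_missing):
    # on_missing is called only when the key is absent, which is what
    # the macro version would do with its second argument.
    if key in db:
        return db[key]
    return on_missing()


mydb = {"foo": 1}
x(fetch(mydb, "foo", y))  # hit: y is never evaluated
x(fetch(mydb, "bar", y))  # miss: y runs, but x is still called on its result
print(calls)
```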

A third approach: has_key is an instance of a wider concept of 'exists'. In Perl, 'exists' applies only to hash lookups, but the concept could apply to a wide variety of operations, essentially everything it makes sense to call setf on.

    (exists var)
    (exists (sqrt -1))
    (exists (mydb 'bob))
    (exists (car list-or-nil))

Python has a fourth generic operation as well: deletion. You might want to check to see whether (foo 1) exists, get its value, set its value, or delete it, regardless of whether foo is a vector, an associative array, or even possibly something else altogether.
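Here is a minimal Python sketch of the four generic operations working on both a vector and an associative array. The helper names `exists`, `fetch`, `store`, and `delete` are mine; Python spells the last three with `[]` and `del`, and has no single built-in for the first.

```python
def exists(compound, index):
    # True if compound[index] can be fetched, for lists and dicts alike.
    try:
        compound[index]
    except (KeyError, IndexError):
        return False
    return True


def fetch(compound, index):
    return compound[index]


def store(compound, index, value):
    compound[index] = value


def delete(compound, index):
    del compound[index]


vec = ["a", "b", "c"]
table = {1: "one"}

# The same code runs regardless of which kind of compound foo is.
for foo in (vec, table):
    if exists(foo, 1):
        store(foo, 1, fetch(foo, 1).upper())

delete(table, 1)
print(vec, table, exists(table, 1), exists(vec, 5))
```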

A fourth approach: in Icon (and, sort of, in Prolog, and, in another way, sort of, in Lisp), expressions can return any number of values, including none; no value returned is boolean false. If a dictionary lookup returns no values when it fails, then

    (if (mydb foo) (x it) (y))

can be correct, because 'no value' is distinct from any particular value, even nil, just as the empty string is distinct from any particular character, even NUL.
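Python generators can approximate this: a lookup that yields either one value or nothing at all. This is only an imitation of Icon's behavior, and `lookup` and `describe` are hypothetical names.

```python
def lookup(db, key):
    """Yield the value if the key is present; otherwise yield nothing."""
    if key in db:
        yield db[key]


def describe(db, key):
    # The loop body runs only if a value was produced, so even false
    # values such as None or 0 count as found.
    for it in lookup(db, key):
        return "found %r" % (it,)
    return "missing"


mydb = {"foo": None, "zero": 0}
print(describe(mydb, "foo"))
print(describe(mydb, "zero"))
print(describe(mydb, "bar"))
```

Here the stored `None` is reported as found, while the absent key is reported as missing, without any sentinel value being reserved.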

Lisp's assoc family takes another approach: return something containing the correct value, not the value itself. I don't like this; although pattern-matching could make it less painful to use, I don't think it could provide any better syntax than the 'try' approach above.

The default of being indexed by 'eq' is not very good for string processing, unless you use string->symbol a lot --- which might be a good idea for string processing, but will certainly make your code more verbose.

If you want the option of using arbitrary equality operators for DBs, and you want some DBs to be hash tables, there needs to be a way of associating a hash with an equality operator.
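To make the requirement concrete, here is a small Python sketch of a hash table that is handed an equality operator together with a matching hash function. The `Table` class and the case-insensitive pair are my own illustration; the only real constraint is that keys the equality operator considers equal must hash alike.

```python
class Table:
    """Hypothetical hash table parameterized by an equality/hash pair."""

    def __init__(self, eq, hash_fn, nbuckets=16):
        self.eq, self.hash_fn = eq, hash_fn
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def store(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if self.eq(k, key):
                bucket[i] = (k, value)  # overwrite the existing entry
                return
        bucket.append((key, value))

    def fetch(self, key, default=None):
        for k, v in self._bucket(key):
            if self.eq(k, key):
                return v
        return default


def same_ignoring_case(a, b):
    return a.lower() == b.lower()


def hash_ignoring_case(s):
    # Must agree with same_ignoring_case: equal keys get equal hashes.
    return hash(s.lower())


t = Table(same_ignoring_case, hash_ignoring_case)
t.store("Bob", 1)
t.store("BOB", 2)  # same key under this equality, so it overwrites
print(t.fetch("bob"), t.fetch("alice", "nobody"))
```

Pairing `same_ignoring_case` with the ordinary `hash` would usually put "Bob" and "BOB" in different buckets, which is why the two have to travel together.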

Assignment

The current proposal for the language has (= var form) either declare a new lexical variable valid until the end of the current block, or change the value of an existing variable.

I really prefer languages that have different forms for declaring a local variable (with an initial value) and changing an existing variable; this is one of Python's big deficiencies. I care for the following three reasons:

I can see immediately whether a piece of code uses assignment; I avoid assignment in some parts of my code for testability.
The difference between modifying variables declared in a larger scope and shadowing them with local variables with the same names is obvious, and there's an obvious way to do both.
Misspelled variable names get detected instead of silently producing incorrect results (or occasional run-time errors).
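The misspelling problem is easy to show in Python, where one form both declares and assigns. The function names below are just for illustration.

```python
def total_with_typo(prices):
    total = 0
    for price in prices:
        # Typo: this silently creates a new local named totl instead of
        # updating total, and Python raises no error.
        totl = total + price
    return total


def total_fixed(prices):
    total = 0
    for price in prices:
        total = total + price
    return total


print(total_with_typo([1, 2, 3]), total_fixed([1, 2, 3]))
```

With a separate declaration form, the assignment to the undeclared `totl` could be rejected before the program ever ran.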

I do like the way C++ lets you declare a variable in the scope "the rest of the block", and I think the Scheme/Lisp/C way leads to unreadably-indented code or variables being declared too far from their use.

Syntax

There is another argument for having syntax, other than that it makes programs shorter: it can make programs more readable (in the sense that source code generally is more readable than bytecode, or mathematical formulas are generally more readable than English sentences describing the same formula). In other words, it makes the language more accessible to everyone, good programmers included. I don't think of decompiling bytecode as "dumbing down".

Unicode offers a plethora of punctuation; perhaps tasteful and restrained use of some of this punctuation could make the syntax more readable than any possible ASCII syntax. For example, I'd be inclined to write infix bitwise Boolean operators with real Boolean operators instead of ASCII stand-ins, and I'd be inclined to declare variables with some conspicuous piece of punctuation. (In ASCII, I'd probably prefer := if it didn't already have so many closely-related meanings in other languages (Pascal, GNU make).)

Implicit progn

Graham is right on the mark here; eliminating implicit progn means we can eliminate many of the uglier and more verbose pieces of Lisp syntax.

(Implicit progn seems to have crept into the 'to' syntax, though; one of Graham's examples is (to x 5 (sum x) (pr x)).)

Pattern matching

Destructuring arguments are very good. I'd like to have full ML-style pattern matching, though; it's possible to write that as a macro, but I'd like it to be part of the language.

Recursion on strings

Recursing on strings can be very efficient if the strings are allowed to share structure the way lists do and your compiler is smart enough to reuse storage that can be statically proven garbage. (Or even if it can allocate it on the stack.)

Canonical symbol case

The "Arc at 3 Weeks" paper's examples show symbols being canonicalized into upper case. I don't like this; upper-case is hard to read. I assume it's an artifact of the first Arc implementation running in an existing Common Lisp system.

Unicode makes canonical case impractical, anyway, because case-mapping is very complicated.

Classes

The proposed overloading semantics won't give an extensible numeric system.

My tastes in object systems appear to be markedly different from Graham's; mine are largely founded in experience with Python. So the object system I want is probably not something he wants:

Classes, like compounds, should be called to instantiate objects; this makes the calling code shorter, and also allows you to turn classes into factory functions and vice versa without changing the calling code.
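Python works this way, and a short sketch shows why it matters. `Point`, `cached_point`, and `build_corner` are hypothetical names; the point is only that the caller uses the same call syntax for a class and for a factory function.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y


def cached_point(x=0, y=0, _cache={}):
    """Factory with the same calling convention; it shares instances."""
    key = (x, y)
    if key not in _cache:
        _cache[key] = Point(x, y)
    return _cache[key]


def build_corner(make_point):
    # The calling code cannot tell a class from a factory function.
    return make_point(3, 4)


a = build_corner(Point)
b = build_corner(cached_point)
print((a.x, a.y), (b.x, b.y), build_corner(cached_point) is b)
```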

There doesn't seem to be a provision for a constructor or methods (other than overloads of existing functions). There probably should be.

Class instances should not be callable unless they overload apply, for the following reasons:

There's no need to have one syntax for functions the objects overload and another syntax for getting attributes of the objects. After I say (= pt (class nil 'x 0 'y 0)) (= p1 (pt)), I should be able to ask for (x p1). This does have the disadvantage that "class" must somehow bind x and y in my calling environment if they aren't already bound; I think that's easily accomplished with a macro. (This also removes the necessity for quoting x and y.) (Hmm, what if x is bound to nil? Should we then make nil callable? Perhaps we should just raise an error if x is bound in the local environment, and require that it be quoted or otherwise marked when it's defined as a method in this manner. Quoted attributes would define functions in your local environment which sent doesNotUnderstand to the object passed as their argument, and then overload those functions for this particular object type.)
The current example, (p1 'x), prevents p1 from being able to act like some other kind of compound.

Most dynamic object systems let you get an object of a class you've never heard of and invoke the correct methods on it. Doing this with the object system described above requires that you have some variable bound to the method "x" defined in "pt" above. I thought this was a problem at first, but I don't think it is now; presumably if your code was written without knowledge of that "x", it won't mention "x" (at least, not meaning that method), and if it was written with knowledge of that "x", it's presumably because there's an interface somewhere that defines "x" to mean something. That interface can be defined in some module somewhere that both "pt" and my calling code import, in a form like this:

    (defmethods 'x 'y)

This has the advantage, not shared by most dynamic object systems, that we can have many methods named "x" without conflict, and we can be sure that the "x" we're calling is intended to implement the interface we expect it to, not some other interface that has a method called "x".
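Something close to this can be sketched in Python with `functools.singledispatch`, where each generic function is its own object rather than a name looked up on the instance. The `defmethod` helper, the `Pt` class, and the two interfaces are my own assumptions, meant only to show two methods named "x" coexisting.

```python
from functools import singledispatch


def defmethod(name):
    """Create a fresh generic function, roughly like (defmethods 'x)."""
    @singledispatch
    def generic(obj, *args):
        # Default case: the object's type has no overload for this method.
        raise TypeError("%s does not understand %s"
                        % (type(obj).__name__, name))
    return generic


# One interface's "x" ...
x = defmethod("x")
# ... and an unrelated interface that happens to use the same name.
algebra_x = defmethod("x")


class Pt:
    def __init__(self, x_coord, y_coord):
        self.x_coord, self.y_coord = x_coord, y_coord


@x.register(Pt)
def _(p):
    return p.x_coord


@algebra_x.register(int)
def _(n):
    return n * 2


p1 = Pt(1, 2)
print(x(p1), algebra_x(21))
```

Calling `algebra_x(p1)` raises a TypeError rather than quietly invoking the geometric "x", because the two generic functions are distinct objects that merely share a name.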

Like others, I haven't yet found a compelling use for multiple inheritance, and there are compelling reasons against it.

No doc strings

No elisp/Python-style documentation strings are mentioned. I want these.