#!/usr/bin/python3 # -*- coding: utf-8 -*- """Caronte: uma pequena lua. This is part of a series of simple exercises I’ve been doing in hacking programming-language prototypes together from scratch in one day. After 4½ hours this successfully generates a fractal and I am going to bed happy. It compiles a tiny infix language into a Forth-like sequence of stack operations (but with a JS-like memory model), which it then interprets. In this case the language is sort of a minimal version of JS or Lua. Caronte is the Portuguese name for Charon, the 1205-km-diameter moon of Pluto, about a fiftieth of the mass of Earth’s Moon (a lua). I started by letting the design get fairly out of control with Caronte-grande: Caronte-grande -------------- In addition to some basic arithmetic, the Caronte syntax supports dicts, indexing, conditionals, assignments, sequencing, looping, strings, and function definitions. It’s an expression-oriented language like ML; ``;`` is an infix sequencing operator on expressions which returns the rightmost result, like ``,`` in C or JS. The function-definition syntax is a restricted form of the new JS arrow syntax: (a, b, c) => expr, where the literal parens are required but of course the argument list can be of any length. Because ``;`` is an expression, you can parenthesize a function body with a sequence of statements. “=>” associates to the right so you can write “k = (a) => (b) => a”; since its left operand isn’t an expression it wouldn’t make sense for it to associate to the left. For conditionals I’m using the ``a ? b : c`` syntax from C, because I liike punctuation. Loops use an infix ``|`` operator which has two variants, a foreach variant and a while variant. The foreach variant is ``c <- cs | e``, which evaluates ``e`` repeatedly with local bindings for ``c`` taken from the sequence ``cs``. The while variant is ``p | q``, which alternates between evaluating ``p`` and, if it was true, re-evaluating ``q``. ``|`` has higher precedence than ``;`` and the boolean and comparison operators, and it associates to the right so that it’s flat to write ``a <- as | b <- bs | c``, and ``x | y | z`` is ``while (x) while (y) z``, as you’d expect, rather than ``while (while (x) y) z``. Loops evaluate to sequences, which are materialized in memory and indexed with numbers. The dict literal syntax is more like JS’s than Lua’s: { name1: expr1, name2: expr2, (expr3): expr4 }. Dict indexing can be done with names using "." or with expressions using "[]". Variable declaration is done with an "=" assignment; duplicate declarations in the same scope are a compile-time error. Mutation is achieved with the ":=" mutation operator. Mutating undeclared variables is also an error, at compile-time; mutating nonexistent dict entries is an error at run-time. Adding new dict entries is done with the "=" operator, like variable declaration, and is a run-time error if the entries already exist. There’s a prefix “!!!” operator from Wheat to return an error value, which is the only kind of value the “;” operator doesn’t ignore on its left argument. “!!” is an error-handling operator; “a !! b” is a, unless a evaluated to an error, in which case it’s b. There’s an unary prefix "!" operator which returns nil when applied to a non-error value, or the error argument "x" when applied to an error value "!!!x". Error values cannot be stored in dictionaries; an attempt to do so yields the error value from the assignment and leaves the dictionary unchanged. Comparisons and boolean operations have “syntax error” associativity. Instead of parsing ``a > b < c`` as either "a > (b < c)", "(a > b) < c", or "a > b and b < c", it’s just a parse error. Similarly "a and b or c" is a syntax error, although you can chain "and" (and "abj", which is "and not") and "or" as you like. I’m tempted to do the same thing with "*" and "/". Bitwise operations are provided by functions rather than operators. This syntax is brutally infix; the precedence order is something like ; := = !! => or and abj == < > <= >= != | ?: + - (binary) * / % - (unary) ! !!! not ** . [] (function call) parenthesization Here’s a whack at some pseudo-BNF syntax: program: _ expr _: (" " | "\n" | "\t")* expr: e10 (";"_ e10)* e10: lvalue (":=" | "=")_ e10 | e20 e20: e30 ("!!"_ e30)* e30: formals "=>"_ e30 | e33 e33: e36 (("and" | "abj")_ e36)* | e36 ("or"_ e36)+ e36: e40 ("==" | "<" | ">" | "<=" | ">=" | "!=")_ e40 | e40 e40: lvalue "<-"_ e60 "|"_ e40 | e60 "|"_ e40 | e60 e60: e70 "?"_ e70 ":" e60 | e70 e70: e80 (("+" | "-")_ e80)* e80: e90 (("*" | "/" | "%")_ e90)* e90: (("-" | "!" | "!!!" )_)* e100 e100: e110 ("**"_ e110)* e110: e120 ("."_ id | "["_ expr "]"_ | actuals)* e120: "("_ expr ")"_ | id | const | dict lvalue: id ("."_ id | "["_ expr "]"_)* formals: "("_ ")"_ | "("_ id (","_ id)* ")"_ actuals: "("_ ")"_ | "("_ e20 (","_ e20)* ")"_ const: string | integer | float | "nil"_ dict: "{"_ "}"_ | "{"_ pair (","_ pair)* "}"_ pair: id ":"_ e20 | "("_ e20 ")"_ ":"_ e20 Maybe loops and conditionals should be incomparable in the same way as and and or? I feel a little bit bad to require two separate tokens ")" "=>" in between arguments and body, and leaving the end of the function body undelimited seems potentially error-prone. I wonder if some kind of bra-ket notation would be an improvement: ``{a b : 10 * a + b}`` or something. Caronte-pequeno --------------- Okay, well, the above has clearly gone a bit out of control for a one-night hack and is kind of overwhelming. Can I scale it down to something tasteful and enjoyable? Because it took me an hour and a half to write the *textual* description of Caronte-grande and I’m getting sleepy! How about revisiting my Sierpinski thing from the other week? """ from __future__ import division, print_function import re sier = """ c = {}; x <- iota(128) | (c[x] = 0); c[0] := 1; y <- iota(128) | ( x <- iota(127) | (c[x+1] := (c[x] + c[x+1] == 1) ? 1 : 0); x <- iota(128) | say((c[x] == 0) ? " " : "#"); say("\n") ) """ """That ought to need only this subset of the grammar: program: _ expr _: (" " | "\n" | "\t")* expr: e10 (";"_ e10)* e10: lvalue (":=" | "=")_ e10 | e36 e36: e40 "=="_ e40 | e40 e40: lvalue "<-"_ e60 "|"_ e40 | e60 e60: e70 "?"_ e70 ":" e60 | e70 e70: e110 ("+"_ e110)* e110: e120 ("["_ expr "]"_ | actuals)* e120: "("_ expr ")"_ | id | const | dict lvalue: id ("["_ expr "]"_)* actuals: "("_ e36 (","_ e36)* ")"_ const: string | integer dict: "{"_ "}"_ I’m going to try a parsing approach inspired by Darius Bacon’s min_parson.py: https://gist.github.com/darius/7d5899523cc8fd4a64870dea601b77fb Parsers are functions invoked with the remaining tail of the input, which return an empty list on failure or a list containing one tuple [(results, remaining)] on success. """ """ One version of the example program was: c = {}; x <- iota(128) | (c[x] = 0); c[0] := 1; y <- iota(128) | ( x <- iota(127) | (c[x+1] := c[x] + c[x+1] == 1 ? 1 : 0); x <- iota(128) | say(c[x] == 0 ? " " : "#"); say("\n") ) At present this gets 'compiled' to: [(('c', {}, init, ;, 'x', 'iota', @, int, '128', arg, call, each, 'c', @, 'x', @, at, int, '0', init, done, ;, 'c', @, int, '0', at, int, '1', put, ;, 'y', 'iota', @, int, '128', arg, call, each, 'x', 'iota', @, int, '127', arg, call, each, 'c', @, 'x', @, int, '1', +, at, 'c', @, 'x', @, at, @, 'c', @, 'x', @, int, '1', +, at, @, +, int, '1', if, int, '1', else, int, '0', then, ==, put, done, ;, 'x', 'iota', @, int, '128', arg, call, each, 'say', @, 'c', @, 'x', @, at, @, int, '0', if, string, ' ', else, string, '#', then, ==, arg, call, done, ;, 'say', @, string, '\n', arg, call, done), '')] Aside from the undesirable parentheses in the above, this is wrong; it’s inadvertently parsing as ...c[x+1] == (1 ? 1 : 0). To fix this I would need to do some sort of ("?"_ e70 ":" e70)* sort of thing, I think, which is pretty close to how I think about it when I’m writing it. For the time being, though, I can hack this with more parens, like the undesirable parens needed for loop bodies with assignments, and this gives me the correct result: [(('c', {}, init, ;, 'x', 'iota', @, int, '128', arg, call, each, 'c', @, 'x', @, at, int, '0', init, done, ;, 'c', @, int, '0', at, int, '1', put, ;, 'y', 'iota', @, int, '128', arg, call, each, 'x', 'iota', @, int, '127', arg, call, each, 'c', @, 'x', @, int, '1', +, at, 'c', @, 'x', @, at, @, 'c', @, 'x', @, int, '1', +, at, @, +, int, '1', ==, if, int, '1', else, int, '0', then, put, done, ;, 'x', 'iota', @, int, '128', arg, call, each, 'say', @, 'c', @, 'x', @, at, @, int, '0', ==, if, string, ' ', else, string, '#', then, arg, call, done, ;, 'say', @, string, '\n', arg, call, done), '')] """ def match(regexp): "Parse with an RE." regexp = re.compile(regexp) def f_re(s): mo = regexp.match(s) if not mo: return [] return [(mo.groups(), s[mo.end():])] return f_re def eta(thunk): """Permit circular and forward references by η-expansion. What we want is for eta(lambda: x)(y) to return x(y), but wait to evaluate x until necessary. """ return lambda *args: thunk()(*args) def lit(t): "Match a literal string." return lambda s: [((), s[len(t):])] if s.startswith(t) else [] assert lit("foo")("foobar") == [((), "bar")] assert lit("foo")("oobar") == [] def seq(p, q): "Match p, then q; language concatenation; Kleene algebra product." def f(s): for pr, ps in p(s): for qr, qs in q(ps): return [(pr + qr, qs)] return [] return f assert seq(lit("fo"), lit("ob"))("foobar") == [((), "ar")] assert seq(match("(f)"), match("o*(ba)"))("foobar") == [(("f", "ba"), "r")] assert seq(match("(f)"), match("o*(ba)"))("oobar") == [] assert seq(match("(f)"), match("o*(ba)"))("fooar") == [] def seqs(*ps): "Match a sequence of items." rv = lit("") for p in ps: rv = seq(rv, p) return rv assert seqs(match(r"(\w+)\s*"), match(r"(\d+)"), match(r"(x*)"))("the 20xxy x") == [(('the', '20', 'xx'), 'y x')] def alt(p, q): "PEG ordered choice; language union; roughly Kleene algebra sum." def f(s): for item in p(s): return [item] return q(s) return f fail = lambda s: [] def alts(*ps): "N-way ordered choice." rv = fail for p in ps: rv = alt(rv, p) return rv assert alts(match("(x)"), match("(y)"))("xes") == [(("x",), "es")] assert alts(match("(x)"), match("(y)"))("yes") == [(("y",), "es")] assert alts(match("(x)"), match("(y)"))("zes") == [] def rep(element): "Kleene closure. Zero or more. Repetition." def f(s): results = () while True: r = element(s) if not r: return [(results, s)] results += r[0][0] s = r[0][1] return f def do(*k): "A parser that consumes no text but succeeds with tuple k." return lambda s: [(k, s)] class Insn: def __init__(self, name): self.name = name def __repr__(self): return self.name def __eq__(self, other): return isinstance(other, Insn) and self.name == other.name def __hash__(self): return hash(self.name) insn = lambda name: do(Insn(name)) __ = rep(alts(lit(" "), lit("\n"), lit("\t"))) lsp = lambda s: seq(lit(s), __) dict_ = seqs(lsp("{"), lsp("}"), do({})) string = seqs(insn("string"), match(r'"((?:[^\\"]|\\.)*)"\s*')) integer = seq(insn('int'), match(r'(\d+)\s*')) const = alt(string, integer) actuals = seqs(lsp("("), eta(lambda: e36), insn('arg'), rep(seqs(lsp(","), eta(lambda: e36), insn('arg'))), lsp(")"), insn('call')) id_ = match(r'([A-Za-z_]\w*)\s*') lvalue = seq(id_, rep(seqs(insn('@'), lsp("["), eta(lambda: expr), lsp("]"), insn('at')))) e120 = alts(seqs(lsp("("), eta(lambda: expr), lsp(")")), seq(id_, insn('@')), const, dict_) e110 = seq(e120, rep(alt(seqs(lsp("["), eta(lambda: expr), lsp("]"), insn('at'), insn('@')), actuals))) e70 = seq(e110, rep(seqs(lsp("+"), e110, insn('+')))) e60 = alt(seqs(e70, lsp("?"), insn('if'), e70, lsp(":"), insn('else'), eta(lambda: e60), insn('then')), e70) e40 = alts(seqs(lvalue, lsp("<-"), e60, lsp("|"), insn('each'), eta(lambda: e40), insn('done')), e60) e36 = alts(seqs(e40, lsp("=="), e40, insn('==')), e40) e10 = alts(seqs(lvalue, lsp(":="), eta(lambda: e10), insn('put')), seqs(lvalue, lsp("="), eta(lambda: e10), insn('init')), e36) expr = seqs(e10, rep(seqs(lsp(";"), insn(";"), e10))) program = seq(__, eta(lambda: expr)) parse = program def run(prog): # Scan program to set up jump targets jump_targets = {} cstack = [] each, done, if_, else_, then = [Insn(n) for n in 'each done if else then'.split()] for i, c in enumerate(prog): if c in [each, if_]: cstack.append((c, i)) elif c == done: a, b = cstack.pop() assert a == each jump_targets[i] = b + 1 jump_targets[b] = i elif c == else_: a, b = cstack.pop() assert a == if_ jump_targets[b] = i + 1 cstack.append((c, i)) elif c == then: a, b = cstack.pop() assert a == else_ jump_targets[b] = i assert not cstack pc = 0 ram = {'iota': lambda n: list(range(n)), 'say': lambda v: print(*v, end='')} stack = [] args = [] loops = [] def store(loc, val): if len(loc) == 1: ram[loc[0]] = val else: assert len(loc) == 2 loc[0][loc[1]] = val while pc < len(prog): c = prog[pc] if isinstance(c, str): stack.append((c,)) # locations are tuples elif c == {}: stack.append({}) elif c in [Insn('init'), Insn('put')]: val = stack.pop() loc = stack.pop() store(loc, val) elif c == Insn('@'): loc = stack.pop() if len(loc) == 1: stack.append(ram[loc[0]]) else: assert len(loc) == 2 stack.append(loc[0][loc[1]]) elif c in [Insn(';'), then]: pass elif c == Insn('int'): pc += 1 stack.append(int(prog[pc])) elif c == Insn('arg'): args.append(stack.pop()) elif c == Insn('call'): f = stack.pop() stack.append(f(*args)) args = [] elif c == each: seq = stack.pop() loc = stack.pop() loops.append([loc, seq, 0]) pc = jump_targets[pc] continue elif c == done: loc, seq, idx = loops[-1] if idx >= len(seq): loops.pop() pc += 1 continue store(loc, seq[idx]) loops[-1][-1] += 1 pc = jump_targets[pc] continue elif c == Insn('at'): idx = stack.pop() loc = stack.pop() stack.append((loc, idx)) elif c == Insn('+'): x = stack.pop() stack.append(stack.pop() + x) elif c == Insn('=='): x = stack.pop() stack.append('true' if stack.pop() == x else 'false') elif c == if_: x = stack.pop() assert x in {'true', 'false'} if x == 'false': pc = jump_targets[pc] continue elif c == else_: pc = jump_targets[pc] continue pc += 1 if __name__ == '__main__': import cgitb; cgitb.enable(format='text') prog = parse(sier) print(prog) run(prog[0][0])