#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""Caronte: uma pequena lua.

This is part of a series of simple exercises I’ve been doing in
hacking programming-language prototypes together from scratch in one
day.  After 4½ hours this successfully generates a fractal and I am
going to bed happy.  It compiles a tiny infix language into a
Forth-like sequence of stack operations (but with a JS-like memory
model), which it then interprets.

In this case the language is sort of a minimal version of JS or Lua.
Caronte is the Portuguese name for Charon, the 1205-km-diameter moon
of Pluto, about a fiftieth of the mass of Earth’s Moon (a lua).

I started by letting the design get fairly out of control with
Caronte-grande:

Caronte-grande
--------------

In addition to some basic arithmetic, the Caronte syntax supports
dicts, indexing, conditionals, assignments, sequencing, looping,
strings, and function definitions.  It’s an expression-oriented
language like ML; ``;`` is an infix sequencing operator on expressions
which returns the rightmost result, like ``,`` in C or JS.

The function-definition syntax is a restricted form of the new JS
arrow syntax: (a, b, c) => expr, where the literal parens are required
but of course the argument list can be of any length.  Because ``;``
is an expression, you can parenthesize a function body with a sequence
of statements.  “=>” associates to the right so you can write “k = (a)
=> (b) => a”; since its left operand isn’t an expression it wouldn’t
make sense for it to associate to the left.

For conditionals I’m using the ``a ? b : c`` syntax from C, because I
liike punctuation.  Loops use an infix ``|`` operator which has two
variants, a foreach variant and a while variant.  The foreach variant
is ``c <- cs | e``, which evaluates ``e`` repeatedly with local
bindings for ``c`` taken from the sequence ``cs``.  The while variant
is ``p | q``, which alternates between evaluating ``p`` and, if it was
true, re-evaluating ``q``.  ``|`` has higher precedence than ``;`` and
the boolean and comparison operators, and it associates to the right
so that it’s flat to write ``a <- as | b <- bs | c``, and ``x | y |
z`` is ``while (x) while (y) z``, as you’d expect, rather than ``while
(while (x) y) z``.

Loops evaluate to sequences, which are materialized in memory and
indexed with numbers.

The dict literal syntax is more like JS’s than Lua’s: { name1: expr1,
name2: expr2, (expr3): expr4 }.  Dict indexing can be done with names
using "." or with expressions using "[]".

Variable declaration is done with an "=" assignment; duplicate
declarations in the same scope are a compile-time error.  Mutation is
achieved with the ":=" mutation operator.  Mutating undeclared
variables is also an error, at compile-time; mutating nonexistent dict
entries is an error at run-time.  Adding new dict entries is done with
the "=" operator, like variable declaration, and is a run-time error
if the entries already exist.

There’s a prefix “!!!” operator from Wheat to return an error value,
which is the only kind of value the “;” operator doesn’t ignore on its
left argument.  “!!” is an error-handling operator; “a !! b” is a,
unless a evaluated to an error, in which case it’s b.  There’s an
unary prefix "!" operator which returns nil when applied to a
non-error value, or the error argument "x" when applied to an error
value "!!!x".  Error values cannot be stored in dictionaries; an
attempt to do so yields the error value from the assignment and leaves
the dictionary unchanged.

Comparisons and boolean operations have “syntax error” associativity.
Instead of parsing ``a > b < c`` as either "a > (b < c)", "(a > b) <
c", or "a > b and b < c", it’s just a parse error.  Similarly "a and b
or c" is a syntax error, although you can chain "and" (and "abj",
which is "and not") and "or" as you like.  I’m tempted to do the same
thing with "*" and "/".

Bitwise operations are provided by functions rather than operators.

This syntax is brutally infix; the precedence order is something like

    ;
    := =
    !!
    =>
    or and abj
    == < > <= >= !=
    |
    ?:
    + - (binary)
    * / %
    - (unary)  ! !!! not
    **
    . [] (function call)
    parenthesization

Here’s a whack at some pseudo-BNF syntax:

    program: _ expr
    _: (" " | "\n" | "\t")*
    expr: e10 (";"_ e10)*
    e10: lvalue (":=" | "=")_ e10 | e20
    e20: e30 ("!!"_ e30)*
    e30: formals "=>"_ e30 | e33
    e33: e36 (("and"  | "abj")_ e36)* | e36 ("or"_ e36)+
    e36: e40 ("==" | "<" | ">" | "<=" | ">=" | "!=")_ e40 | e40
    e40: lvalue "<-"_ e60 "|"_ e40 | e60 "|"_ e40 | e60
    e60: e70 "?"_ e70 ":" e60 | e70
    e70: e80 (("+" | "-")_ e80)*
    e80: e90 (("*" | "/" | "%")_ e90)*
    e90: (("-" | "!" | "!!!" )_)* e100
    e100: e110 ("**"_ e110)*
    e110: e120 ("."_ id | "["_ expr "]"_ | actuals)*
    e120: "("_ expr ")"_ | id | const | dict
    lvalue: id ("."_ id | "["_ expr "]"_)*
    formals: "("_ ")"_ | "("_ id (","_ id)* ")"_
    actuals: "("_ ")"_ | "("_ e20 (","_ e20)* ")"_
    const: string | integer | float | "nil"_
    dict: "{"_ "}"_ | "{"_ pair (","_ pair)* "}"_
    pair: id ":"_ e20 | "("_ e20 ")"_ ":"_ e20

Maybe loops and conditionals should be incomparable in the same way as
and and or?

I feel a little bit bad to require two separate tokens ")" "=>" in
between arguments and body, and leaving the end of the function body
undelimited seems potentially error-prone.  I wonder if some kind of
bra-ket notation would be an improvement: ``{a b : 10 * a + b}`` or
something.

Caronte-pequeno
---------------

Okay, well, the above has clearly gone a bit out of control for a
one-night hack and is kind of overwhelming.  Can I scale it down to
something tasteful and enjoyable?  Because it took me an hour and a
half to write the *textual* description of Caronte-grande and I’m
getting sleepy!

How about revisiting my Sierpinski thing from the other week?

"""
from __future__ import division, print_function
import re


sier = """
    c = {};
    x <- iota(128) | (c[x] = 0);
    c[0] := 1;
    y <- iota(128) | (
        x <- iota(127) | (c[x+1] := (c[x] + c[x+1] == 1) ? 1 : 0);
        x <- iota(128) | say((c[x] == 0) ? " " : "#");
        say("\n")
    )
"""

"""That ought to need only this subset of the grammar:

    program: _ expr
    _: (" " | "\n" | "\t")*
    expr: e10 (";"_ e10)*
    e10: lvalue (":=" | "=")_ e10 | e36
    e36: e40 "=="_ e40 | e40
    e40: lvalue "<-"_ e60 "|"_ e40 | e60
    e60: e70 "?"_ e70 ":" e60 | e70
    e70: e110 ("+"_ e110)*
    e110: e120 ("["_ expr "]"_ | actuals)*
    e120: "("_ expr ")"_ | id | const | dict
    lvalue: id ("["_ expr "]"_)*
    actuals: "("_ e36 (","_ e36)* ")"_
    const: string | integer
    dict: "{"_ "}"_

I’m going to try a parsing approach inspired by Darius Bacon’s
min_parson.py:
https://gist.github.com/darius/7d5899523cc8fd4a64870dea601b77fb

Parsers are functions invoked with the remaining tail of the input,
which return an empty list on failure or a list containing one tuple
[(results, remaining)] on success.

"""

"""
One version of the example program was:

    c = {};
    x <- iota(128) | (c[x] = 0);
    c[0] := 1;
    y <- iota(128) | (
        x <- iota(127) | (c[x+1] := c[x] + c[x+1] == 1 ? 1 : 0);
        x <- iota(128) | say(c[x] == 0 ? " " : "#");
        say("\n")
    )

At present this gets 'compiled' to:

[(('c', {}, init, ;,
'x', 'iota', @, int, '128', arg, call, each, 'c', @, 'x', @, at, int, '0', init, done, ;,
'c', @, int, '0', at, int, '1', put, ;,
'y', 'iota', @, int, '128', arg, call, each,
    'x', 'iota', @, int, '127', arg, call, each,
        'c', @, 'x', @, int, '1', +, at,
            'c', @, 'x', @, at, @,  'c', @, 'x', @, int, '1', +, at, @,  +,
            int, '1', if, int, '1', else, int, '0', then, ==, put, done, ;,
    'x', 'iota', @, int, '128', arg, call, each,
        'say', @,
            'c', @, 'x', @, at, @, int, '0', if, string, ' ', else, string, '#', then, ==,
        arg, call, done, ;,
    'say', @, string, '\n', arg, call,
done), '')]

Aside from the undesirable parentheses in the above, this is wrong;
it’s inadvertently parsing as ...c[x+1] == (1 ? 1 : 0).  To fix this I
would need to do some sort of ("?"_ e70 ":" e70)* sort of thing, I
think, which is pretty close to how I think about it when I’m writing
it.

For the time being, though, I can hack this with more parens, like the
undesirable parens needed for loop bodies with assignments, and this
gives me the correct result:

[(('c', {}, init, ;,
'x', 'iota', @, int, '128', arg, call, each, 'c', @, 'x', @, at, int, '0', init, done, ;,
'c', @, int, '0', at, int, '1', put, ;,
'y', 'iota', @, int, '128', arg, call, each,
    'x', 'iota', @, int, '127', arg, call, each,
        'c', @, 'x', @, int, '1', +, at,
            'c', @, 'x', @, at, @,  'c', @, 'x', @, int, '1',  +, at, @, +,
            int, '1', ==,
            if, int, '1', else, int, '0', then,
        put, done, ;,
    'x', 'iota', @, int, '128', arg, call, each,
        'say', @,
            'c', @, 'x', @, at, @, int, '0', ==, if, string, ' ', else, string, '#', then,
        arg, call, done, ;,
    'say', @, string, '\n', arg, call,
done), '')]

"""

def match(regexp):
    "Parse with an RE."
    regexp = re.compile(regexp)
    def f_re(s):
        mo = regexp.match(s)
        if not mo:
            return []
        return [(mo.groups(), s[mo.end():])]
    return f_re

def eta(thunk):
    """Permit circular and forward references by η-expansion.

    What we want is for eta(lambda: x)(y) to return x(y), but wait to
    evaluate x until necessary.
    """
    return lambda *args: thunk()(*args)

def lit(t):
    "Match a literal string."
    return lambda s: [((), s[len(t):])] if s.startswith(t) else []

assert lit("foo")("foobar") == [((), "bar")]
assert lit("foo")("oobar") == []

def seq(p, q):
    "Match p, then q; language concatenation; Kleene algebra product."
    def f(s):
        for pr, ps in p(s):
            for qr, qs in q(ps):
                return [(pr + qr, qs)]
        return []
    return f

assert seq(lit("fo"), lit("ob"))("foobar") == [((), "ar")]
assert seq(match("(f)"), match("o*(ba)"))("foobar") == [(("f", "ba"), "r")]
assert seq(match("(f)"), match("o*(ba)"))("oobar") == []
assert seq(match("(f)"), match("o*(ba)"))("fooar") == []

def seqs(*ps):
    "Match a sequence of items."
    rv = lit("")
    for p in ps:
        rv = seq(rv, p)
    return rv

assert seqs(match(r"(\w+)\s*"),
            match(r"(\d+)"),
            match(r"(x*)"))("the 20xxy x") == [(('the', '20', 'xx'), 'y x')]

def alt(p, q):
    "PEG ordered choice; language union; roughly Kleene algebra sum."
    def f(s):
        for item in p(s):
            return [item]
        return q(s)
    return f

fail = lambda s: []
def alts(*ps):
    "N-way ordered choice."
    rv = fail
    for p in ps:
        rv = alt(rv, p)
    return rv

assert alts(match("(x)"), match("(y)"))("xes") == [(("x",), "es")]
assert alts(match("(x)"), match("(y)"))("yes") == [(("y",), "es")]
assert alts(match("(x)"), match("(y)"))("zes") == []

def rep(element):
    "Kleene closure.  Zero or more.  Repetition."
    def f(s):
        results = ()
        while True:
            r = element(s)
            if not r:
                return [(results, s)]

            results += r[0][0]
            s = r[0][1]

    return f

def do(*k):
    "A parser that consumes no text but succeeds with tuple k."
    return lambda s: [(k, s)]

class Insn:
    def __init__(self, name): self.name = name
    def __repr__(self): return self.name
    def __eq__(self, other): return isinstance(other, Insn) and self.name == other.name
    def __hash__(self): return hash(self.name)

insn = lambda name: do(Insn(name))

__ = rep(alts(lit(" "), lit("\n"), lit("\t")))
lsp = lambda s: seq(lit(s), __)
dict_ = seqs(lsp("{"), lsp("}"), do({}))
string = seqs(insn("string"), match(r'"((?:[^\\"]|\\.)*)"\s*'))
integer = seq(insn('int'), match(r'(\d+)\s*'))
const = alt(string, integer)
actuals = seqs(lsp("("),
               eta(lambda: e36), insn('arg'),
               rep(seqs(lsp(","), eta(lambda: e36), insn('arg'))),
               lsp(")"),
               insn('call'))
id_ = match(r'([A-Za-z_]\w*)\s*')
lvalue = seq(id_, rep(seqs(insn('@'), lsp("["), eta(lambda: expr), lsp("]"), insn('at'))))
e120 = alts(seqs(lsp("("), eta(lambda: expr), lsp(")")),
            seq(id_, insn('@')), const, dict_)
e110 = seq(e120, rep(alt(seqs(lsp("["), eta(lambda: expr), lsp("]"), insn('at'), insn('@')),
                         actuals)))
e70 = seq(e110, rep(seqs(lsp("+"), e110, insn('+'))))
e60 = alt(seqs(e70, lsp("?"), insn('if'),
               e70, lsp(":"), insn('else'),
               eta(lambda: e60), insn('then')),
          e70)
e40 = alts(seqs(lvalue, lsp("<-"), e60,
                lsp("|"), insn('each'), eta(lambda: e40), insn('done')),
           e60)
e36 = alts(seqs(e40, lsp("=="), e40, insn('==')), e40)
e10 = alts(seqs(lvalue, lsp(":="), eta(lambda: e10), insn('put')),
           seqs(lvalue, lsp("="), eta(lambda: e10), insn('init')),
           e36)
expr = seqs(e10, rep(seqs(lsp(";"), insn(";"), e10)))
program = seq(__, eta(lambda: expr))
parse = program

def run(prog):
    # Scan program to set up jump targets
    jump_targets = {}
    cstack = []
    each, done, if_, else_, then = [Insn(n) for n in 'each done if else then'.split()]

    for i, c in enumerate(prog):
        if c in [each, if_]:
            cstack.append((c, i))
        elif c == done:
            a, b = cstack.pop()
            assert a == each
            jump_targets[i] = b + 1
            jump_targets[b] = i
        elif c == else_:
            a, b = cstack.pop()
            assert a == if_
            jump_targets[b] = i + 1
            cstack.append((c, i))
        elif c == then:
            a, b = cstack.pop()
            assert a == else_
            jump_targets[b] = i

    assert not cstack

    pc = 0
    ram = {'iota': lambda n: list(range(n)), 'say': lambda v: print(*v, end='')}
    stack = []
    args = []
    loops = []

    def store(loc, val):
        if len(loc) == 1:
            ram[loc[0]] = val
        else:
            assert len(loc) == 2
            loc[0][loc[1]] = val

    while pc < len(prog):
        c = prog[pc]
        if isinstance(c, str):
            stack.append((c,))  # locations are tuples
        elif c == {}:
            stack.append({})
        elif c in [Insn('init'), Insn('put')]:
            val = stack.pop()
            loc = stack.pop()
            store(loc, val)
        elif c == Insn('@'):
            loc = stack.pop()
            if len(loc) == 1:
                stack.append(ram[loc[0]])
            else:
                assert len(loc) == 2
                stack.append(loc[0][loc[1]])
        elif c in [Insn(';'), then]:
            pass
        elif c == Insn('int'):
            pc += 1
            stack.append(int(prog[pc]))
        elif c == Insn('arg'):
            args.append(stack.pop())
        elif c == Insn('call'):
            f = stack.pop()
            stack.append(f(*args))
            args = []
        elif c == each:
            seq = stack.pop()
            loc = stack.pop()
            loops.append([loc, seq, 0])
            pc = jump_targets[pc]
            continue
        elif c == done:
            loc, seq, idx = loops[-1]
            if idx >= len(seq):
                loops.pop()
                pc += 1
                continue
            store(loc, seq[idx])
            loops[-1][-1] += 1
            pc = jump_targets[pc]
            continue
        elif c == Insn('at'):
            idx = stack.pop()
            loc = stack.pop()
            stack.append((loc, idx))
        elif c == Insn('+'):
            x = stack.pop()
            stack.append(stack.pop() + x)
        elif c == Insn('=='):
            x = stack.pop()
            stack.append('true' if stack.pop() == x else 'false')
        elif c == if_:
            x = stack.pop()
            assert x in {'true', 'false'}
            if x == 'false':
                pc = jump_targets[pc]
                continue
        elif c == else_:
            pc = jump_targets[pc]
            continue

        pc += 1

if __name__ == '__main__':
    import cgitb; cgitb.enable(format='text')
    prog = parse(sier)
    print(prog)
    run(prog[0][0])