#!/usr/bin/python3
"""Quares: quadtree for rectangle coverage.

The objective here is to efficiently compute the area of a union of
many axis-aligned rectangles with integer coordinates, or maybe a
difference of multiple such regions.

Qanvas
======

There’s an imperative convenience interface, Qanvas::

    >>> q = quares.Qanvas(16)
    >>> q.fill(x=3, y=4, width=7, height=2, color=-2)
    >>> q.print()
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ///.......//////
    ///.......//////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    >>> q.coverage(-1)
    242
    >>> q.coverage(-2)
    14

Subsequent rectangles drawn with .fill can replace previously drawn
pixels::

    >>> q.fill(5, 2, width=3, height=10, color=-3)
    >>> q.print()
    ////////////////
    ////////////////
    /////---////////
    /////---////////
    ///..---..//////
    ///..---..//////
    /////---////////
    /////---////////
    /////---////////
    /////---////////
    /////---////////
    /////---////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    >>> q.coverage(-2)
    8
    >>> q.coverage(-3)
    30

Coverage of -2, drawn as '.', has decreased from 7×2 = 14 to 8.
There’s a fill_masked method to replace only a given target color; in
this case, we choose as our target the default background color, -1:

    >>> q.fill_masked(x=4, y=3, width=6, height=6, color=-5, target=-1)
    >>> q.print()
    ////////////////
    ////////////////
    /////---////////
    ////+---++//////
    ///..---..//////
    ///..---..//////
    ////+---++//////
    ////+---++//////
    ////+---++//////
    /////---////////
    /////---////////
    /////---////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    >>> [q.coverage(color) for color in [-1, -2, -3, -4, -5]]
    [206, 8, 30, 0, 12]

You can snapshot the Qanvas state and restore the snapshot later.
This can provide undo and redo::

    >>> snapshot = q.snapshot()
    >>> q.fill(-5, -5, 10, 10, -5)
    >>> q.print()
    +++++///////////
    +++++///////////
    +++++---////////
    +++++---++//////
    +++++---..//////
    ///..---..//////
    ////+---++//////
    ////+---++//////
    ////+---++//////
    /////---////////
    /////---////////
    /////---////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    >>> q.restore(snapshot)
    >>> q.print()
    ////////////////
    ////////////////
    /////---////////
    ////+---++//////
    ///..---..//////
    ///..---..//////
    ////+---++//////
    ////+---++//////
    ////+---++//////
    /////---////////
    /////---////////
    /////---////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////

Not yet implemented: to get the pixel data for other than debugging
purposes, there's a .tilemaq() method which takes a power-of-2
tile-size argument and returns a Tilemaq object.  This has two
methods: .map() to return a two-dimensional array of tile indices, and
.tile(index) to return a two-dimensional array of pixels for the given
tile index.  The pixels for a given tile index are guaranteed to never
change; new pixel patterns will be assigned new tile indices.

The idea is that if you have, say, a 1024×1024 qanvas that you’re
incrementally updating, you can divide it into, say, 32×32 tiles, 1024
of them in all.  You can use a slow scripting language like GDScript
to sling 1024 tiles up onto the display with no problem at all, as
long as you can call some fast native-code thing to actually put the
tile there.  And populating the 1024 pixels of a previously unknown
tile is also something you can reasonably do from GDScript.

Performance
===========

I tried building a pathologically bad case, a 2048×2048 canvas with
2048 rectangles on it forming a crisscross pattern::

    In [1]: import quares

    In [4]: q = quares.Qanvas(2048)

    In [5]: %%time
       ...: for i in range(0, 2048, 2):
       ...:     q.fill(i, 0, 1, 2048, -2)
       ...:     q.fill(0, i, 2048, 1, -3)
       ...: 
    CPU times: user 10.1 s, sys: 3.36 ms, total: 10.1 s
    Wall time: 10.1 s

This took an average of 5 milliseconds to add each of these
rectangles.  (But see below; this is from an old version of the code.)
The leftover parts of the background are in one million isolated
pixels, which we can visit in a quarter millisecond::

    In [6]: %%time
       ...: q.coverage(-1)
       ...: 
       ...: 
    CPU times: user 249 µs, sys: 2 µs, total: 251 µs
    Wall time: 268 µs
    Out[6]: 1048576

The remainder are split almost evenly between the horizontal and
vertical rectangle colors::

    In [7]: %%time
       ...: q.coverage(-2)
       ...: 
       ...: 
    CPU times: user 342 µs, sys: 0 ns, total: 342 µs
    Wall time: 356 µs
    Out[7]: 1572352

    In [8]: %%time
       ...: q.coverage(-3)
       ...: 
       ...: 
    CPU times: user 160 µs, sys: 0 ns, total: 160 µs
    Wall time: 171 µs
    Out[8]: 1573376

Taking a snapshot is instant.  We can also instantly fill the whole
grid with a new color::

    In [9]: s = q.snapshot()

    In [10]: %%time
        ...: q.fill(0, 0, 2048, 2048, -4)
        ...: 
        ...: 
    CPU times: user 14 µs, sys: 0 ns, total: 14 µs
    Wall time: 21.9 µs

    In [11]: %%time
        ...: q.coverage(-4)
        ...: 
        ...: 
    CPU times: user 15 µs, sys: 0 ns, total: 15 µs
    Wall time: 22.4 µs
    Out[11]: 4194304

Restoring the snapshot is also instant::

    In [12]: %%time
        ...: q.restore(s)
        ...: 
        ...: 
    CPU times: user 8 µs, sys: 0 ns, total: 8 µs
    Wall time: 13.8 µs

    In [13]: %%time
        ...: q.coverage(-4)
        ...: 
        ...: 
    CPU times: user 214 µs, sys: 2 µs, total: 216 µs
    Wall time: 223 µs
    Out[13]: 0

    In [14]: %%time
        ...: q.coverage(-2)
        ...: 
        ...: 
    CPU times: user 14 µs, sys: 0 ns, total: 14 µs
    Wall time: 19.6 µs
    Out[14]: 1572352

If instead of fill we use fill_masked, to create our million
background tiles, it takes 4590 milliseconds instead of 0.02
milliseconds::

    In [15]: %%time
        ...: q.fill_masked(0, 0, 2048, 2048, -4, target=-1)
        ...: 
        ...: 
    CPU times: user 4.59 s, sys: 0 ns, total: 4.59 s
    Wall time: 4.59 s

On one hand, that’s only 4.6 microseconds per pixel set.  On the other
hand, it’s too slow to do it within a single 16-millisecond 60-hertz
frame by almost three orders of magnitude.

So I set to work optimizing, and now all those operations are
fast.  Here’s the results from the current version from the
crude_performance_test below::

    instantiation 0.015 ms
    nop 0.080 ms
    nop 0.018 ms
    filling with 2048 skinny lines 137.012 ms
    1048576
    counting coverage of -1 0.126 ms
    1572352
    counting coverage of -2, vertical 0.071 ms
    1573376
    counting coverage of -3, horizontal 0.063 ms
    taking snapshot 0.011 ms
    filling with -4 0.010 ms
    4194304
    counting coverage of -4 0.024 ms
    restoring snapshot 0.013 ms
    0
    counting coverage of -4 again 0.070 ms
    1572352
    counting coverage of -2 again 0.029 ms
    filling_masked with -4 0.145 ms
    1048576
    counting coverage of -4 again 0.076 ms
    nop 0.011 ms
    nop 0.019 ms
    total entries 8205

    real    0m0.184s
    user    0m0.172s
    sys     0m0.012s

This means that drawing the 2048 lines has gone from 10 seconds to 137
milliseconds, a speedup of 70×, and the fill_masked operation has gone
from 4.6s to 145μs, a speedup of 32000×.  However, this is
unrealistically good, because my optimizations have transformed what
used to be a worst case into a best case.  I suspect that
realistically the speedup in the new worst case is more like 10×.

You’ll note that the nops in the above performance test were 11 to 80
microseconds, which means that the coverage counting results in the
crisscross pattern were barely above measurement error, as was the
fill_masked operation.  Here are the only three lines that were
outside that range:

    filling with 2048 skinny lines 137.012 ms
    counting coverage of -1 0.126 ms
    filling_masked with -4 0.145 ms

I added an even cruder test where I fill the qanvas with a more random
pattern, which makes fill_masked to take on the order of a quarter
second still.  That suggests that maybe the only way to use this in
practice in a game is to spread the fill computation over multiple
screen frames.  Here’s one run::

    total entries 8205
    erasing again 0.015 ms
    2062 random lines drawn
    drawing random lines 1406.313 ms
    fill_masked on random grid 242.622 ms

Quadtree
========

The Qanvas is implemented using a Quadtree, which provides a much less
stateful interface, relying on the caller to maintain most of the
state.

The units of the quadtree are called “quares”, for “quadtree squares”.
A quare is a 2ⁿ × 2ⁿ axis-aligned square whose coordinates start at a
multiple of 2ⁿ.  Each quare is identified by a quare id number, or
“qid”.  Negative numbers are leafnodes representing pixel colors,
while the quares identified by nonnegative numbers are four-way
branching nodes that form the roots of immutable quadtrees.

Because all quares are immutable, you can do a computation on them
(such as how they should look on screen, or how many pixels of a
certain color they contain) and safely memoize its results.  Hash
consing is used to ensure that (within a single Quadtree object)
duplicate copies of the same immutable quadtree node are not stored.
This both makes it relatively space-efficient and enables such
memoized results to be applied broadly.

Node sizes (e.g., 256×256) are not stored, enabling leafnodes to be
applied to large areas.  This could have surprising results if you
aren’t consistent about keeping track of those node sizes.

Initially you can use any negative number as a qid to represent an
arbitrary-sized area of that color.

For example, to draw a 5×4 rectangle starting at (6, 2), in color -2,
on a background that’s initially of color -1, of size 16×16, we can do
this::

    >>> t = quares.Quadtree()
    >>> qid = t.update_rect(((6, 2), (5, 4)), val=-2, size=16, qid=-1)
    >>> quares.ascii_draw(t, qid, 16)
    ////////////////
    ////////////////
    //////...../////
    //////...../////
    //////...../////
    //////...../////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    >>> qid
    633

Redrawing the same rect, or parts of it, in the same -2 color, doesn’t
change the resulting qid::

    >>> qid = t.update_rect(((6, 2), (5, 4)), val=-2, size=16, qid=qid)
    >>> qid
    633
    >>> qid = t.update_rect(((6, 2), (4, 3)), val=-2, size=16, qid=qid)
    >>> qid
    633

Because of hash consing, (qid, size) pairs are the same whenever the
pattern of pixels they identify are the same, and different whenever
the patterns of pixels are different.

If we make an actual change, here drawing a 7×2 rectangle of -3 at (2,
5), we get a new qid::

    >>> qid = t.update_rect(((2, 5), (7, 2)), val=-3, size=16, qid=qid)
    >>> qid
    762
    >>> quares.ascii_draw(t, qid, 16)
    ////////////////
    ////////////////
    //////...../////
    //////...../////
    //////...../////
    //-------../////
    //-------///////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////

But the previous version is still stored::

    >>> quares.ascii_draw(t, 633, 16)
    ////////////////
    ////////////////
    //////...../////
    //////...../////
    //////...../////
    //////...../////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////

In fact, if we erase the rectangle by redrawing it in the background
color, and then redrawing the overlap in color -2, we get back the qid
for that previous version::

    >>> qid = t.update_rect(((2, 5), (7, 2)), val=-1, size=16, qid=qid)
    >>> quares.ascii_draw(t, qid, 16)
    ////////////////
    ////////////////
    //////...../////
    //////...../////
    //////...../////
    /////////../////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    >>> qid = t.update_rect(((6, 5), (3, 1)), val=-2, size=16, qid=qid)
    >>> quares.ascii_draw(t, qid, 16)
    ////////////////
    ////////////////
    //////...../////
    //////...../////
    //////...../////
    //////...../////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    >>> qid
    633

At this point there are a total of 26 items stored in the Quadtree’s
hash table, representing all of the above drawings, their intermediate
versions, and their subparts.

We can get the parts of any quare with get_quare::

    >>> t.get_quare(762)
    (778, 856, -1, -1)

These are in the order northeast, northwest, southeast, southwest,
assuming Y-coordinates increase downward.  Here we can see that the
southern half of 762 is solid -1.  Its northeast quarter, 778, looks
like this::

    >>> quares.ascii_draw(t, 778, 8)
    ////////
    ////////
    //////..
    //////..
    //////..
    //------
    //------
    ////////

There’s a class that memoizes how many pixels are of a given color in
order to compute it efficiently::

    >>> qid = 762
    >>> c = quares.CoverageComputer(t)
    >>> c.coverage(qid, 16, -1)
    225
    >>> c.coverage(qid, 16, -2)
    17
    >>> c.coverage(qid, 16, -3)
    14
    >>> 225+17+14
    256

This visits each of the 14 internal nodes in the quadtree exactly
once for each target pixel value.

There’s an update_rect_masked method which only updates pixels that
previously had a given pixel value, so in effect the rectangle is
drawn underneath existing rectangles rather than on top of them::

    >>> qid = t.update_rect_masked(((4, 4), (6, 7)), -4, 16, qid, -1)
    >>> quares.ascii_draw(t, qid, 16)
    ////////////////
    ////////////////
    //////...../////
    //////...../////
    ////,,...../////
    //-------../////
    //-------,//////
    ////,,,,,,//////
    ////,,,,,,//////
    ////,,,,,,//////
    ////,,,,,,//////
    ////////////////
    ////////////////
    ////////////////
    ////////////////
    ////////////////

However, it requires worst-case O(N² log N) time rather than the O(N
log N) required by update_rect.

"""


class Quadtree:
    def __init__(self):
        self.log = []
        self.index = {}

    def total_entries(self):
        return len(self.log)

    def get_quare(self, qid):
        if qid < 0:     # Negative quares are leafnodes: a solid color
            return qid, qid, qid, qid

        return self.log[qid]

    def put_quare(self, quare):
        # Check to see if we can coalesce a leafnode.
        nw, ne, sw, se = quare
        if nw < 0 and nw == ne == sw == se:
            return nw

        qid = self.index.get(quare)
        if qid is None:
            qid = len(self.log)
            self.index[quare] = qid
            self.log.append(quare)

        return qid

    def update_rect(self, rect, val, size, qid, table=None):
        """Return a new qid for the quare qid modified by setting rect to val.
        
        rect is ((start_x, start_y), (width, height)).

        Val should normally be negative to indicate a leafnode.

        qid is interpreted as a size×size square starting at (0, 0)
        and may be negative to indicate a leafnode or may be a qid
        returned from update_rect or put_quare.  size should be a
        power of 2.

        """
        # Three cases: inside rect, outside rect, overlapping rect.
        # The first two cases are easily handled; the third requires
        # recursing on children.
        (start_x, start_y), (width, height) = rect

        # Is the whole size×size square outside the rect?
        if (start_x <= -width or start_y <= -height
                or start_x >= size or start_y >= size):
            return qid

        # Or is the whole size×size square *inside* this rect?
        if (start_x <= 0 and start_y <= 0
                and start_x + width >= size and start_y + height >= size):
            return val
        
        # If not, we must recurse on the children.
        # Attempt to memoize.  First, though, clip the rect to the
        # sizexsize+0+0 bounds of this quare:
        if start_x < 0:
            width += start_x
            start_x = 0

        if start_y < 0:
            height += start_y
            start_y = 0

        width = min(size, width)
        height = min(size, height)

        if table is None:
            table = {}

        memo_key = qid, size, start_x, start_y, width, height
        memo_value = table.get(memo_key)
        if memo_value is not None:
            return memo_value

        nw_kid, ne_kid, sw_kid, se_kid = self.get_quare(qid)

        half_size = size // 2
        new_nw_kid = self.update_rect(((start_x, start_y), (width, height)),
                                      val, half_size, nw_kid, table)
        new_ne_kid = self.update_rect(((start_x - half_size, start_y),
                                       (width, height)),
                                      val, half_size, ne_kid, table)
        new_sw_kid = self.update_rect(((start_x, start_y - half_size),
                                       (width, height)),
                                      val, half_size, sw_kid, table)
        new_se_kid = self.update_rect(((start_x - half_size,
                                        start_y - half_size),
                                       (width, height)),
                                      val, half_size, se_kid, table)
        result = self.put_quare((new_nw_kid, new_ne_kid, new_sw_kid, new_se_kid))
        table[memo_key] = result
        return result

    def update_rect_masked(self, rect, val, size, qid, target, table=None):
        """Like update_rect, but only changes pixels of value target."""
        (start_x, start_y), (width, height) = rect

        if (start_x <= -width or start_y <= -height
                or start_x >= size or start_y >= size):
            return qid

        # In this case, if the quare is entirely inside the rect, we
        # only avoid recursing in the case where it’s a leafnode.
        if (qid < 0 and start_x <= 0 and start_y <= 0
                and start_x + width >= size and start_y + height >= size):
            return val if qid == target else qid

        # Attempt to memoize.  First, though, clip the rect to the
        # sizexsize+0+0 bounds of this quare:
        if start_x < 0:
            width += start_x
            start_x = 0

        if start_y < 0:
            height += start_y
            start_y = 0

        width = min(size, width)
        height = min(size, height)

        if table is None:
            table = {}

        memo_key = qid, size, start_x, start_y, width, height
        memo_value = table.get(memo_key)
        if memo_value is not None:
            return memo_value

        nw_kid, ne_kid, sw_kid, se_kid = self.get_quare(qid)
        half_size = size // 2
        new_nw_kid = self.update_rect_masked(((start_x, start_y), (width, height)),
                                             val, half_size, nw_kid, target, table)
        new_ne_kid = self.update_rect_masked(((start_x - half_size, start_y),
                                              (width, height)),
                                             val, half_size, ne_kid, target, table)
        new_sw_kid = self.update_rect_masked(((start_x, start_y - half_size),
                                              (width, height)),
                                             val, half_size, sw_kid, target, table)
        new_se_kid = self.update_rect_masked(((start_x - half_size,
                                               start_y - half_size),
                                              (width, height)),
                                             val, half_size, se_kid, target, table)
        result = self.put_quare((new_nw_kid, new_ne_kid, new_sw_kid, new_se_kid))
        table[memo_key] = result
        return result


def ascii_draw(quadtree, qid, size):
    buf = [[0] * size for i in range(size)]
    _update_buf(buf, quadtree, qid, size, (0, 0))
    for row in buf:
        print(''.join(chr(32 + (item & 0xf)) for item in row))

def _update_buf(buf, quadtree, qid, size, origin):
    x, y = origin
    if size == 1:
        buf[y][x] = qid
    else:
        nw_kid, ne_kid, sw_kid, se_kid = quadtree.get_quare(qid)

        half_size = size // 2
        _update_buf(buf, quadtree, nw_kid, half_size, (x, y))
        _update_buf(buf, quadtree, ne_kid, half_size, (x + half_size, y))
        _update_buf(buf, quadtree, sw_kid, half_size, (x, y + half_size))
        _update_buf(buf, quadtree, se_kid, half_size,
                    (x + half_size, y + half_size))


class CoverageComputer:
    def __init__(self, tree):
        self.tree = tree
        self.cache = {}

    def coverage(self, qid, size, target):
        cached = self.cache.get((qid, size, target))
        if cached is not None:
            return cached
        result = self._compute_coverage(qid, size, target)
        self.cache[qid, size, target] = result
        return result

    def _compute_coverage(self, qid, size, target):
        if qid < 0:
            return size * size if qid == target else 0
        half_size = size // 2
        #print("getting", qid)
        return sum(self.coverage(kid, half_size, target)
                   for kid in self.tree.get_quare(qid))


class Qanvas:
    """More convenient imperative interface."""
    def __init__(self, size, background=-1):
        if size & (size - 1):
            raise ValueError("size must be a power of 2")
        self.size = size
        self.tree = Quadtree()
        self._coverage = CoverageComputer(self.tree)
        self.qid = background

    def snapshot(self):
        return self.qid

    def restore(self, snapshot):
        self.qid = snapshot

    def print(self):
        """Draw the pixel data represented in ASCII art."""
        ascii_draw(self.tree, self.qid, self.size)

    def fill(self, x, y, width, height, color):
        """Fill a rectangle with color color at widthxheight+x+y.

        That is, horizontally from x to x+width, exclusive, and from y
        to y+height, exclusive.

        """
        if color >= 0:
            raise ValueError("colors must be negative")

        self.qid = self.tree.update_rect(((x, y), (width, height)), val=color,
                                         size=self.size, qid=self.qid)

    def fill_masked(self, x, y, width, height, color, target):
        """In the rectangle at widthxheight+x+y, replace 'target' with 'color'.

        If target is the background color (default -1), the result
        will be that the filled rectangle is drawn, in effect,
        “underneath” any existing rectangles in that area.

        """
        if color >= 0 or target >= 0:
            raise ValueError("colors must be negative")

        self.qid = self.tree.update_rect_masked(((x, y), (width, height)),
                                                val=color,
                                                size=self.size,
                                                qid=self.qid,
                                                target=target)

    def coverage(self, target):
        """Count how many pixels are colored the target color."""
        if target >= 0:
            raise ValueError("colors must be negative")
        return self._coverage.coverage(self.qid, self.size, target)


def crude_performance_test():
    from time import time
    class Timer:
        def __init__(self):
            self.t = time()

        def lap(self):
            then = self.t
            self.t = now = time()
            return now - then

        def print(self, what):
            print(what, "%.3f" % (self.lap()*1000), "ms")

    from random import randrange

    t = Timer()
    q = Qanvas(2048)
    t.print("instantiation")
    t.print("nop")
    t.print("nop")

    for i in range(0, 2048, 2):
        q.fill(i, 0, 1, 2048, -2)
        q.fill(0, i, 2048, 1, -3)
    t.print("filling with 2048 skinny lines")

    print(q.coverage(-1))
    t.print("counting coverage of -1")

    print(q.coverage(-2))
    t.print("counting coverage of -2, vertical")

    print(q.coverage(-3))
    t.print("counting coverage of -3, horizontal")

    s = q.snapshot()
    t.print("taking snapshot")

    q.fill(0, 0, 2048, 2048, -4)
    t.print("filling with -4")

    print(q.coverage(-4))
    t.print("counting coverage of -4")

    q.restore(s)
    t.print("restoring snapshot")

    print(q.coverage(-4))
    t.print("counting coverage of -4 again")

    print(q.coverage(-2))
    t.print("counting coverage of -2 again")

    q.fill_masked(0, 0, 2048, 2048, -4, target=-1)
    t.print("fill_masked with -4")

    print(q.coverage(-4))
    t.print("counting coverage of -4 again")
    t.print("nop")
    t.print("nop")

    print("total entries", q.tree.total_entries())

    q.fill(0, 0, 2048, 2048, -1)
    t.print("erasing again")
    drawn_lines = 0
    for i in range(2048):
        if randrange(2):
            q.fill(i, 0, 1, 2048, -2)
            drawn_lines += 1
        if randrange(2):
            q.fill(0, i, 2048, 1, -3)
            drawn_lines += 1

    print(drawn_lines, "random lines drawn")
    t.print("drawing random lines")

    q.fill_masked(0, 0, 2048, 2048, -4, target=-1)
    t.print("fill_masked on random grid")
    
if __name__ == '__main__':
    crude_performance_test()