#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""I’m facing the following situation in a sort of data prevalence system.
I have the entire history of deltas, but I may be faced with arbitrary
sequences of undos followed by adding new deltas. Recomputing the
state after an undo requires finding the latest snapshotted or
checkpointed state at or before the point you're undoing to, then
replaying the deltas forward from that snapshot, possibly creating
more snapshots in the process (a minimal sketch of this replay step
appears just after the imports below).
Ideally, I’d like a guarantee that no sequence of undos and delta
additions ever departs from a regime of amortized constant time per
operation (undo or delta addition), in constant space. That’s probably
not quite possible; I think I can get to amortized logarithmic time in
logarithmic space.
I think I may have an algorithm that guarantees amortized logarithmic
time in logarithmic space, but I don’t have a proof.
The algorithm is: maintain a set of O(log t) snapshots when you’re at
delta-history length t. When adding another snapshot, first delete
the lowest-value snapshot, where ‘value’ is the ratio b/d: b is how
much replay work that snapshot saves you if you ever have to use it
(its distance from the previous snapshot), and d is the number of
undos you’d have to do before it becomes the nearest snapshot.
The motivating intuition here is that if you start at t=256 and undo
all the way back down to 0, you’d be in good shape if you had
snapshots at 128, 192, 224, 240, 248, 252, 254, and 255. The first
undo puts you at 255 and requires zero replays; the second at 254 and
requires zero replays; the third at 253 requiring one replay; the
fourth at 252 requiring zero; the fifth at 251 requiring three replays,
creating new snapshots at 249 and 250; and so on. On the 65th undo
you reach 191, needing to replay 63 deltas starting from 128. I think
this works out to an amortized logarithmic (rather than constant)
number of replays per undo, and in any case you’re managing a
logarithmic number of snapshots.
The key thing here is that after doing the Nth undo, you never have to
do more than N replays. Or something like that; the little simulation
after the imports below checks that claim for t=256.
The greedy algorithm described above seems to do a reasonable job of
approximating such an exponentially-spaced set of checkpoints, in the
simple test given here.
It’s possible to implement this incremental greedy algorithm
considerably more efficiently than it’s done here; removing a snapshot
only changes the values of its two neighbours, so it isn’t necessary
to recompute all the values every time (a rough sketch of that idea,
for a fixed present, is at the end of this file).
"""
import numpy
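
# A minimal sketch of the recovery step described in the docstring: find the
# latest snapshot at or before the target and replay deltas forward from it.
# This is illustrative only and isn't used by the rest of this file; it
# assumes (my assumption, not part of the original) that a delta is a function
# from state to state and that snapshot_states maps a history length to the
# state after that many deltas.
def state_at(target, deltas, snapshot_states, initial_state):
    """Reconstruct the state that held after the first `target` deltas."""
    # Latest snapshotted history length at or below the target (0 = initial).
    base = max((t for t in snapshot_states if t <= target), default=0)
    state = snapshot_states.get(base, initial_state)
    # Replay forward from the snapshot; a real implementation would also
    # decide here which of the replayed states to snapshot.
    for i in range(base, target):
        state = deltas[i](state)
    return state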
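
# A small simulation of the intuition in the docstring: start at t=256 with
# the exponentially spaced snapshots listed there, undo all the way to 0, and
# count how many deltas each undo has to replay.  It snapshots every state it
# passes through while replaying and never culls, so it only checks the claim
# that the Nth undo never needs more than N replays; it isn't used below.
def count_replays(start=256, snaps=(128, 192, 224, 240, 248, 252, 254, 255)):
    snaps = set(snaps) | {0}                  # the initial state is always available
    replays_per_undo = []
    for n in range(1, start + 1):
        p = start - n                         # position after the nth undo
        base = max(s for s in snaps if s <= p)
        replays_per_undo.append(p - base)
        snaps.update(range(base + 1, p + 1))  # snapshot the states we replayed through
    return replays_per_undo

assert all(r <= n for n, r in enumerate(count_replays(), 1))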
def tocull(snapshots, present):
    """Index of the lowest-value snapshot in the sorted list `snapshots`."""
    s = numpy.asarray(snapshots)
    # b[i]: replays snapshot i saves, i.e. its distance from the previous
    # snapshot (or from the initial state, for the first snapshot).
    b = s - numpy.concatenate(([0], s[:-1]))
    # d[i]: undos from the present before snapshot i becomes the nearest
    # one (present - next snapshot + 1; just 1 for the last snapshot).
    d = (present - numpy.concatenate((s[1:], [present]))) + 1
    v = b.astype(numpy.float64) / d
    return int(v.argmin())  # index of the first minimum
def tocull_potato(snapshots, present):
    "A reimplementation in potato programming."
    # Pad with a phantom snapshot at 0 and one at the present so every
    # real snapshot has a neighbour on each side.
    s = [0] + snapshots + [present]
    worst_v = float('inf')
    worst_index = None
    for ii in range(1, len(s) - 1):
        # value = (replays this snapshot saves) / (undos before it's needed)
        v = (s[ii] - s[ii-1]) / (present - s[ii+1] + 1)
        if v < worst_v:
            worst_v = v
            worst_index = ii - 1
    return worst_index
def cull(snapshots, present):
    # The two implementations should always agree on which snapshot to drop.
    assert tocull_potato(snapshots, present) == tocull(snapshots, present)
    snapshots.pop(tocull(snapshots, present))
def cullto(snapshots, present, limit):
    # Drop lowest-value snapshots until at most `limit` remain.
    while len(snapshots) > limit:
        cull(snapshots, present)
ns = [90, 50, 40, 20, 10, 6, 4, 3, 2, 1]
snapshots = list(range(100))
for n in ns:
    cullto(snapshots, 100, n)
    print(snapshots)
# incremental approach: build the snapshot set one delta at a time,
# keeping at most n snapshots.
def incremental(n, length):
    snapshots = []
    for present in range(length):
        if snapshots:
            cullto(snapshots, present, n - 1)
        snapshots.append(present)
    return snapshots

for n in ns:
    print(incremental(n, 100))
print(incremental(7, 400))
for n in [8, 9, 10, 11, 12, 13]:
    for length in [900, 1000, 1100, 1200, 1300]:
        print(incremental(n, length))
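
# The docstring above notes that removing a snapshot only changes the values
# of its two neighbours.  Below is a rough, hypothetical sketch of how that
# could be exploited for a *fixed* present: keep the snapshots in a linked
# list and their values in a heap with lazy invalidation, so each cull costs
# O(log k) instead of recomputing all k values.  (The incremental case, where
# `present` advances every step, needs more care, since every value depends
# on `present`.)  None of the code above uses this.
import heapq

def cullto_heap(snapshots, present, limit):
    """Cull the sorted list `snapshots` in place down to `limit` entries."""
    s = [0] + list(snapshots) + [present]    # phantom endpoints, as in tocull_potato
    k = len(snapshots)
    prev = list(range(-1, k + 1))            # prev[i] == i - 1 initially
    nxt = list(range(1, k + 3))              # nxt[i] == i + 1 initially
    alive = set(range(1, k + 1))             # indices of real snapshots
    version = [0] * (k + 2)                  # bumped whenever a value changes

    def value(i):
        return (s[i] - s[prev[i]]) / (present - s[nxt[i]] + 1)

    heap = [(value(i), i, 0) for i in alive]
    heapq.heapify(heap)
    while len(alive) > limit:
        v, i, ver = heapq.heappop(heap)
        if i not in alive or ver != version[i]:
            continue                         # stale entry; skip it
        alive.remove(i)
        prev_i, next_i = prev[i], nxt[i]
        nxt[prev_i], prev[next_i] = next_i, prev_i   # splice i out
        for j in (prev_i, next_i):           # only the neighbours changed value
            if j in alive:
                version[j] += 1
                heapq.heappush(heap, (value(j), j, version[j]))
    snapshots[:] = [s[i] for i in sorted(alive)]

# Sanity check that the heap version makes the same greedy choices as cullto()
# for one fixed-present case.
_a = list(range(100))
_b = list(range(100))
cullto(_a, 100, 10)
cullto_heap(_b, 100, 10)
assert _a == _b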