#!/usr/bin/python3
"""Stochastic insertion sort: a novel O(N√N) sort algorithm.

I suspect this may be a new point on the frontier of Pareto-optimal
sorting algorithms!  This is a constant-space comparison sorting
algorithm that’s about 10 lines of code with (empirically) O(N√N)
average performance (O(N²) worst case).  Under simplifying
assumptions, it’s never more than 50% slower than insertion sort on
random input and never more than twice as slow as insertion sort on
worst-case input.  Like insertion sort, it’s linear-time on
nearly-sorted input.

It’s better than heapsort or smoothsort by being simpler; it’s better
than quicksort by being simpler and constant-space, and performing
better on nearly-sorted input; it’s better than mergesort by being
simpler and constant-space, and by performing better on nearly-sorted
input; it’s better than library sort by being simpler and
constant-space; it’s better than insertion sort by being faster on
average for N more than about 100; it’s better than Grailsort (I
think) by being *much* simpler and faster for N up to a fairly large
number.  (I should check Shellsort and comb sort, which are close
competitors.)

I’m not sure if it’s of any practical use.  Maybe in some kind of
very memory-constrained system, where you don’t have space for the
code for heapsort?  I’m also not sure if the simplifying assumptions
hold in practice; above I’m assuming that generating a random number
in a constantly changing range is free, but it may be costly enough
that it doesn’t surpass insertion sort until much larger inputs.

At first I thought it was a stable sort, but now I realize that it
isn’t.

Consider WOLOG insertion-sorting an array into ascending order
starting at the beginning.  Insertion sort divides the array into an
(initially empty) sorted segment at the beginning and an unsorted
segment at the end, and on each pass it sinks a new unsorted element
into its proper place in the sorted segment by repeated swaps.  The
tweak is very simple: before swapping two elements within the growing
sorted segment, compare the smaller (sinking) element with a randomly
chosen candidate from the unsorted segment, and swap the sinking
element with that candidate if the candidate is smaller.

I thought it would be interesting to see if adding random swaps to a
bubble sort improved it.  It didn’t; it made it require 16 times as
many comparisons at size 1 million.  So I tried adding them to
insertion sort instead.  Empirically that seems to have made it
O(N√N) instead of O(N²) on random data.  For 10000 random elements it
does 2.45 million comparisons, each possibly followed by a swap, an
order of magnitude better than insertion sort.  Empirically it
asymptotes to about 2.45 N √N comparisons, while plain insertion sort
takes about 0.25 N².

One way to look at this is that insertion sort is slow because it
makes bad choices of which element to add to the sorted segment,
choices that are so bad that often a random choice from the unsorted
elements would be much better.  So, as we’re doing the insertion,
sinking a low element toward its ultimate position, we check random
unsorted elements to see if we can find an even lower one to use
instead.  At first, this results in having to do more swaps on
average on each pass, but it depletes the unsorted part of the array
of very low elements and enriches the sorted part of the array in
very low elements.
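
To make the tweak concrete, here is one possible trace of a single
insertion (the random candidates are invented for illustration, not
taken from a real run).  Suppose we are inserting the 2 into the
sorted segment [1, 3, 5] of [1, 3, 5, 2, 0, 4]:

    [1, 3, 5, 2, 0, 4]  5 > 2, but the random candidate 0 < 2,
    [1, 3, 5, 0, 2, 4]  so we swap the sinking 2 with the 0
    [1, 3, 0, 5, 2, 4]  and sink the 0 past the 5 instead;
    [1, 0, 3, 5, 2, 4]  the later candidates are no smaller, so
    [0, 1, 3, 5, 2, 4]  the 0 simply sinks past the 3 and the 1.

One insertion pass has carried the very low 0 all the way to the
front, depleting the unsorted segment of its lowest element, while
the displaced 2 waits in the unsorted segment to be inserted later.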

Another way to look at this is that insertion sort is slow because it
only moves elements one position at a time, so this algorithm gives
elements a chance to leap a long way, a chance which will probably
pay off when they are far from their correct position.  This results
in less work for the insertion-sorting part of the algorithm to do.

"""
import random


def stochobadsort(xs):
    "This is worse than naïve bubble sort."
    sorted = False
    comparisons = 0

    while not sorted:
        sorted = True

        for i in range(1, len(xs)):
            # The usual bubble-sort adjacent comparison, but here it
            # only detects disorder; all the swapping is random.
            comparisons += 1
            if xs[i-1] > xs[i]:
                sorted = False

            # Try a swap with a random element anywhere in the array,
            # comparing the pair in index order so that a swap always
            # puts the two elements in order.
            j = random.randrange(len(xs))
            comparisons += 1
            if xs[min(i, j)] > xs[max(i, j)]:
                sorted = False
                xs[i], xs[j] = xs[j], xs[i]

    return comparisons


def inssort(xs):
    "This is naïve insertion sort for comparison."
    comparisons = 0

    for i in range(1, len(xs)):
        # Sink xs[i] into the sorted prefix xs[:i] by repeated swaps.
        j = i
        while j > 0:
            comparisons += 1
            if xs[j-1] < xs[j]:
                break
            xs[j-1], xs[j] = xs[j], xs[j-1]
            j -= 1

    return comparisons


def stochosort(xs):
    "This is the algorithm that’s better than naïve insertion sort."
    comparisons = 0

    for i in range(1, len(xs)):
        j = i
        while j > 0:
            comparisons += 1
            if xs[j-1] < xs[j]:
                break

            # We’re sinking an element toward the bottom of the sorted
            # substring at the beginning of the array.  Let’s
            # opportunistically see if we can find an even smaller one
            # to use instead, which will still be smaller than the one
            # to its left.
            k = random.randrange(i, len(xs))  # could be j instead of i
            comparisons += 1
            if xs[k] < xs[j]:
                xs[j], xs[k] = xs[k], xs[j]

            xs[j-1], xs[j] = xs[j], xs[j-1]
            j -= 1

    return comparisons


def test_stochosort(n, stochosort=stochosort):
    xs = list(range(n))
    random.shuffle(xs)
    comparisons = stochosort(xs)
    assert all(xs[i-1] <= xs[i] for i in range(1, len(xs))), xs
    assert set(xs) == set(range(n)), xs
    assert len(xs) == n, xs
    return comparisons


def avg_comparisons(n, algorithm, reps=10):
    return sum(test_stochosort(n, algorithm) for i in range(reps))/reps


if __name__ == '__main__':
    print('n\tstochosort_comparisons\tinssort_comparisons')
    for n in [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]:
        print('\t'.join([str(n),
                         str(avg_comparisons(n, stochosort)),
                         str(avg_comparisons(n, inssort)),
                         ]))
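

# A rough sanity check on the crossover claimed in the docstring: the
# two empirical fits cross where 2.45*n*sqrt(n) = 0.25*n**2, i.e. where
# sqrt(n) = 2.45/0.25 = 9.8, so n ≈ 96, consistent with stochosort
# beating insertion sort for N more than about 100.  The sketch below
# (a hypothetical helper, not part of the measurements above) prints
# measured comparison counts next to both fitted models so the O(N√N)
# asymptote can be eyeballed; 2.45 and 0.25 are the empirical guesses
# from the docstring, not exact constants.  Try it with, e.g.,
# python3 -i stochosort.py and then compare_to_models().
def compare_to_models(ns=(100, 1000, 10000), reps=10):
    from math import sqrt
    print('n\tmeasured\t2.45*n*sqrt(n)\t0.25*n*n')
    for n in ns:
        print('\t'.join(str(x) for x in
                        [n,
                         avg_comparisons(n, stochosort, reps),
                         2.45 * n * sqrt(n),
                         0.25 * n * n]))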