#!/usr/bin/python3
"""Stochastic insertion sort: a novel O(N√N) sort algorithm.

I suspect this may be a new point on the frontier of Pareto-optimal
sorting algorithms!  This is a constant-space comparison sorting
algorithm that’s about 10 lines of code with (empirically) O(N√N)
average performance (O(N²) worst case).  Under simplifying
assumptions, it’s never more than 50% slower than insertion sort on
random input and never more than twice as slow as insertion sort on
worst-case input.  Like insertion sort, it’s linear-time on
nearly-sorted input.

It’s better than heapsort or smoothsort by being simpler; it’s better
than quicksort by being simpler and constant-space, and performing
better on nearly-sorted input; it’s better than mergesort by being
simpler and constant-space, and by performing better on nearly-sorted
input; it’s better than library sort by being simpler and
constant-space; it’s better than insertion sort by being faster on
average for N more than about 100; it’s better than Grailsort (I
think) by being *much* simpler and faster for N up to a fairly large
number.  (I should check Shellsort and comb sort, which are close
competitors.)

I’m not sure if it’s of any practical use.  Maybe in some kind of
very memory-constrained system, where you don’t have space for the
code for heapsort?  I’m also not sure if the simplifying assumptions
hold in practice; above I’m assuming that generating a random number
in a constantly changing range is free, but it may be costly enough
that it doesn’t surpass insertion sort until much larger inputs.

At first I thought it was a stable sort, but now I realize that it
isn’t.

Consider WOLOG insertion-sorting an array into ascending order
starting at the beginning.  Insertion sort divides the array into an
(initially empty) sorted segment at the beginning and an unsorted
segment at the end, and on each pass it sinks a new unsorted element
into its proper place in the sorted segment by repeated swaps.  The
tweak is very simple: before swapping two elements within the growing
sorted segment, compare the smaller (sinking) element with a randomly
chosen candidate from the unsorted segment, and swap the sinking
element with that candidate if the candidate is smaller.

I thought it would be interesting to see if adding random swaps to a
bubble sort improved it.  It didn’t; it made it require 16 times as
many comparisons at size 1 million.  So I tried adding them to
insertion sort instead.  Empirically that seems to have made it
O(N√N) instead of O(N²) on random data.  For 10000 random elements it
does 2.45 million comparisons, each possibly followed by a swap, an
order of magnitude better than insertion sort.  Empirically it
asymptotes to about 2.45 N √N comparisons, while plain insertion sort
takes about 0.25 N².

One way to look at this is that insertion sort is slow because it
makes bad choices of which element to add to the sorted segment,
choices that are so bad that often a random choice from the unsorted
elements would be much better.  So, as we’re doing the insertion,
sinking a low element toward its ultimate position, we check random
unsorted elements to see if we can find an even lower one to use
instead.  At first, this results in having to do more swaps on
average on each pass, but it depletes the unsorted part of the array
of very low elements and enriches the sorted part of the array in
very low elements.
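
To make the tweak concrete, here is one possible trace of a single
insertion (the random candidates are invented for illustration, not
taken from a real run).  Suppose we are inserting the 2 into the
sorted segment [1, 3, 5] of [1, 3, 5, 2, 0, 4]:

    [1, 3, 5, 2, 0, 4]  5 > 2, but the random candidate 0 < 2,
    [1, 3, 5, 0, 2, 4]  so we swap the sinking 2 with the 0
    [1, 3, 0, 5, 2, 4]  and sink the 0 past the 5 instead;
    [1, 0, 3, 5, 2, 4]  the later candidates are no smaller, so
    [0, 1, 3, 5, 2, 4]  the 0 simply sinks past the 3 and the 1.

One insertion pass has carried the very low 0 all the way to the
front, depleting the unsorted segment of its lowest element, while
the displaced 2 waits in the unsorted segment to be inserted later.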

Another way to look at this is that insertion sort is slow because it
only moves elements one position at a time, so this algorithm gives
elements a chance to leap a long way, a chance which will probably
pay off when they are far from their correct position.  This results
in less work for the insertion-sorting part of the algorithm to do.

"""
import random


def stochobadsort(xs):
    "This is worse than naïve bubble sort."
    sorted = False
    comparisons = 0

    while not sorted:
        sorted = True

        for i in range(1, len(xs)):
            # The usual bubble-sort adjacent comparison, but here it
            # only detects disorder; all the swapping is random.
            comparisons += 1
            if xs[i-1] > xs[i]:
                sorted = False

            # Try a swap with a random element anywhere in the array,
            # comparing the pair in index order so that a swap always
            # puts the two elements in order.
            j = random.randrange(len(xs))
            comparisons += 1
            if xs[min(i, j)] > xs[max(i, j)]:
                sorted = False
                xs[i], xs[j] = xs[j], xs[i]

    return comparisons


def inssort(xs):
    "This is naïve insertion sort for comparison."
    comparisons = 0

    for i in range(1, len(xs)):
        # Sink xs[i] into the sorted prefix xs[:i] by repeated swaps.
        j = i
        while j > 0:
            comparisons += 1
            if xs[j-1] < xs[j]:
                break
            xs[j-1], xs[j] = xs[j], xs[j-1]
            j -= 1

    return comparisons


def stochosort(xs):
    "This is the algorithm that’s better than naïve insertion sort."
    comparisons = 0

    for i in range(1, len(xs)):
        j = i
        while j > 0:
            comparisons += 1
            if xs[j-1] < xs[j]:
                break

            # We’re sinking an element toward the bottom of the sorted
            # substring at the beginning of the array.  Let’s
            # opportunistically see if we can find an even smaller one
            # to use instead, which will still be smaller than the one
            # to its left.
            k = random.randrange(i, len(xs))  # could be j instead of i
            comparisons += 1
            if xs[k] < xs[j]:
                xs[j], xs[k] = xs[k], xs[j]

            xs[j-1], xs[j] = xs[j], xs[j-1]
            j -= 1

    return comparisons


def test_stochosort(n, stochosort=stochosort):
    xs = list(range(n))
    random.shuffle(xs)
    comparisons = stochosort(xs)
    assert all(xs[i-1] <= xs[i] for i in range(1, len(xs))), xs
    assert set(xs) == set(range(n)), xs
    assert len(xs) == n, xs
    return comparisons


def avg_comparisons(n, algorithm, reps=10):
    return sum(test_stochosort(n, algorithm) for i in range(reps))/reps


if __name__ == '__main__':
    print('n\tstochosort_comparisons\tinssort_comparisons')
    for n in [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]:
        print('\t'.join([str(n),
                         str(avg_comparisons(n, stochosort)),
                         str(avg_comparisons(n, inssort)),
                         ]))
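

# A rough sanity check on the crossover claimed in the docstring: the
# two empirical fits cross where 2.45*n*sqrt(n) = 0.25*n**2, i.e. where
# sqrt(n) = 2.45/0.25 = 9.8, so n ≈ 96, consistent with stochosort
# beating insertion sort for N more than about 100.  The sketch below
# (a hypothetical helper, not part of the measurements above) prints
# measured comparison counts next to both fitted models so the O(N√N)
# asymptote can be eyeballed; 2.45 and 0.25 are the empirical guesses
# from the docstring, not exact constants.  Try it with, e.g.,
# python3 -i stochosort.py and then compare_to_models().
def compare_to_models(ns=(100, 1000, 10000), reps=10):
    from math import sqrt
    print('n\tmeasured\t2.45*n*sqrt(n)\t0.25*n*n')
    for n in ns:
        print('\t'.join(str(x) for x in
                        [n,
                         avg_comparisons(n, stochosort, reps),
                         2.45 * n * sqrt(n),
                         0.25 * n * n]))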