I actually still have this convergence slowdown problem, but to a lesser extent. Within a few minutes it reaches over 1000 polygons, but if I leave it overnight, it’s at less than 2500 polygons (and looks awesome). It’s desirable from an optimization perspective that it doesn’t use too many polygons (really, that’s something I should fold into the loss function), and desirable from an artistic perspective that it doesn’t get too close to the original image; but from the perspective of using hill climbing to solve optimization problems in general, this slowdown is kind of a serious problem.
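As a rough sketch of what I mean by folding it into the loss function (the penalty weight and the names here are made up for illustration, not anything from the actual program):

```python
import numpy as np

POLYGON_PENALTY = 0.001  # arbitrary weight, purely illustrative

def loss(rendered, target, num_polygons):
    """Per-pixel squared error plus a penalty on the polygon count,
    so the climber prefers approximations that use fewer polygons."""
    pixel_error = np.mean((rendered.astype(float) - target.astype(float)) ** 2)
    return pixel_error + POLYGON_PENALTY * num_polygons
```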
Probably using a distribution of polygon sizes that is closer to exponential (or maybe power-law) would work a lot better: if 10% of the polygons are about half the size of the canvas, 10% are ¼ the size, 10% are ⅛ the size, 10% are 1/16 the size, and so on down to 1/2048 the size (133 pixels), then it would proceed even more rapidly at first, but continue to improve even once the needed changes were very small. I should try out this approach, and I suspect it can be generalized.
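A sketch of that size distribution in Python, with each halving of the area equally likely; turning the chosen area into actual polygon vertices is a separate step I’m handwaving here:

```python
import math
import random

def random_polygon_area(canvas_area, min_fraction=1 / 2048):
    """Pick a target polygon area so that each size class (1/2, 1/4, 1/8,
    ... of the canvas area, down to min_fraction) is equally likely,
    giving a roughly exponential distribution of sizes."""
    n_classes = int(round(math.log2(1 / min_fraction)))  # 11 classes for 1/2048
    k = random.randint(1, n_classes)                      # uniform over size classes
    return canvas_area / 2 ** k
```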
I guess simulated annealing is another approach to solving that problem with hill climbing.
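That would basically mean replacing the greedy accept-if-better rule with the Metropolis acceptance rule, something like the following; the cooling schedule is the part that would actually need tuning:

```python
import math
import random

def accept(delta_loss, temperature):
    """Metropolis acceptance: always take improvements, and take a change
    that worsens the loss by delta_loss with probability exp(-delta_loss/T)."""
    if delta_loss <= 0:
        return True
    return random.random() < math.exp(-delta_loss / temperature)

# Each candidate mutation computes delta_loss = new_loss - old_loss and is
# kept if accept(delta_loss, temperature) is true; the temperature then
# decays slowly, e.g. temperature *= 0.9999 per move.
```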
If I compute the loss function over a downsampled version of the image, at least until the downsampled version is improving very slowly, that should dramatically speed up the search and also make it possible to use only black polygons.
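A minimal sketch of what that downsampled loss might look like (box-filter averaging, assuming the image dimensions are multiples of the factor, and the factor of 8 is arbitrary; in practice it might be better to render the candidate at the lower resolution in the first place):

```python
import numpy as np

def downsample(image, factor):
    """Box-filter downsampling of a 2-D grayscale array by an integer factor."""
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def coarse_loss(rendered, target, factor=8):
    """Loss over downsampled images: far fewer pixels to compare, and
    clusters of small black polygons average out to shades of gray."""
    return np.mean((downsample(rendered, factor) - downsample(target, factor)) ** 2)
```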