So I’ve been thinking about how to render low-computational-cost textures in 2D. I have a few different objectives in mind:
Ultra-low-power e-paper computing systems; the objective here is for the energy cost of generating the texture data to be comparable to the energy cost of updating the e-paper to display it. As outlined in Keyboard-powered computers, updating an e-paper display costs on the order of 200 nJ per pixel update, while common 32-bit low-power processors use on the order of 300 pJ per instruction. This means we can afford something like 700 instructions per pixel, if we’re content to have the battery life be half of what it could be. But e.g. reducing below 175 instructions per pixel only increases your battery life by 25%.
Live video generation in software from low-memory computers; the objective here is to be able to display more interesting things on the screen than you have memory space for a full framebuffer. As mentioned in Notes on the STM32 microcontroller family, NTSC composite video has space for about 200 kilopixels in living color, but common 32-bit microcontrollers like the STM32 line have only 4–256 KiB of RAM, despite running at up to 200 MIPS. To the extent that the display contents can be encoded in RAM with many pixels per byte, then rendered in software to a scanline “framebuffer” or a “framebuffer” containing a few scanlines, they can be far richer within these limitations. 200 kilopixels at 30Hz is 6 megapixels per second. At 96 MIPS, that's a budget of 16 instructions per pixel, or 32 instructions per pixel if you’re satisfied with a 200×500 display.
Providing more powerful graphical primitives for GUI systems. Old
GUIs, up to around 1998, were so constrained by the slow computers
they ran on that they had to update the framebuffer incrementally,
because there wasn’t enough time to redraw the entire screen in
between screen updates. Modern GUIs’ programming model and
appearance mostly mindlessly apes the GUIs of that epoch, although
the IMGUI paradigm, gradients, filters, and transparency are making
their way in; Self’s cel-animation-inspired effects have made their
way into modern GUIs as animated transitions, notably in window
managers and jQuery animations; CSS is popularizing arbitrary
transform matrices; and Google’s Material Design is leading a move
to physically-based rendering in user interfaces. But there's a
much wider range of possibilities available. Even the CPU of the
laptop I’m typing this on averages over 6 64-bit or 128-bit
instructions per clock cycle at 1.6 GHz, or 10,000 MIPS, and its LCD
screen is only 1920×1080 at 60Hz, or 124 megapixels per second.
This is a computational budget of about 80 CPU instructions per
pixel. Moreover, its integrated GPU (see file asus-gpu
) is
capable of doing maybe 50 billion single-precision
multiply-accumulates per second (invoked from 12.5 billion
instructions), or twice that many in half-precision, and supports
OpenCL; if this is correct, this would be a computational budget of
about 400 multiply-accumulates per pixel (invoked from 100 4×SIMD
instructions).
Reducing GUI latency and tearing. A 60Hz screen redraw is already 16.7 ms, which Dan Luu has convincingly shown is already significant in user experience. Double-buffering adds an additional 16.7 ms of latency, and doing it out of sync with the vertical refresh adds an additional random latency that ranges from 0 to 16.7 ms. Typical keystroke-to-screen latencies on modern computers are in the range of 100 ms, and decreasing that by 25% would be a significant improvement. The hardware constraints here are the same as in the previous item: 80 CPU instructions per pixel or 400 GPU operations per pixel.
So, all of these different ways that you can use low-computational- cost texture generation have different computational budgets (175 instructions per pixel for e-paper, 16 for bitbanging NTSC, 80 instructions and 400 multiply-accumulates for the laptop scenarios), but they’re all kind of in the same ballpark. Moreover, they’re in a completely different world from the 6 MHz 1 Dhrystone MIPS 68000 and its 9" 512×342 CRT, which I am assuming refreshed at around 60 Hz, giving it a dot clock a bit over 10 MHz — 10 pixels per instruction. They are, respectively, 1750 times faster, 160 times faster, and 800 times faster (not counting the GPU, which is arguably something like 4000 times faster), relative to the notional “dot clock”.
http://www.iquilezles.org/www/articles/simplewater/simplewater.htm
http://demo-effects.sourceforge.net/
https://github.com/ocornut/imgui
http://www.lofibucket.com/articles/64k_intro.html
http://www.iquilezles.org/www/articles/morenoise/morenoise.htm
http://jcgt.org/published/0004/02/01/
https://www.shadertoy.com/view/4ttSWf
http://www.iquilezles.org/www/articles/rmshadows/rmshadows.htm
http://erleuchtet.org/~cupe/permanent/enhanced_sphere_tracing.pdf
https://www.slideshare.net/DICEStudio/five-rendering-ideas-from-battlefield-3-need-for-speed-the-run
http://web.archive.org/web/20180306233623/https://prideout.net/blog/?p=63
http://www.microscopics.co.uk/blog/2010/paulstretch-an-interview-with-paul-nasca/
https://web.archive.org/web/20160325174539/http://freespace.virgin.net/hugo.elias/graphics/x_water.htm
https://github.com/intel/opencl-intercept-layer/blob/master/docs/kernel_isa.md
https://nlguillemot.wordpress.com/2017/01/30/intel-gpu-assembly-with-pix-beta/
https://www.x.org/docs/intel/CHV/intel-gfx-prm-osrc-chv-bsw-vol03-gpu-overview.pdf
https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-hsw-3d-media-gpgpu-engine_0_1.pdf
http://renderingpipeline.com/graphics-literature/low-level-gpu-documentation/
https://doc.lagout.org/electronics/Intel-Graphics-Architecture-ISA-and-microarchitecture.pdf
https://01.org/linuxgraphics/downloads/stack
https://01.org/linuxgraphics/downloads/2018q1-intel-graphics-stack-recipe
https://linuxhint.com/gpu-programming/
https://software.intel.com/en-us/articles/introduction-to-gen-assembly
https://en.wikipedia.org/wiki/Simplex_noise
http://staffwww.itn.liu.se/~stegu/simplexnoise/simplexnoise.pdf