Klappquadrat (klapquad.com) is a 31-byte MS-DOS display hack (a
category known as the "32b intro") written by an entity known as T$.
The klapquad.zip archive also includes a 64-byte version
that makes even nicer graphics. Here's more or less how it looks
(video formats: 250-kilobyte Ogg Theora or 4-megabyte ZMBV AVI).
T$ was kind enough to include the assembly source to the 31-byte version:
    ;Klappquadrat 32b
    ;32 byte intro source by T$
    ;Greets to mados, cthulhu, spacey and neo

    org 100h
    mov al,13h
    int 10h
    lds ax,[bx]
    schleife:
    mov ax,di
    xor dx,dx
    mov bx,320
    div bx ;dx=x, ax=y
    add ax,cx
    add dx,cx
    and ax,dx
    shr ax,cl
    xor [di],al
    inc di
    jnz schleife
    inc cx
    jmp short schleife
The 64-byte version looks fairly similar:
    kragen@thrifty:~/pkgs/klapquad$ objdump -D -b binary -m i8086 -M intel quad_64b.com

    quad_64b.com:     file format binary

    Disassembly of section .data:

    0000000000000000 <.data>:
       0:   b0 13           mov    al,0x13
       2:   cd 10           int    0x10
       4:   c5 07           lds    ax,DWORD PTR [bx]
       6:   ba c9 03        mov    dx,0x3c9
       9:   b5 03           mov    ch,0x3
       b:   66 c1 c8 08     ror    eax,0x8
       f:   3c 3f           cmp    al,0x3f
      11:   72 04           jb     0x17
      13:   b0 3f           mov    al,0x3f
      15:   fe c4           inc    ah
      17:   f6 c1 03        test   cl,0x3
      1a:   74 01           je     0x1d
      1c:   ee              out    [dx],al
      1d:   e2 ec           loop   0xb
      1f:   89 f8           mov    ax,di
      21:   31 d2           xor    dx,dx
      23:   bb 40 01        mov    bx,0x140
      26:   f7 f3           div    bx
      28:   01 c8           add    ax,cx
      2a:   01 ca           add    dx,cx
      2c:   21 d0           and    ax,dx
      2e:   c1 e8 03        shr    ax,0x3
      31:   d3 e8           shr    ax,cl
      33:   00 05           add    BYTE PTR [di],al
      35:   47              inc    di
      36:   75 e7           jne    0x1f
      38:   41              inc    cx
      39:   e4 60           in     al,0x60
      3b:   fe c8           dec    al
      3d:   75 e0           jne    0x1f
      3f:   c3              ret
The rest of this file is about 3000 words of my commentary on these 31 instructions. I might be wrong about some things, because I'm pretty ignorant about assembly language and MS-DOS, and this code is a bit clever. I'm mostly doing this to learn some of the tricks in the code.
       0:   b0 13           mov    al,0x13
       2:   cd 10           int    0x10
       4:   c5 07           lds    ax,DWORD PTR [bx]
This part is in the original source; it's a pretty standard way to
start out a small graphics intro, like something in the 32b or 64b
categories. fr-016 starts the same way, but with a les instead of a
lds.

ax starts out as 0, and in int 10h, ah
selects the service. Service 0 is setting the video mode; you specify
the video mode in al. So the first two instructions set the video
mode to mode 13h, which is the very handy 320x200x256 "MCGA" mode
supported by almost all SuperVGA cards.

See my notes on fr-016 for how the lds thing works, although fr-016
uses les instead of lds. Setting the ds
register like this makes the program code more or less inaccessible
for data operations.
       6:   ba c9 03        mov    dx,0x3c9
       9:   b5 03           mov    ch,0x3
Note that there's a loop instruction below that jumps to 0xb, so
the above two instructions are setup for a loop. Here's the body of
the loop:
       b:   66 c1 c8 08     ror    eax,0x8
       f:   3c 3f           cmp    al,0x3f
      11:   72 04           jb     0x17
      13:   b0 3f           mov    al,0x3f
      15:   fe c4           inc    ah
      17:   f6 c1 03        test   cl,0x3
      1a:   74 01           je     0x1d
      1c:   ee              out    [dx],al
      1d:   e2 ec           loop   0xb
Loops generally need to have some kind of side effect in them to be
useful, and in this case, it looks like the purpose of the loop is the
out instruction, which writes the byte in al to the port in dx.
dx got set in the loop setup code above to 3c9h, and isn't modified
inside the loop, so it's always writing bytes to this same port.
It turns out that this is the port you write bytes to in order to set up the VGA palette, so this loop is there to set up the palette. This 25-byte loop (including the two-instruction setup) is also the major difference between the 64-byte version and the 31-byte version whose source is above.
The normal sequence, according to http://www.brackeen.com/vga/bitmaps.html#5, is that you write the palette index to port 3c8h, then the six-bit red, green, and blue values in sequence to port 3c9h. It says you can load the whole palette by first writing a zero to 3c8h, and then writing all 256 palette entries in sequence to 3c9h. So I hypothesize that in general you can write any number of palette entries in sequence this way, and the (emulated) card happens to default to setting palette entry zero at bootup.
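The write sequence described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as that page describes it, not an emulation of real hardware; the VgaDac class and its out method are hypothetical names made up for illustration, and masking each component to six bits is my assumption.

```python
class VgaDac:
    """Toy model of the VGA palette ports (hypothetical, for illustration only)."""

    def __init__(self):
        self.palette = [(0, 0, 0)] * 256
        self.index = 0       # palette entry the next complete colour goes to
        self.pending = []    # components received so far for that entry

    def out(self, port, value):
        if port == 0x3C8:                 # select the entry to start writing at
            self.index = value & 0xFF
            self.pending = []
        elif port == 0x3C9:               # one colour component
            # Assumption: only the low six bits of each component matter.
            self.pending.append(value & 0x3F)
            if len(self.pending) == 3:    # red, green and blue all received
                self.palette[self.index] = tuple(self.pending)
                self.index = (self.index + 1) & 0xFF   # move to the next entry
                self.pending = []


dac = VgaDac()
dac.out(0x3C8, 0)                          # start at entry 0
for component in (63, 0, 0, 0, 63, 0):     # two entries written back to back
    dac.out(0x3C9, component)
print(dac.palette[:2])
```

In this model, Klappquadrat's loop corresponds to skipping the write to 3c8h entirely and relying on the index already being zero.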
Interestingly, this part of the program seems to work differently in FreeDOS under QEMU than in DOSBox; the palette I get in DOSBox is the black, red, orange, yellow, white palette you can see in the screenshot above, while the one I see in QEMU ranges from green through yellow to white, with no black. I get a somewhat similar effect, actually, if I run this program twice inside DOSBox, but it's cyan instead of green.
This suggests that the problem is the initial state of the eax
register, and indeed if I insert an xor eax, eax at the beginning
of the program, it displays more correctly in FreeDOS in QEMU, and can
run more than once without screwing up the colors in DOSBox.
So, anyway, how does this loop produce this sequence of colors?
eax starts out as whatever the BIOS video mode routine
leaves in it. Each time through the loop, we rotate it by 8 bits, so
every four times through the loop, it will be rotated back to its
original position. The two bottom bits of the loop counter are tested
by the test instruction, and one time out of four, we skip the out
instruction, so one of the four bytes in eax is invisible.
This loop seems to set all 256 palette entries. At first that was
what I expected it to do, since that would be 3 * 256 color
components, and we set cx to 3 * 256 by setting ch to 3. But then
I thought that actually cx gets decremented by the loop
instruction even when we didn't output a color component, so I thought
it would only set the first 192 colors. But if I increase ch to 4,
then the first many colors all get set to white, I guess because it
wraps around and overwrites them.
So, anyway, what were those colors? Here's the code that computes them.
       b:   66 c1 c8 08     ror    eax,0x8
       f:   3c 3f           cmp    al,0x3f
      11:   72 04           jb     0x17
      13:   b0 3f           mov    al,0x3f
      15:   fe c4           inc    ah
So first it rotates
eax by a byte; then it checks to see if the low
byte is greater than or equal to 0x3f, which is 63, the largest
six-bit value. If so, it thresholds it to 0x3f and increments
the next byte up. So when red reaches its max, green starts to
increment, when green reaches its max, blue starts to increment; when
blue reaches its max, the invisible byte starts to increment; and when
the invisible byte reaches its max, red starts to increment. So, if
we start with some large number in the invisible byte, some small
number in the red byte, and zeroes elsewhere, this will give us the
dark red, light red, orange, yellow, white sequence that we see.
I seem to remember that this is what's called the "Dutch palette", because so many demos from the Netherlands used it for a while.
I was surprised to learn that you could use
eax in 16-bit real mode
like this. Apparently the operand size prefix 0x66 works in 16-bit
mode to give you 32-bit operands just as it works in 32-bit mode to
give you 16-bit operands.
So the remaining mysteries to me at this point were:
Why do all 256 palette entries get set, and not just 192 of them?
Where does the large number in the invisible byte (I guessed it's
ah on entry to the loop, but I was wrong) come from? It must be
something the BIOS
int 10h call is sticking there, I incorrectly
thought, but what is it and what does it mean?
To answer these questions, I resorted to probing QEMU with GDB.
First, in QEMU's console, while running quad_64b:

    (qemu) info registers
    EAX=3f3f0000 EBX=... ...
    ES =22e4 ...
    CS =22e4 ...
So I can set a breakpoint at 0x22f40 (that is, CS * 16 + 0x100, where
a .com program starts) and get it to stop on program
entry. GDB is kind of dumb when controlling QEMU in real mode; it can
set breakpoints, and it does stop when its breakpoints get hit, but it
can't figure out that it stopped because the breakpoint was hit, so
you have to manually remove the breakpoint if you want to continue,
and stepi doesn't work. Anyway, so I restarted QEMU and, before
running quad_64b, I said:
    (qemu) gdbserver
    (qemu)
And then I ran GDB:
    kragen@thrifty:~/devel/circles$ gdb
    GNU gdb 6.4.90-debian
    ...
    (gdb) target remote localhost:1234
    Remote debugging using localhost:1234
    0x0000e830 in ?? ()
    (gdb) set architecture i8086
    The target architecture is assumed to be i8086
    (gdb) b *0x22f40
    Breakpoint 1 at 0x22f40
    (gdb) c
    Continuing.
And then I started the program:
And GDB woke up:
    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x00000100 in ?? ()
    (gdb) x/40i $cs*16+$eip
    0x22f40:        mov    $0x13,%al
    0x22f42:        int    $0x10
    0x22f44:        lds    (%bx),%ax
    0x22f46:        mov    $0x3c9,%dx
    0x22f49:        mov    $0x3,%ch
    0x22f4b:        ror    $0x8,%eax
    ...
So I set breakpoints after the return from the interrupt, and at the top of the loop body, and deleted the breakpoint where I was:
    (gdb) b *0x22f44
    Breakpoint 2 at 0x22f44
    (gdb) b *0x22f4b
    Breakpoint 3 at 0x22f4b
    (gdb) delete 1
    (gdb) c
    Continuing.

    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x00000104 in ?? ()
    (gdb) info registers
    eax            0x400020 4194336
    ecx            0xff     255
    ...
So there are two interesting things. First, there is a byte that's
more than 0x3f in eax, and it's the green byte, which explains the
green color I get when testing in QEMU with FreeDOS. Second, ecx is
0xff, not 0; this explains why all of the palette entries get set,
instead of only three-quarters of them. (Also, it means the invisible
byte, the one to be skipped, will be the one in al at the end of the
fourth, eighth, etc., iterations of the loop, not the first, fifth,
etc. This is taken into account in my naming of the bytes above.)
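The counting argument can be checked with a short Python sketch. It replays only the loop counter and the test/je/loop logic, not the colors; count_outs is a hypothetical helper name, and the three starting values stand for cl = 0 with ch = 3, cl = 0xff with ch = 3, and cl = 0xff with ch = 4.

```python
def count_outs(cx):
    """Return (iterations, outs) for the palette loop entered with this cx."""
    iterations = outs = 0
    while True:
        iterations += 1
        if cx & 3:               # test cl,3 / je: the out is skipped when both bits are 0
            outs += 1
        cx = (cx - 1) & 0xFFFF   # loop decrements cx ...
        if cx == 0:              # ... and falls through once it reaches zero
            break
    return iterations, outs


for start in (0x0300, 0x03FF, 0x04FF):
    iterations, outs = count_outs(start)
    print(hex(start), iterations, outs, outs // 3)
```

With cx = 0x3ff the loop writes 768 components, which is exactly 256 palette entries; starting from 0x300 it would write only 192 entries, and with ch raised to 4 it writes 320 entries' worth, 64 more than the palette holds.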
Unfortunately, around this point I screwed up in GDB and had to start QEMU again and reattach GDB, and then redelete some old breakpoints.
    (gdb) c
    Continuing.
    Watchdog has expired.  Target detached.
    (gdb) target remote localhost:1234
    Remote debugging using localhost:1234
    0x0000026f in ?? ()
    (gdb) display/2i $cs*16+$eip
    2: x/2i $cs * 16 + $eip
    0x96f:  mov    1(%bx),%al
    0x972:  mov    13(%bx),%ah
    (gdb) b *0x22f40
    Note: breakpoint 5 also set at pc 0x22f40.
    Breakpoint 6 at 0x22f40
    (gdb) delete 5
    (gdb) c
    Continuing.

    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x00000100 in ?? ()
    2: x/2i $cs * 16 + $eip
    0x22f40:        mov    $0x13,%al
    0x22f42:        int    $0x10
    (gdb) info registers
    eax            0x400000 4194304
    ecx            0xff     255
    ...
    (gdb) p $eax = 0x0
    $6 = 0
    (gdb) b *0x22f46
    Breakpoint 7 at 0x22f46
    (gdb) delete 6
    (gdb) c
    Continuing.

    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x00000106 in ?? ()
    2: x/2i $cs * 16 + $eip
    0x22f46:        mov    $0x3c9,%dx
    0x22f49:        mov    $0x3,%ch
    (gdb) info registers
    eax            0x20cd   8397
    ecx            0xff     255
    ...
    (gdb)
So ecx is 255 upon program entry, and it's not the BIOS that's
setting %eax after all; it's the lds instruction, which loads the
int 20h instruction from the beginning of the PSP into it! So red
starts out at 0x20, and the invisible byte starts out at 0xcd, the
int opcode, which is above 0x3f, so that's how the incrementation
gets started.
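To see the resulting colors without a debugger, here is a Python sketch that replays the palette loop instruction by instruction. It assumes the values seen in the GDB session (eax = 0x20cd after the lds, with the upper half zeroed, and cx = 0x3ff once ch is set to 3); palette_from_loop is a hypothetical name, and nothing here models the VGA hardware itself.

```python
def palette_from_loop(eax=0x000020CD, cx=0x03FF):
    """Replay the palette loop and return the (r, g, b) triples it writes."""
    components = []
    while True:
        eax = ((eax >> 8) | (eax << 24)) & 0xFFFFFFFF     # ror eax,8
        if (eax & 0xFF) >= 0x3F:                          # cmp al,0x3f / jb
            ah = (((eax >> 8) & 0xFF) + 1) & 0xFF         # inc ah
            eax = (eax & 0xFFFF0000) | (ah << 8) | 0x3F   # mov al,0x3f
        if cx & 3:                                        # test cl,3 / je
            components.append(eax & 0xFF)                 # out to port 3c9h
        cx = (cx - 1) & 0xFFFF                            # loop
        if cx == 0:
            break
    # Group the byte stream into red, green, blue triples, one per entry.
    return [tuple(components[i:i + 3]) for i in range(0, len(components), 3)]


palette = palette_from_loop()
for n in (0, 1, 31, 64, 128, 255):
    print(n, palette[n])
```

Calling palette_from_loop(0x004020CD) instead replays the case where the upper half of eax still holds the 0x0040 seen under QEMU, in which the green byte is above the threshold from the very first entry.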
Here's the main loop, minus the exit test at the end, which just jumps back to the beginning:
      1f:   89 f8           mov    ax,di
      21:   31 d2           xor    dx,dx
      23:   bb 40 01        mov    bx,0x140
      26:   f7 f3           div    bx
      28:   01 c8           add    ax,cx
      2a:   01 ca           add    dx,cx
      2c:   21 d0           and    ax,dx
      2e:   c1 e8 03        shr    ax,0x3
      31:   d3 e8           shr    ax,cl
      33:   00 05           add    BYTE PTR [di],al
      35:   47              inc    di
      36:   75 e7           jne    0x1f
      38:   41              inc    cx
So the main thing to notice here is that there's only one memory
access here, and it's through the di register; and di changes only
by being incremented. So we're writing to each pixel on the screen
in turn; ds is set to the video segment a000h early on.
This loop is almost the same as the one in the 31-byte version. There are two differences: it updates the pixel with an add rather than an xor, and it has an extra shr by a constant 3.
Let's take a look at the last bit first:
      35:   47              inc    di
      36:   75 e7           jne    0x1f
      38:   41              inc    cx
On entry to this loop, cx is zero; we know this because we just fell
out of the bottom of a loop instruction, which decrements cx and
jumps unless it decremented it to zero. But every time di hits zero
(that is, wraps around to the top of the screen) we increment cx
again. There are no other places in the loop that change cx. So it
forms a kind of frame counter.
The first little bit:
      1f:   89 f8           mov    ax,di
      21:   31 d2           xor    dx,dx
      23:   bb 40 01        mov    bx,0x140
      26:   f7 f3           div    bx

has a helpful comment in the source:

    div bx ;dx=x, ax=y
0x140 is 320, the number of pixels in a row on the mode 13h screen,
and since there's one byte per pixel, also the number of bytes. So if
di points at a pixel, dividing
di by 320 should give us the
Y-coordinate as the quotient and the X-coordinate as the remainder;
and that's what the comment means.
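The same arithmetic in Python, as a small sketch: divmod plays the role of div bx, returning the quotient (what ends up in ax) and the remainder (what ends up in dx). The names WIDTH and offset_to_xy are mine, not from the source.

```python
WIDTH = 320  # bytes, and therefore pixels, per row in mode 13h


def offset_to_xy(di):
    """Mirror `div bx` with bx = 320: the quotient is y, the remainder is x."""
    y, x = divmod(di, WIDTH)
    return x, y


for di in (0, 319, 320, 32160, 63999):
    print(di, offset_to_xy(di))
```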
The next few instructions are where the black magic happens:
      28:   01 c8           add    ax,cx
      2a:   01 ca           add    dx,cx
      2c:   21 d0           and    ax,dx
      2e:   c1 e8 03        shr    ax,0x3
      31:   d3 e8           shr    ax,cl
      33:   00 05           add    BYTE PTR [di],al
The first two
add instructions essentially shift the coordinates
up and to the left by one pixel every frame. If you take them out,
you get a much flatter-looking picture, one that looks like this:
The constant shr is also kind of optional. It throws away the
bottom three bits of coordinate-derived data, which means that the
screen is divided into a bunch of 8x8 tiles, and there are no
differences introduced inside those tiles (although the per-frame
shift can introduce some). Without the adds and this extra shr,
you get something that looks mostly like a bunch of particularly
colorful Sierpinski triangles.

This shr also slows things down a bit; without it, in some
pixels (e.g. (255, 255)), you can get all the way to palette value 255
in a single frame!
So these lines are kind of the heart of the hack:
      2c:   21 d0           and    ax,dx
      31:   d3 e8           shr    ax,cl
      33:   00 05           add    BYTE PTR [di],al
With them in there, you get most of the visual and temporal features of the original.
You'll notice that the screenshots all appear to be made up of nested
2x2 squares in which the bottom right quadrant of each square is
brighter than the other three quadrants. That's kind of what you'd
expect if you
and your coordinates together, right? The lower right
quadrant of each 2x2 square is the one where both of the coordinate
bits distinguishing the quadrants of that square are 1.
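The nesting is easy to see on a small grid. The following Python sketch prints x & y for a 16x16 block, one hexadecimal digit per cell; and_pattern is a hypothetical helper name, and the block size is arbitrary.

```python
def and_pattern(size=16):
    """Return one string per row, each character being x & y as a hex digit."""
    rows = []
    for y in range(size):
        rows.append("".join(format(x & y, "x") for x in range(size)))
    return rows


for row in and_pattern():
    print(row)
```

In the output, the top-left 8x8 quadrant repeats in the other three, with the 8 bit added only in the bottom-right one, and the same holds at the 4x4 and 2x2 levels.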
That's pretty much the pattern you get if you just do this:
      2c:   21 d0           and    ax,dx
      33:   88 05           mov    BYTE PTR [di],al
Which looks like this, and doesn't animate:
I don't know how to completely explain the difference between that and
the previous picture. Obviously the frame counter is crucial in
actually getting an animation, and during the long period of time when
cl is counting up from 8 to 256, the animation essentially pauses;
but I don't quite know how to explain the difference it makes.
That's the main part of my understanding of Klappquadrat I'm still not yet happy with.
      39:   e4 60           in     al,0x60
      3b:   fe c8           dec    al
      3d:   75 e0           jne    0x1f
      3f:   c3              ret
According to the source for Dirojed, another 32-byte intro, these four
instructions are the "standard ESC check". I guess port 60h reads as
1 if somebody's hitting the Esc key and something else otherwise; so
dec al sets the zero flag iff Esc was pressed, and if that
was the case, we fall through to the ret, which pops a zero off the
stack, returns to the int 20h at the beginning of the PSP, and
terminates the program. About half the time, this crashes FreeDOS in
QEMU; I don't know why.
The 31-byte version just has an unconditional jmp short schleife
here instead of this ESC check, so there's no way to exit to DOS.
I hand-translated the objdump disassembly output into equivalent
gas input to facilitate experimenting with changes to the program.
It produces a byte-for-byte identical executable until you uncomment
some of the lines that make it a little more robust. It looks like
this:
    ## a copy of T$'s 64-byte "Klappquadrat"
    ## compile as follows:
    ## as -R klapquad.s -o klapquad.o
    ## objcopy -O binary klapquad.o klapquad.com.0
    ## dd if=klapquad.com.0 of=klapquad.com bs=256 skip=1
            .code16
            .org 0x100
            ## this makes it work right in FreeDOS in QEMU
            ## and when run multiple times
            #xor %eax, %eax
            movb $0x13, %al
            int $0x10
            #mov $(vidseg-2), %bx    # makes it work right in FreeDOS in QEMU
            lds (%bx), %ax
            mov $0x3c9, %dx
            mov $3, %ch             # orig. 3
    set_palette_loop:
            ror $8, %eax
            cmp $0x3f, %al
            jb dont_threshold
            mov $0x3f, %al
            inc %ah
    dont_threshold:
            test $3, %cl
            je dont_outb
            outb %al, (%dx)
    dont_outb:
            loop set_palette_loop
    schleife:
            mov %di, %ax
            xor %dx, %dx
            mov $0x140, %bx
            div %bx
            add %cx, %ax
            add %cx, %dx
            and %dx, %ax
            shr $3, %ax
            shr %cl, %ax
            addb %al, (%di)
            inc %di
            jne schleife
            inc %cx
            in $0x60, %al
            dec %al
            jne schleife
            ret
    #vidseg: .short 0xa000
There seem to be a limited number of people who can appreciate the assembly version, even with the explanation, and it is a little hard to modify; so I wrote this Python version, which I am placing in the public domain. It probably isn't easier to run (you need to install Python, Numeric, SDL, and Pygame instead of any one of DOSBox, Microsoft Windows, or FreeDOS with QEMU, and the MS-DOS interfaces are probably more stable), but it's sure easier to read if you know Python and not assembly.
    #!/usr/bin/python
    """Recreation of T$'s Klappquadrat intro in Python with Pygame and Numeric.

    This is a recreation of the 64-byte version, and I think 31
    instructions.  By contrast, this is 44 lines of code, about 1600
    characters.  On the other hand, you can change this from 320x200x256
    to, say, 640x480x512 by changing the screensize= and ncolors= lines to
    say (640, 480) and 512.

    Kragen Javier Sitaker wrote this recreation, but T$ is to credit for
    the original intro.

    """
    import pygame, sys
    from Numeric import zeros, subtract, array, arange, where, take, shape, indices

    screensize = (320, 200)
    ncolors = 256

    def colors(masks, levels):
        "Compute a grayscale pixel from bit masks and a floating-point level [0,1)"
        return sum([int(mask * level) & mask for mask, level in zip(masks, levels)])

    def clamp(a, b, c):
        "Threshold b between lower limit a and upper limit c."
        d = where(a < b, b, a)
        return where(d < c, d, c)

    def redraw(screen, buf, palette, frames):
        x, y = indices(screensize)
        # this 256 is not ncolors; it's a timing/pacing thing
        buf += ((x + frames) & (y + frames)) >> (frames % 256) >> 3
        buf %= ncolors
        pygame.surfarray.blit_array(screen, take(palette, buf))

    def main(argv):
        pygame.init()
        screen = pygame.display.set_mode(screensize, pygame.FULLSCREEN)
        buf = zeros(screensize)
        fiery_rgb_integers = clamp(0,
                                   subtract.outer(arange(ncolors) + ncolors/8,
                                                  ((array([0, 1, 2]) * ncolors) / 4)),
                                   ncolors / 4)
        masks = screen.get_masks()[:3]
        # I'm not sure this palette is exactly right; it only goes to 63
        # in the original...
        palette = array([colors(masks, levels/float(ncolors/4))
                         for levels in fiery_rgb_integers])
        frames = 0
        while 1:
            ev = pygame.event.poll()
            if ev.type == pygame.NOEVENT:
                frames += 1
                redraw(screen, buf, palette, frames)
                pygame.display.flip()
            elif ev.type == pygame.KEYDOWN: break
            elif ev.type == pygame.QUIT: break

    if __name__ == '__main__': main(sys.argv)