Klappquadrat (klapquad.com
) is a 31-byte MS-DOS display hack (a
category known as the "32b intro") written by an entity known as T$.
Packaged in klapquad.zip
is also a 64-byte version quad_64b.com
that makes even nicer graphics. Here's more or less how it looks in
DOSBox:
(video formats: 250kilobyte Ogg Theora or 4 megabyte ZMBV avi)
T$ was kind enough to include the assembly source to the 31-byte version:
;Klappquadrat 32b
;32 byte intro source by T$
;Greets to mados, cthulhu, spacey and neo
org 100h
mov al,13h
int 10h
lds ax,[bx]
schleife:
mov ax,di
xor dx,dx
mov bx,320
div bx
;dx=x, ax=y
add ax,cx
add dx,cx
and ax,dx
shr ax,cl
xor [di],al
inc di
jnz schleife
inc cx
jmp short schleife
The 64-byte version looks fairly similar:
kragen@thrifty:~/pkgs/klapquad$ objdump -D -b binary -m i8086 -M intel quad_64b.com
quad_64b.com: file format binary
Disassembly of section .data:
0000000000000000 <.data>:
0: b0 13 mov al,0x13
2: cd 10 int 0x10
4: c5 07 lds ax,DWORD PTR [bx]
6: ba c9 03 mov dx,0x3c9
9: b5 03 mov ch,0x3
b: 66 c1 c8 08 ror eax,0x8
f: 3c 3f cmp al,0x3f
11: 72 04 jb 0x17
13: b0 3f mov al,0x3f
15: fe c4 inc ah
17: f6 c1 03 test cl,0x3
1a: 74 01 je 0x1d
1c: ee out [dx],al
1d: e2 ec loop 0xb
1f: 89 f8 mov ax,di
21: 31 d2 xor dx,dx
23: bb 40 01 mov bx,0x140
26: f7 f3 div bx
28: 01 c8 add ax,cx
2a: 01 ca add dx,cx
2c: 21 d0 and ax,dx
2e: c1 e8 03 shr ax,0x3
31: d3 e8 shr ax,cl
33: 00 05 add BYTE PTR [di],al
35: 47 inc di
36: 75 e7 jne 0x1f
38: 41 inc cx
39: e4 60 in al,0x60
3b: fe c8 dec al
3d: 75 e0 jne 0x1f
3f: c3 ret
The rest of this file is about 3000 words of my commentary on these 31 instructions. I might be wrong about some things, because I'm pretty ignorant about assembly language and MS-DOS, and this code is a bit clever. I'm mostly doing this to learn some of the tricks in the code.
0: b0 13 mov al,0x13
2: cd 10 int 0x10
4: c5 07 lds ax,DWORD PTR [bx]
This part is in the original source; it's a pretty standard way to
start out a small graphics intro, like something in the 32b or 64b
categories. fr-016 starts the same way, but with a les
instead of a lds
. ax
starts out as 0, and in int 10h
, ah
selects the service. Service 0 is setting the video mode; you specify
the video mode in al
. So the first two instructions set the video
mode to mode 13h, which is the very handy 320x200x256 "MCGA" mode
supported by almost all SuperVGA cards.
See my notes on fr-016 for how the lds
thing works,
although fr-016
uses les
instead of lds
. Setting the ds
register like this makes the program code more or less inaccessible
for data operations.
6: ba c9 03 mov dx,0x3c9
9: b5 03 mov ch,0x3
Note that there's a loop
instruction below that jumps to 0xb
, so
the above two instructions are setup for a loop. Here's the body of
the loop:
b: 66 c1 c8 08 ror eax,0x8
f: 3c 3f cmp al,0x3f
11: 72 04 jb 0x17
13: b0 3f mov al,0x3f
15: fe c4 inc ah
17: f6 c1 03 test cl,0x3
1a: 74 01 je 0x1d
1c: ee out [dx],al
1d: e2 ec loop 0xb
Loops generally need to have some kind of side effect in them to be
useful, and in this case, it looks like the purpose of the loop is the
out
instruction, which writes the byte al
to the port in dx
;
dx
got set in the loop setup code above to 3c9h, and isn't modified
inside the loop, so it's always writing bytes to this same port.
It turns out that this is the port you write bytes to in order to set up the VGA palette, so this loop is there to set up the palette. This 25-byte loop (including the two-instruction setup) is also the major difference between the 64-byte version and the 31-byte version whose source is above.
The normal sequence, according to http://www.brackeen.com/vga/bitmaps.html#5, is that you write the palette index to port 3c8h, then the six-bit red, green, and blue values in sequence to port 3c9h. It says you can load the whole palette by first writing a zero to 3c8h, and then writing all 256 palette entries in sequence to 3c9h. So I hypothesize that in general you can write any number of palette entries in sequence this way, and the (emulated) card happens to default to setting palette entry zero at bootup.
Interestingly, this part of the program seems to work differently in FreeDOS under QEMU than in DOSBox; the palette I get in FreeDOS is the black, red, orange, yellow, white palette you can see in the screenshot above, while the one I see in QEMU ranges from green through yellow to white, with no black. I get a somewhat similar effect, actually, if I run this program twice inside DOSBox, but it's cyan instead of green.
This suggests that the problem is the initial state of the eax
register, and indeed if I insert an xor eax, eax
at the beginning
of the program, it displays more correctly in FreeDOS in QEMU, and can
run more than once without screwing up the colors in DOSBox.
So, anyway, how does this loop produce this sequence of colors?
Presumably ax
starts out as whatever the BIOS video mode routine
leaves in it. Each time through the loop, we rotate it by 8 bits, so
every four times through the loop, it will be rotated back to its
original position. The two bottom bits of the loop counter are tested
by the test
instruction, and one time out of four, we skip the out
instruction, so one of the four bytes in eax
is invisible.
This loop seems to set all 256 palette entries. At first that was
what I expected it to do, since that would be 3 * 256 color
components, and we set cx
to 3 * 256 by setting ch
to 3. But then
I thought that actually cx
gets decremented by the loop
instruction even when we didn't output a color component, so I thought
it would only set the first 192 colors. But if I increase ch
to 4,
then the first many colors all get set to white, I guess because it
wrapped around.
So, anyway, what were those colors? Here's the code that computes them.
b: 66 c1 c8 08 ror eax,0x8
f: 3c 3f cmp al,0x3f
11: 72 04 jb 0x17
13: b0 3f mov al,0x3f
15: fe c4 inc ah
So first it rotates eax
by a byte; then it checks to see if the low
byte is greater than or equal to 0x3f, which is 63, the largest
six-bit value. If so, it thresholds it to 0x3f and increments ah
,
the next byte up. So when red reaches its max, green starts to
increment, when green reaches its max, blue starts to increment; when
blue reaches its max, the invisible byte starts to increment; and when
the invisible byte reaches its max, red starts to increment. So, if
we start with some large number in the invisible byte, some small
number in the red byte, and zeroes elsewhere, this will give us the
dark red, light red, orange, yellow, white sequence that we see.
I seem to remember that this is what's called the "Dutch palette", because so many demos from the Netherlands used it for a while.
I was surprised to learn that you could use eax
in 16-bit real mode
like this. Apparently the operand size prefix 0x66 works in 16-bit
mode to give you 32-bit operands just as it works in 32-bit mode to
give you 16-bit operands.
So the remaining mysteries to me at this point were:
Why do all 256 palette entries get set, and not just 192 of them?
Where does the large number in the invisible byte (I guessed it's
ah
on entry to the loop, but I was wrong) come from? It must be
something the BIOS int 10h
call is sticking there, I incorrectly
thought, but what is it and what does it mean?
To answer these questions, I resorted to probing QEMU with GDB.
First, in QEMU's console, while running quad_64b.com
:
(qemu) info registers
EAX=3f3f0000 EBX=...
...
ES =22e4 ...
CS =22e4 ...
So I can set a breakpoint at 0x22f40 and get it to stop on program
entry. GDB is kind of dumb when controlling QEMU in real mode; it can
set breakpoints, and it does stop when its breakpoints get hit, but it
can't figure out that it stopped because the breakpoint was hit, so
you have to manually remove the breakpoint if you want to continue,
and stepi
doesn't work. Anyway, so I restarted QEMU and, before
restarting quad_64b
, I said:
(qemu) gdbserver
(qemu)
And then I ran GDB:
kragen@thrifty:~/devel/circles$ gdb
GNU gdb 6.4.90-debian
...
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000e830 in ?? ()
(gdb) set architecture i8086
The target architecture is assumed to be i8086
(gdb) b *0x22f40
Breakpoint 1 at 0x22f40
(gdb) c
Continuing.
And then I started the program:
A:\> quad_64b
And GDB woke up:
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000100 in ?? ()
(gdb) x/40i $cs*16+$eip
0x22f40: mov $0x13,%al
0x22f42: int $0x10
0x22f44: lds (%bx),%ax
0x22f46: mov $0x3c9,%dx
0x22f49: mov $0x3,%ch
0x22f4b: ror $0x8,%eax
...
So I set breakpoints after the return from the interrupt, and at the top of the loop body, and deleted the breakpoint where I was:
(gdb) b *0x22f44
Breakpoint 2 at 0x22f44
(gdb) b *0x22f4b
Breakpoint 3 at 0x22f4b
(gdb) delete 1
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000104 in ?? ()
(gdb) info registers
eax 0x400020 4194336
ecx 0xff 255
...
So there are two interesting things. First, there is a byte that's
more than 0x3f in eax
, and it's the green byte, which explains the
green color I get when testing in QEMU with FreeDOS. Third, ecx
is
0xff, not 0; this explains why all of the palette entries get set,
instead of only three-quarters of them. (Also, it means the invisible
byte, the one to be skipped, will be the one in al
at the end of the
fourth, eighth, etc., iterations of the loop, not the first, fifth,
etc. This is taken into account in my naming of the bytes above.)
Unfortunately, around this point I screwed up in GDB and had to start QEMU again and reattach GDB, and then redelete some old breakpoints.
(gdb) c
Continuing.
Watchdog has expired. Target detached.
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000026f in ?? ()
(gdb) display/2i $cs*16+$eip
2: x/2i $cs * 16 + $eip
0x96f: mov 1(%bx),%al
0x972: mov 13(%bx),%ah
(gdb) b *0x22f40
Note: breakpoint 5 also set at pc 0x22f40.
Breakpoint 6 at 0x22f40
(gdb) delete 5
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000100 in ?? ()
2: x/2i $cs * 16 + $eip
0x22f40: mov $0x13,%al
0x22f42: int $0x10
(gdb) info registers
eax 0x400000 4194304
ecx 0xff 255
...
(gdb) p $eax = 0x0
$6 = 0
(gdb) b *0x22f46
Breakpoint 7 at 0x22f46
(gdb) delete 6
(gdb) c
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000106 in ?? ()
2: x/2i $cs * 16 + $eip
0x22f46: mov $0x3c9,%dx
0x22f49: mov $0x3,%ch
(gdb) info registers
eax 0x20cd 8397
ecx 0xff 255
...
(gdb)
So ecx
is 255 upon program entry, and it's not the BIOS that's
setting %eax after all; it's the lds
instruction, which loads the
int 20h
instruction from the beginning of the PSP into it! So red
starts out at 0x20, and the invisible byte starts out at 0xcd, the
int
opcode, which is above 0x3f, so that's how the incrementation
gets started.
Here's the main loop, minus the exit test at the end, which just jumps back to the beginning:
1f: 89 f8 mov ax,di
21: 31 d2 xor dx,dx
23: bb 40 01 mov bx,0x140
26: f7 f3 div bx
28: 01 c8 add ax,cx
2a: 01 ca add dx,cx
2c: 21 d0 and ax,dx
2e: c1 e8 03 shr ax,0x3
31: d3 e8 shr ax,cl
33: 00 05 add BYTE PTR [di],al
35: 47 inc di
36: 75 e7 jne 0x1f
38: 41 inc cx
So the main thing to notice here is that there's only one memory
access here, and it's through the di
register; and di
changes only
by being incremented. So we're writing to each pixel on the screen
sequentially (remember, ds
is set to the video segment a000h early
on.).
This loop is almost the same as the one in the 31-byte version. There are two differences:
add
rather than an xor
;
shr
by a constant 3.
Let's take a look at the last bit first:
35: 47 inc di
36: 75 e7 jne 0x1f
38: 41 inc cx
On entry to this loop, cx
is zero; we know this because we just fell
out of the bottom of a loop
instruction, which decrements cx
and
jumps unless it decremented it to zero. But every time di
hits zero
(that is, wraps around to the top of the screen) we increment cx
again. There are no other places in the loop that change cx
. So it
forms a kind of frame counter.
The first little bit:
1f: 89 f8 mov ax,di
21: 31 d2 xor dx,dx
23: bb 40 01 mov bx,0x140
26: f7 f3 div bx
has a helpful comment in the source:
;dx=x, ax=y
0x140 is 320, the number of pixels in a row on the mode 13h screen,
and since there's one byte per pixel, also the number of bytes. So if
di
points at a pixel, dividing di
by 320 should give us the
Y-coordinate as the quotient and the X-coordinate as the remainder;
and that's what the comment means.
The next few instructions are where the black magic happens:
28: 01 c8 add ax,cx
2a: 01 ca add dx,cx
2c: 21 d0 and ax,dx
2e: c1 e8 03 shr ax,0x3
31: d3 e8 shr ax,cl
33: 00 05 add BYTE PTR [di],al
The first two add
instructions essentially shift the coordinates
up and to the left by one pixel every frame. If you take them out,
you get a much flatter-looking picture, one that looks like this:
The three-bit shr
is also kind of optional. It throws away the
bottom three bits of coordinate-derived data, which means that the
screen is divided into a bunch of 8x8 tiles, and there are no
differences introduced inside those tiles (although the per-frame
shift can introduce some). Without the add
s and this extra shr
,
you get something that looks mostly like a bunch of particularly
colorful Sierpinski triangles.
The extra shr
also slows things down a bit; without it, in some
pixels (e.g. (255, 255)), you can get all the way to palette value 255
in a single frame!
So these lines are kind of the heart of the hack:
2c: 21 d0 and ax,dx
31: d3 e8 shr ax,cl
33: 00 05 add BYTE PTR [di],al
With them in there, you get most of the visual and temporal features of the original.
You'll notice that the screenshots all appear to be made up of nested
2x2 squares in which the bottom right quadrant of each square is
brighter than the other three quadrants. That's kind of what you'd
expect if you and
your coordinates together, right? The lower right
quadrant of each 2x2 square is the one where both of the coordinate
bits distinguishing the quadrants of that square are 1.
That's pretty much the pattern you get if you just do this:
2c: 21 d0 and ax,dx
33: 00 05 mov BYTE PTR [di],al
Which looks like this, and doesn't animate:
I don't know how to completely explain the difference between that and
the previous picture. Obviously the frame counter is crucial in
actually getting an animation, and during the long period of time when
cl
is counting up from 8 to 256, the animation essentially pauses;
but I don't quite know how to explain the difference it makes.
That's the main part of my understanding of Klappquadrat I'm still not yet happy with.
39: e4 60 in al,0x60
3b: fe c8 dec al
3d: 75 e0 jne 0x1f
3f: c3 ret
According to the source for Dirojed, another 32-byte intro, these four
instructions are the "standard ESC check". I guess port 60h reads as
1 if somebody's hitting the Esc key and something else otherwise; so
decrementing al
sets the zero flag iff Esc was pressed, and if that
was the case, we fall through to the ret
, which pops a zero off the
stack, returns to the int 20h
at the beginning of the PSP, and
terminates the program. About half the time, this crashes FreeDOS in
QEMU; I don't know why.
The 31-byte version just has an unconditional jmp short schliefe
here instead of this ESC check, so there's no way to exit to DOS.
gas
VersionI hand-translated the objdump
disassembly output into equivalent
gas
input to facilitate experimenting with changes to the program.
It produces a byte-for-byte identical executable until you uncomment
some of the lines that make it a little more robust. It looks like
this:
## a copy of T$'s 64-byte "Klappquadrat"
## compile as follows:
## as -R klapquad.s -o klapquad.o
## objcopy -O binary klapquad.o klapquad.com.0
## dd if=klapquad.com.0 of=klapquad.com bs=256 skip=1
.code16
.org 0x100
## this makes it work right in FreeDOS in QEMU
## and when run multiple times
#xor %eax, %eax
movb $0x13, %al
int $0x10
#mov $(vidseg-2), %bx # makes it work right in FreeDOS in QEMU
lds (%bx), %ax
mov $0x3c9, %dx
mov $3, %ch # orig. 3
set_palette_loop:
ror $8, %eax
cmp $0x3f, %al
jb dont_threshold
mov $0x3f, %al
inc %ah
dont_threshold:
test $3, %cl
je dont_outb
outb %al, (%dx)
dont_outb:
loop set_palette_loop
schleife:
mov %di, %ax
xor %dx, %dx
mov $0x140, %bx
div %bx
add %cx, %ax
add %cx, %dx
and %dx, %ax
shr $3, %ax
shr %cl, %ax
addb %al, (%di)
inc %di
jne schleife
inc %cx
in $0x60, %al
dec %al
jne schleife
ret
#vidseg: .short 0xa000
There seem to be a limited number of people who can appreciate the assembly version, even with the explanation, and it is a little hard to modify; so I wrote this Python version, which I am placing in the public domain. It probably isn't easier to run (you need to install Python, Numeric, SDL, and Pygame instead of any one of DOSBox, Microsoft Windows, or FreeDOS, and QEMU, and the MS-DOS interfaces are probably stabler) but it's sure easier to read if you know Python and not assembly.
#!/usr/bin/python
"""Recreation of T$'s Klappquadrat intro in Python with Pygame and Numeric.
This is a recreation of the 64-byte version, and I think 31
instructions. By contrast, this is 44 lines of code, about 1600
characters. On the other hand, you can change this from 320x200x256
to, say, 640x480x512 by changing the screensize= and ncolors= lines to
say (640, 480) and 512.
Kragen Javier Sitaker wrote this recreation, but T$ is to credit for
the original intro.
"""
import pygame, sys
from Numeric import zeros, subtract, array, arange, where, take, shape, indices
screensize = (320, 200)
ncolors = 256
def colors(masks, levels):
"Compute a grayscale pixel from bit masks and a floating-point level [0,1)"
return sum([int(mask * level) & mask for mask, level in zip(masks, levels)])
def clamp(a, b, c):
"Threshold b between lower limit a and upper limit c."
d = where(a < b, b, a)
return where(d < c, d, c)
def redraw(screen, buf, palette, frames):
x, y = indices(screensize)
# this 256 is not ncolors; it's a timing/pacing thing
buf += ((x + frames) & (y + frames)) >> (frames % 256) >> 3
buf %= ncolors
pygame.surfarray.blit_array(screen, take(palette, buf))
def main(argv):
pygame.init()
screen = pygame.display.set_mode(screensize, pygame.FULLSCREEN)
buf = zeros(screensize)
fiery_rgb_integers = clamp(0, subtract.outer(arange(ncolors) + ncolors/8,
((array([0, 1, 2]) * ncolors)
/ 4)),
ncolors / 4)
masks = screen.get_masks()[:3]
# I'm not sure this palette is exactly right; it only goes to 63
# in the original...
palette = array([colors(masks, levels/float(ncolors/4))
for levels in fiery_rgb_integers])
frames = 0
while 1:
ev = pygame.event.poll()
if ev.type == pygame.NOEVENT:
frames += 1
redraw(screen, buf, palette, frames)
pygame.display.flip()
elif ev.type == pygame.KEYDOWN: break
elif ev.type == pygame.QUIT: break
if __name__ == '__main__': main(sys.argv)