/* A simple virtual machine based on the abjunction boolean function. The idea is that you write “circuits” in the form of negated implications; these “circuits” have some number of input bits, and they are built by a sequence of IMP *instructions* on substrings of their already-computed bits to add more bits. In this way you can compute arbitrary functions (of finite-sized data) in a reasonably efficient way with an extremely simple virtual machine. “negated implication”? “abjunction”? ------------------------------------ NAND is not the only universal binary boolean gate. One of the other five is abj, for “abjunction”, sometimes called “material nonimplication”, or in set theory, “set subtraction” or “relative complement”; X implies Y unless X is true and Y is false, so X not-implies Y only if X is true and Y is false. I’ll use the \ character here to represent it: X Y X \ Y 0 0 0 0 1 0 1 0 1 1 1 0 You can see this as an X-enabled NOT gate for Y: when it's disabled by X being 0, the output is 0, and when X is 1, the output is NOT Y. This is very similar to a NAND, which you can view as a X-controlled NOT gate for Y which outputs 1 when it's disabled instead of 0. If you invert its output, which you can do by \ing it with a constant 1, you get IMP: X Y Z = X \ Y IMP = 1 \ Z 0 0 0 1 0 1 0 1 1 0 1 0 1 1 0 1 IMP can be considered a Y-enabled NOT gate for X, even more similar to a NAND. It’s 1 when Y is 1, and NOT X when Y is 0. So by inverting Y we get NAND: X Y NY = 1 \ Y AND = X \ NY NAND = 1 \ Z 0 0 1 0 1 0 1 0 0 1 1 0 1 0 1 1 1 0 1 0 So \ is clearly universal as long as you have access to a constant 1: three \s make a NAND, and with NAND you can make anything, so at worst it’s three times as many bit operations as NAND. (NAND can live without the constant 1, although that doesn’t matter in real life.) Usually, though, instead of being three times worse, it’s the same as NAND or a little better; the \ gate is more expressive in that X \ Y is different from Y \ X. So you can get four different nontrivial truth tables (X \ Y, Y \ X, not X, not Y) out of a single gate, compared to NAND’s three (X NAND Y, not X, not Y). For XOR, it’s slightly worse; as far as I know, you need five \ gates (with four propagation delays) instead of the four (with three) you need with NAND: X Y Z = X \ Y A = Y \ X NA = 1 \ A B = NA \ Z XOR = 1 \ B 0 0 0 0 1 1 0 0 1 0 1 0 0 1 1 0 1 0 1 0 1 1 1 0 0 1 1 0 On the other hand, you can see that the B column above is XNOR in only four gates (with three propagation delays), rather than the five (with four) you need with NAND. The other nontrivial fundamental Boolean functions in \ are: X OR Y = X \ (X \ Y) X NOR Y = (1 \ Y) \ X X AND Y = 1 \ ((1 \ Y) \ X) NOT X = 1 \ X So XNOR and NOR are simpler, NAND and XOR are more complicated, and everything else is the same as with NAND. \ also has another practical advantage over NAND: SSE has an instruction for it since Katmai, called ANDNPS or `andnot_si128`. In fact, it’s the *only* universal bitwise operation supported in SSE; SSE also has AND, OR, NOT, XOR, and MOV, none of which are universal. noninverting universality ------------------------- Abjunction and reverse abjunction do have one feature unique to them among the six universal Boolean two-input gates: as I mentioned above, they're only universal if you have access to a constant 1, because they are “falsehood-preserving”, never creating truth from pure falsity. In set theory, this is useful in that it never produces an infinite set from finite sets, and there are some kinds of logical devices where the same property is desirable. For example, you can compute abjunction with a single passive relay (one not connected to a power supply, just to its logical inputs), and I suspect that you can compute it with diode logic using Braess’s paradox; and there's a well-known abjunction gate in WireWorld, where it’s impossible for a gate to produce “electrons” from nothing. This property is useful in query languages from set theory because infinite sets are inconvenient in many cases; attempting to iterate over one of them will hang your program. Consider the case of a predicate married(A, B), which is true if A and B are married. If you know only `married(sarah, johann)`, then how are you to answer the query `¬married(sarah, X)` under the closed-world assumption? Clearly X could be `sarah`; `sarah` is not married to herself. But it could also be `jon`, and `rob`, and an infinity of other possibilities. By contrast, the query `parent(X, rob) ∧ ¬married(sarah, X)`, which asks which of `rob`’s parents `sarah` is not married to, provokes no such troublesome infinities as long as `rob` has a finite number of parents. We can gain the ability to express such finite queries, without the infinite queries, by adding only abjunction instead of pure negation, saying `parent(X, rob) \ married(sarah, X)`. Unfortunately, without allowing the infinite universe as an input, abjunction is not very universal! You can still express AND as X \ (X \ Y), but OR is out of reach. “instructions”? --------------- I said this was a virtual machine whose instructions constructed \ gates. Its has a single instruction, with three fields: n, a, and b, and they are executed by calculating the abjunction of two n-bit vectors taken starting at a and b and concatenating that abjunction to the end of the current bit vector. As a special case, when one of the input indices is 0, the operand bitvector is taken to be all 1s, converting abjunction into bitwise complement if it’s the left argument. This means you can’t read the first bit of the bitvector, so you might as well not use it. Because the bit fields are not required to be word-aligned and can be as small as a single bit, bit shifting and combining many bit fields into a single word can be done implicitly. efficiency ---------- This version of abjmachine does bit operations one at a time. Consequently it does only about 52 million bit XOR operations per second, comparable to an IBM PS/2 with a 386 or a late-90s PalmPilot, except slower at multiplication, faster at bitwise operations, and harder to program. This is roughly ten thousand times slower than the underlying machine it’s running on. I hope to soon get it using the bit-parallel operations offered by C to accelerate it by about a factor of 32 in the usual case, and if I can get SSE stuff working on it, a factor of 64. 300 million bit XOR operations per second is roughly a quarter of the speed of the original iPhone (so, think of a Nokia Series 60 phone circa 2002), or similar to a Pentium 66 (circa 1993). Then it would be only about a hundred times slower than the underlying machine, making it comparable to interpreted Python. */ #include #include #include typedef unsigned long word; enum { word_size = CHAR_BIT * sizeof(word) }; /* Stupid one-bit-at-a-time function. */ static inline void set_bit(word *words, int index, int bit) { if (bit) { int shift = index % word_size; int word_index = index / word_size; /* printf("to set 1 at %d, set %d in %d\n", index, shift, word_index); */ words[word_index] |= 1L << shift; } } /* Stupid one-bit-at-a-time function. */ static inline int get_bit(word *words, int index) { int shift = index % word_size; int word_index = index / word_size; return words[word_index] >> shift & 1; } /* Fundamental unit of computation: write a bitstring of length `n_bits`, starting at bit offset `out_off` in bit vector `bits`, computed by the abjunction of the bitstrings of that length starting at `x_off` and `y_off`. */ static inline void do_abj(word *bits, int x_off, int y_off, int out_off, int n_bits) { for (int i = 0; i != n_bits; i++) { set_bit(bits, out_off + i, (x_off ? get_bit(bits, x_off + i) : 1) & !(y_off ? get_bit(bits, y_off + i) : 1)); } } typedef struct { word *buf; int pointer; int words_allocated; } circuit; /* Ensure that enough space is allocated to concatenate n_bits more * bits. */ static inline void reserve_more_bits(circuit *c, int n_bits) { int new_pos = c->pointer + n_bits; int new_words = (new_pos + word_size - 1) / word_size; if (new_words <= c->words_allocated) return; new_words = new_words * 2 + 2; word *new_buf = realloc(c->buf, new_words * sizeof(*new_buf)); if (!new_buf) abort(); for (int i = c->words_allocated; i < new_words; i++) { new_buf[i] = 0; } c->buf = new_buf; c->words_allocated = new_words; } static inline void clear_circuit(circuit *c) { free(c->buf); c->buf = 0; c->words_allocated = 0; c->pointer = 0; } /* Stupid one-bit-at-a-time function. */ static inline void append_bit(circuit *c, int bit) { reserve_more_bits(c, 1); set_bit(c->buf, c->pointer++, bit); } /* Appends `n_bits` bits starting at the beginning of `bits` to the circuit. Useful for providing input. Probably should be refactored into a call to a version that accepts an offset into `bits`. */ static inline void append_bits(circuit *c, word *bits, int n_bits) { /* XXX do this in a less stupid way */ for (int i = 0; i < n_bits; i++) { append_bit(c, get_bit(bits, i)); } } /* Read up to a word’s worth of bits from a circuit, or return 111... in the special case where offset == 0. */ static inline word read_bits(circuit *c, int offset, int n_bits) { if (0 == offset) return (word)(long)-1; word result = 0; for (int i = 0; i < n_bits; i++) { set_bit(&result, i, get_bit(c->buf, offset + i)); } return result; } typedef struct { int x_off, y_off, n_bits; } instruction; static inline void execute(circuit *c, instruction *pc, int n) { for (int i = 0; i < n; i++, pc++) { reserve_more_bits(c, pc->n_bits); /* printf("calculating a %d-bit abj from %d and %d to %d\n", */ /* pc->n_bits, pc->x_off, pc->y_off, c->pointer); */ do_abj(c->buf, pc->x_off, pc->y_off, c->pointer, pc->n_bits); c->pointer += pc->n_bits; } } #define ARRAY_SIZE(x) (sizeof(x)/sizeof(*x)) /* Returns bit offset at which the XOR bits can be read */ static inline int do_xor(circuit *c, int x_off, int y_off, int n_bits) { // Z = X \ Y; A = Y \ X; NA = 1 \ A; B = NA \ Z; XOR = 1 \ B int base = c->pointer; instruction instructions[] = { { x_off, y_off, n_bits }, /* 0: X \ Y */ { y_off, x_off, n_bits }, /* 1: Y \ X */ { 0, base + n_bits, n_bits }, /* 2: 1 \ (Y \ X) */ { base + 2*n_bits, base, n_bits }, /* 3: (1 \ (Y \ X)) \ (X \ Y) */ { 0, base + 3*n_bits, n_bits }, /* 4: 1 \ ((1 \ (Y \ X)) \ (X \ Y)) */ }; execute(c, instructions, ARRAY_SIZE(instructions)); return base + 4*n_bits; } static inline void run_circuit(word a, word b) { circuit c = {0}; for (int i = 0; i < 64; i++) { /* Initial bit to fill the inaccessible bit position 0. */ word empty = -1; append_bits(&c, &empty, 1); int a_pos = c.pointer; append_bits(&c, &a, word_size); int b_pos = c.pointer; append_bits(&c, &b, word_size); int xor_pos = do_xor(&c, a_pos, b_pos, word_size); if (!i) { printf("%d(@%d) ⊗ %d(@%d) = %d(@%d)\n", (int)read_bits(&c, a_pos, word_size), a_pos, (int)read_bits(&c, b_pos, word_size), b_pos, (int)read_bits(&c, xor_pos, word_size), xor_pos); } clear_circuit(&c); } /* for (int i = 0; i < c.words_allocated; i++) { */ /* printf("0x%08x ", (unsigned int)c.buf[i]); */ /* } */ /* printf("\n"); */ } int main(int argc, char **argv) { if (argc != 3) { fprintf(stderr, "Usage: %s 31083 2834\n", argv[0]); return -1; } run_circuit(atoi(argv[1]), atoi(argv[2])); return 0; }