/* A simple virtual machine based on the abjunction boolean function.
The idea is that you write “circuits” in the form of negated
implications; these “circuits” have some number of input bits, and
they are built by a sequence of IMP *instructions* on substrings of
their already-computed bits to add more bits. In this way you can
compute arbitrary functions (of finite-sized data) in a reasonably
efficient way with an extremely simple virtual machine.
“negated implication”? “abjunction”?
------------------------------------
NAND is not the only universal binary boolean gate. One of the
other five is abj, for “abjunction”, sometimes called “material
nonimplication”, or in set theory,
“set subtraction” or “relative complement”;
X implies Y unless X is true and Y is false, so X
not-implies Y only if X is true and Y is false.
I’ll use the \ character here to represent it:
X Y X \ Y
0 0 0
0 1 0
1 0 1
1 1 0
You can see this as an X-enabled NOT gate for Y: when it's disabled
by X being 0, the output is 0, and when X is 1, the output is NOT
Y. This is very similar to a NAND, which you can view as a
X-controlled NOT gate for Y which outputs 1 when it's disabled
instead of 0. If you invert its output, which you can do by
\ing it with a constant 1, you get IMP:
X Y Z = X \ Y IMP = 1 \ Z
0 0 0 1
0 1 0 1
1 0 1 0
1 1 0 1
IMP can be considered a Y-enabled NOT gate for X, even more similar
to a NAND. It’s 1 when Y is 1, and NOT X when Y is 0. So by
inverting Y we get NAND:
X Y NY = 1 \ Y AND = X \ NY NAND = 1 \ Z
0 0 1 0 1
0 1 0 0 1
1 0 1 0 1
1 1 0 1 0
So \ is clearly universal as long as you have access to a
constant 1: three \s make a NAND, and with NAND you can make
anything, so at worst it’s three times as many bit operations as
NAND. (NAND can live without the constant 1, although that doesn’t
matter in real life.)
Usually, though, instead of being three times worse, it’s the same
as NAND or a little better; the \ gate is more expressive in that
X \ Y is different from Y \ X. So you can get four different
nontrivial truth tables (X \ Y, Y \ X, not X, not Y) out of a
single gate, compared to NAND’s three (X NAND Y, not X, not Y).
For XOR, it’s slightly worse; as far as I know, you need five \
gates (with four propagation delays) instead of the four (with
three) you need with NAND:
X Y Z = X \ Y A = Y \ X NA = 1 \ A B = NA \ Z XOR = 1 \ B
0 0 0 0 1 1 0
0 1 0 1 0 0 1
1 0 1 0 1 0 1
1 1 0 0 1 1 0
On the other hand, you can see that the B column above is XNOR in
only four gates (with three propagation delays), rather than the
five (with four) you need with NAND.
The other nontrivial fundamental Boolean functions in \ are:
X OR Y = X \ (X \ Y)
X NOR Y = (1 \ Y) \ X
X AND Y = 1 \ ((1 \ Y) \ X)
NOT X = 1 \ X
So XNOR and NOR are simpler, NAND and XOR are more complicated, and
everything else is the same as with NAND.
\ also has another practical advantage over NAND: SSE has an
instruction for it since Katmai, called ANDNPS or `andnot_si128`.
In fact, it’s the *only* universal bitwise operation supported in
SSE; SSE also has AND, OR, NOT, XOR, and MOV, none of which are
universal.
noninverting universality
-------------------------
Abjunction and reverse abjunction do have one feature unique to
them among the six universal Boolean two-input gates: as I
mentioned above, they're only universal if you have access to a
constant 1, because they are “falsehood-preserving”, never creating
truth from pure falsity. In set theory, this is useful in that it
never produces an infinite set from finite sets, and there are some
kinds of logical devices where the same property is desirable. For
example, you can compute abjunction with a single passive relay
(one not connected to a power supply, just to its logical inputs),
and I suspect that you can compute it with diode logic using
Braess’s paradox; and there's a well-known abjunction gate in
WireWorld, where it’s impossible for a gate to produce “electrons”
from nothing.
This property is useful in query languages from set theory because
infinite sets are inconvenient in many cases; attempting to iterate
over one of them will hang your program. Consider the case of a
predicate married(A, B), which is true if A and B are married. If
you know only `married(sarah, johann)`, then how are you to answer
the query `¬married(sarah, X)` under the closed-world assumption?
Clearly X could be `sarah`; `sarah` is not married to herself. But
it could also be `jon`, and `rob`, and an infinity of other
possibilities.
By contrast, the query `parent(X, rob) ∧ ¬married(sarah, X)`, which
asks which of `rob`’s parents `sarah` is not married to, provokes
no such troublesome infinities as long as `rob` has a finite number
of parents. We can gain the ability to express such finite
queries, without the infinite queries, by adding only abjunction
instead of pure negation, saying `parent(X, rob) \ married(sarah, X)`.
Unfortunately, without allowing the infinite universe as an input,
abjunction is not very universal! You can still express AND as
X \ (X \ Y), but OR is out of reach.
“instructions”?
---------------
I said this was a virtual machine whose instructions constructed
\ gates. Its has a single instruction, with three fields: n, a,
and b, and they are executed by calculating the abjunction of two
n-bit vectors taken starting at a and b and concatenating that
abjunction to the end of the current bit vector.
As a special case, when one of the input indices is 0, the operand
bitvector is taken to be all 1s, converting abjunction into bitwise
complement if it’s the left argument. This means you can’t read
the first bit of the bitvector, so you might as well not use it.
Because the bit fields are not required to be word-aligned and can
be as small as a single bit, bit shifting and combining many bit
fields into a single word can be done implicitly.
efficiency
----------
This version of abjmachine does bit operations one at a time.
Consequently it does only about 52 million bit XOR operations per
second, comparable to an IBM PS/2 with a 386 or a late-90s
PalmPilot, except slower at multiplication, faster at bitwise
operations, and harder to program. This is roughly ten thousand
times slower than the underlying machine it’s running on.
I hope to soon get it using the
bit-parallel operations offered by C to accelerate it by about a
factor of 32 in the usual case, and if I can get SSE stuff working
on it, a factor of 64. 300 million bit XOR operations per second
is roughly a quarter of the speed of the original iPhone (so, think
of a Nokia Series 60 phone circa 2002), or similar to a Pentium 66
(circa 1993). Then it would be only about a hundred times slower
than the underlying machine, making it comparable to interpreted Python.
*/
#include
#include
#include
typedef unsigned long word;
enum { word_size = CHAR_BIT * sizeof(word) };
/* Stupid one-bit-at-a-time function. */
static inline void set_bit(word *words, int index, int bit)
{
if (bit) {
int shift = index % word_size;
int word_index = index / word_size;
/* printf("to set 1 at %d, set %d in %d\n", index, shift, word_index); */
words[word_index] |= 1L << shift;
}
}
/* Stupid one-bit-at-a-time function. */
static inline int get_bit(word *words, int index)
{
int shift = index % word_size;
int word_index = index / word_size;
return words[word_index] >> shift & 1;
}
/* Fundamental unit of computation: write a bitstring of length
`n_bits`, starting at bit offset `out_off` in bit vector `bits`,
computed by the abjunction of the bitstrings of that length
starting at `x_off` and `y_off`.
*/
static inline void do_abj(word *bits,
int x_off,
int y_off,
int out_off,
int n_bits)
{
for (int i = 0; i != n_bits; i++) {
set_bit(bits, out_off + i,
(x_off ? get_bit(bits, x_off + i) : 1)
& !(y_off ? get_bit(bits, y_off + i) : 1));
}
}
typedef struct {
word *buf;
int pointer;
int words_allocated;
} circuit;
/* Ensure that enough space is allocated to concatenate n_bits more
* bits. */
static inline void reserve_more_bits(circuit *c, int n_bits)
{
int new_pos = c->pointer + n_bits;
int new_words = (new_pos + word_size - 1) / word_size;
if (new_words <= c->words_allocated) return;
new_words = new_words * 2 + 2;
word *new_buf = realloc(c->buf, new_words * sizeof(*new_buf));
if (!new_buf) abort();
for (int i = c->words_allocated; i < new_words; i++) {
new_buf[i] = 0;
}
c->buf = new_buf;
c->words_allocated = new_words;
}
static inline void clear_circuit(circuit *c)
{
free(c->buf);
c->buf = 0;
c->words_allocated = 0;
c->pointer = 0;
}
/* Stupid one-bit-at-a-time function. */
static inline void append_bit(circuit *c, int bit)
{
reserve_more_bits(c, 1);
set_bit(c->buf, c->pointer++, bit);
}
/* Appends `n_bits` bits starting at the beginning of `bits` to the
circuit. Useful for providing input. Probably should be
refactored into a call to a version that accepts an offset into
`bits`.
*/
static inline void append_bits(circuit *c, word *bits, int n_bits)
{
/* XXX do this in a less stupid way */
for (int i = 0; i < n_bits; i++) {
append_bit(c, get_bit(bits, i));
}
}
/* Read up to a word’s worth of bits from a circuit, or return
111... in the special case where offset == 0. */
static inline word read_bits(circuit *c, int offset, int n_bits)
{
if (0 == offset) return (word)(long)-1;
word result = 0;
for (int i = 0; i < n_bits; i++) {
set_bit(&result, i, get_bit(c->buf, offset + i));
}
return result;
}
typedef struct { int x_off, y_off, n_bits; } instruction;
static inline void execute(circuit *c, instruction *pc, int n)
{
for (int i = 0; i < n; i++, pc++) {
reserve_more_bits(c, pc->n_bits);
/* printf("calculating a %d-bit abj from %d and %d to %d\n", */
/* pc->n_bits, pc->x_off, pc->y_off, c->pointer); */
do_abj(c->buf, pc->x_off, pc->y_off, c->pointer, pc->n_bits);
c->pointer += pc->n_bits;
}
}
#define ARRAY_SIZE(x) (sizeof(x)/sizeof(*x))
/* Returns bit offset at which the XOR bits can be read */
static inline int do_xor(circuit *c, int x_off, int y_off, int n_bits)
{
// Z = X \ Y; A = Y \ X; NA = 1 \ A; B = NA \ Z; XOR = 1 \ B
int base = c->pointer;
instruction instructions[] = {
{ x_off, y_off, n_bits }, /* 0: X \ Y */
{ y_off, x_off, n_bits }, /* 1: Y \ X */
{ 0, base + n_bits, n_bits }, /* 2: 1 \ (Y \ X) */
{ base + 2*n_bits, base, n_bits }, /* 3: (1 \ (Y \ X)) \ (X \ Y) */
{ 0, base + 3*n_bits, n_bits }, /* 4: 1 \ ((1 \ (Y \ X)) \ (X \ Y)) */
};
execute(c, instructions, ARRAY_SIZE(instructions));
return base + 4*n_bits;
}
static inline void run_circuit(word a, word b)
{
circuit c = {0};
for (int i = 0; i < 64; i++) {
/* Initial bit to fill the inaccessible bit position 0. */
word empty = -1;
append_bits(&c, &empty, 1);
int a_pos = c.pointer;
append_bits(&c, &a, word_size);
int b_pos = c.pointer;
append_bits(&c, &b, word_size);
int xor_pos = do_xor(&c, a_pos, b_pos, word_size);
if (!i) {
printf("%d(@%d) ⊗ %d(@%d) = %d(@%d)\n",
(int)read_bits(&c, a_pos, word_size), a_pos,
(int)read_bits(&c, b_pos, word_size), b_pos,
(int)read_bits(&c, xor_pos, word_size), xor_pos);
}
clear_circuit(&c);
}
/* for (int i = 0; i < c.words_allocated; i++) { */
/* printf("0x%08x ", (unsigned int)c.buf[i]); */
/* } */
/* printf("\n"); */
}
int main(int argc, char **argv)
{
if (argc != 3) {
fprintf(stderr, "Usage: %s 31083 2834\n", argv[0]);
return -1;
}
run_circuit(atoi(argv[1]), atoi(argv[2]));
return 0;
}