“The theory of n-ary relations being so agreeably complete and satisfying as a basis for well-designed database languages, why would anybody nowadays think of elevating the binary relation in particular to some special position?” — Hugh Darwen, 2005
Here’s my problem, expressed in Prolog. I have two basic relations to start with:
visits(Page, Column, Row, Route)
, which says that route Route
visits
the map square on page Page
(one of 1..36), in column Column
(one of
'ABCD'), in row Row
(one of 1..6).
square(Page, Column, Row, X, Y)
, which gives the X and Y positions
of the given square in a global coordinate system. Physically
adjacent squares have coordinates differing by 1, even if they are
on different pages.
Here’s an unfactored (and untested) Prolog solution, which is valid Datalog:
visits("9", "b", "4", Route),
square("16", "c", "4", X, Y),
visits(Page, Column, Row, Route),
square(Page, Column, Row, X2, Y2),
X - 1 =< X2, X + 1 >= X2,
Y - 1 =< Y2, Y + 1 >= Y2.
A nicely-factored Prolog solution looks like this. First, some
auxiliary predicates neighbor/2
, visits/2
, and route_between/3
:
near(A, B) :- A - 1 =< B, B =< A + 1.
neighbor((Page, Column, Row), (Page2, Column2, Row2)) :-
square(Page, Column, Row, X, Y),
square(Page2, Column2, Row2, X2, Y2),
near(X, X2), near(Y, Y2).
visits((Page, Column, Row), Route) :- visits(Page, Column, Row, Route).
route_between(A, B, Route) :- visits(A, Route), visits(B, Route).
Then here’s the actual query:
route_between(("9", "b", "4"), Place, Route), neighbor(("16", "c", "4"), Place).
The Prolog data model is basically the relational data model:
relations like visits/4
contain tuples, which are distinguished by
their values.
A few years back I wrote about “mvfs” or “multivalued functions”, which if I had known anything about logic, I would have known were called “binary relations”. It turns out that they offer a somewhat simpler solution to this problem, which looks like this:
succ = (i; 1) sum.
near = succ ~=<, ~succ =<.
neighbor = x near ~x, y near ~y.
connected_by = first route, second route.
connected_to_neighbor_by = (first; second neighbor) connected_by.
And then the query is:
(("9", "b", "4"), ("16", "c", "4")).connected_to_neighbor_by
This is considerably more compact than the Prolog version, although I am not yet sure that it is clearer. The rest of this document explains the query language in some detail.
This is the SQL solution I started from, before running to the Prolog above:
select
distinct a.route
from
visits a
join visits b using (route)
join square c using (page, "column", "row"),
square d
where
a.page = '9' and a."column" = 'b' and a."row" = '4' and
c.x between d.x-1 and d.x+1 and
c.y between d.y-1 and d.y+1 and
d.page = '16' and d."column" = 'c' and d."row" = '4';
It is possible to factor out the neighbor
and route_between
relations into separate SQL views, but the cost in code volume is
horrendous, and it doesn't even help the original query much:
create view neighbor as
select
a.page as page, a."column" as "column", a."row" as "row",
b.page as bpage, b."column" as bcolumn, b."row" as brow
from
square a, square b
where
a.x between b.x-1 and b.x+1
and a.y between b.y-1 and b.y+1;
create view routes_between as
select
a.page as page, a."column" as "column", a."row" as "row",
b.page as bpage, b."column" as bcolumn, b."row" as brow,
route
from
visits a join visits b using (route);
select
distinct a.route
from
routes_between a
join neighbor c using (page, "column", "row")
where
a.bpage = '9' and a.bcolumn = 'b' and a.brow = '4' and
c.bpage = '16' and c."bcolumn" = 'c' and c."brow" = '4';
So I went looking for the notation I needed to make this query simpler, which I have presented an example of in the previous section; it allows me to do everything in this huge glob of SQL in just a few short lines. Here’s a more complete explanation.
Here I explain in three parallel notations what I am talking about:
Common examples of binary relations include “is greater than”, “is a parent of”, “is the cosine of”, and “is the first name of”.
A binary relation has a “source” domain (sometimes called just its
“domain”) and a “target” domain (sometimes called its “codomain”). In
the case of "is the first name of", the source domain is names and the
target domain is people. In the case that the binary relation is a
function, the source domain is the function’s domain, and the target
domain is the function’s range. The statement that the relation R
holds between the source-domain element a and the target-domain
element b, is normally written (e.g. in An Algebra of Programming)
b R a. For any pair of elements (a, b) from the relevant domains,
b R a is either true or false. In Prolog notation, I would normally
write this r(a, b)
, but in order to make my Prolog definitions below
make sense, I will write it rel(r, a, b).
. Note that the order of
a and b is reversed here from the mathematical notation.
You can think of the relation as a set of ordered pairs (a, b) where a is an element of the source domain and b is an element of the target domain.
You can also think of a relation as a multivalued attribute: b R a means that among the Rs of a is b; b is an R of a. For example, barackObama parent sashaObama means that barackObama is a parent of sashaObama. Under this interpretation, the relation represents a set-valued property of every object in its source domain.
In my ASCII syntax, I will write barackObama parent sashaObama as
sashaObama.parent = barackObama
. This may be misleading, because it
is simultaneously true that sashaObama.parent = michelleObama
. If
you have a better notation, write me.
I’m going to use four fundamental operations to build up binary relations from simpler ones: composition, converse (which I used to call “inverse”), intersection (normally called “meet”), and relational product (which I used to call “join”).
The composition of the operations R and S is normally written R • S. b (R • S) c iff ∃a: b R a ∧ a S c. In Prolog, the arguments are reversed:
rel(compose(S, R), C, B) :- rel(S, C, A), rel(R, A, B).
So, for example, if it is the case that "Barack" firstName barackObama and barackObama parent sashaObama, then it is the case that "Barack" (firstName • parent) sashaObama. In Prolog, because:
rel(firstName, barackObama, "Barack").
rel(parent, sashaObama, barackObama).
we can conclude:
rel(compose(parent, firstName), sashaObama, "Barack").
In my ASCII notation, this relation is written simply with
concatenation, just as parent firstName
, so sashaObama.(parent
firstName) = "Barack"
. We can think of parent firstName
as being a
binary relation containing just the pair (sashaObama
, "Barack"
),
given only the pairs listed above in the firstName
and parent
relations.
The converse of a relation R is written R°; a R° b iff b R a. For functions, this is the inverse of the function, but the operation is closed on binary relations — the converse of a binary relation is always a binary relation — while it’s not closed on functions, because the inverse of a function may be a binary relation that isn’t a function.
In Prolog:
rel(converse(R), B, A) :- rel(R, A, B).
In my ASCII notation, I write it with ~
. Given the parent
relation mentioned earlier, we can write the child
relation as
~parent
. If these are true:
sashaObama.parent = barackObama.
sashaObama.parent = michelleObama.
Then these are also true:
barackObama.~parent = sashaObama.
michelleObama.~parent = sashaObama.
If we use =
to give names to relations as well, we can write:
sashaObama.parent = barackObama.
sashaObama.parent = michelleObama.
child = ~parent.
Then these are true:
barackObama.child = sashaObama.
michelleObama.child = sashaObama.
This also allows us to define the “sibling” relation:
sibling = parent ~parent.
This is syntactic sugar for sibling = parent° • parent or
rel(sibling, A, B) :- rel(compose(parent, converse(parent)), A, B)
.
The intersection of R and S, sometimes known as the “meet” of R and S (since binary relations form a lattice), is written R ∩ S, and contains just the pairs that are in both R and S: a (R ∩ S) b iff a R b ∧ a S b.
In Prolog, we can write:
rel(intersect(R, S), B, A) :- rel(R, B, A), rel(S, B, A).
In my ASCII notation, I write intersection with a comma, analogously
to Prolog: R, S
.
Suppose we have an older
relation such that X.older = Y
whenever
Y
is older than X
. Given intersection, we can define an
olderSibling
relation:
olderSibling = sibling, older.
(Or, equivalently, older, sibling
.)
This definition means olderSibling = sibling ∩ older or
rel(olderSibling, B, A) :- rel(sibling, B, A), rel(older, B, A)
.
So if we know
sashaObama.older = maliaAnnObama.
maliaAnnObama.parent = barackObama.
sashaObama.parent = barackObama.
we can conclude
sashaObama.olderSibling = maliaAnnObama.
which is an abbreviation for
sashaObama.(parent ~parent, older) = maliaAnnObama.
(I’m considering the possibility that I should write sibling, older
as sibling == older
or sibling & older
instead.)
The fourth major operation on binary relations is the “relational product”, traditionally written with angle brackets and commas. This is an N-ary operation on any number of binary relations, and it constructs a relation whose target domain consists of tuples.
It is defined (for the two-element case) as (rb, sb) <R, S> b iff rb R b ∧ sb S b. In Prolog, for the general case:
rel(product([]), _, []).
rel(product([R|Rs]), B, [RB|RsB]) :-
rel(R, B, RB), rel(product(Rs), B, RsB).
In my ASCII notation I will write it with ;
, as R; S
, with
enclosing parentheses to disambiguate when necessary. As with any
N-ary operator, this is potentially confusing: (R; S; T)
defines a
relation whose target domain is 3-tuples, while ((R; S); T)
, which
looks like it ought to be the same thing, defines a relation whose
target domain is 2-tuples. The same problem is confronted by
function-call syntax in Algol-family programming languages, in which
f((x1, y1), (x2, y2))
is (if meaningful) different from f(x1, y1,
x2, y2)
.
The sibling
operator defined earlier is fairly non-specific, also
including half-siblings. For example, given the facts:
barackObama.parent = annDunham.
barackObama.parent = barackObamaSr.
mayaSoetoroNg.parent = annDunham.
mayaSoetoroNg.parent = loloSoetoro.
our previous sibling
relation, defined as parent ~parent
, will
report that
barackObama.sibling = mayaSoetoroNg.
mayaSoetoroNg.sibling = barackObama.
If we want a stricter definition of sibling
that only connects
people who share both a father and a mother, we can do it as follows:
parents = father; mother.
sibling = parents ~parents.
which is syntactic sugar for the Prolog:
rel(parents, B, A) :- rel(product([father, mother], B, A).
rel(sibling, B, A) :- rel(compose(parents, converse(parents)), B, A).
or the traditional-notation parents = <father, mother> and sibling = parents° • parents.
If we then write:
barackObama.mother = annDunham.
barackObama.father = barackObamaSr.
mayaSoetoroNg.mother = annDunham.
mayaSoetoroNg.father = loloSoetoro.
then we will find
barackObama.parents = (barackObamaSr, annDunham).
mayaSoetoroNg.parents = (loloSoetoro, annDunham).
(I’m skipping over a couple of pitfalls here. If you want to define
parent
in terms of father
and mother
you need union, which I’m
not describing in this document, and if you want to define father
and mother
in terms of parent
and some gender
relation, you need
a way to introduce constants into expressions defining binary
relations, which I haven’t gotten to yet.)
This relational product corresponds in some sense to an equijoin in the relational algebra, with the join key being the source domain of both relations.
Originally I was going to write the relational product as [R; S; T]
,
but Stefan Ljungstrand told me that was confusingly similar to the
notation for categorical coproduct [R, S, T]
, which is sort of the
opposite.
I’ve defined the ASCII syntax pretty informally so far, and I don’t intend to stop now. But I thought I should list a couple of notes about the grammar.
Parsing precedence is converse, composition, intersection, then
relational product; so the expression a, ~b c; d
is parsed as (a,
((~b) c); d
.
=
has two different meanings, one when defining relations and one
when listing particular facts. The syntax for listing a particular
fact is A.B = C.
, where A
and C
are values, and B
is a
binary relation. Commas have meanings inside of both values and
binary relations, but the meanings are unrelated.
To formulate my example query above, I also need a way to do some
arithmetic. I’ll define relations called sum
and =<
as follows.
Sum is defined as a + b sum (a, b), or
(a, b).sum = a + b.
or
rel(sum, [A, B], C) :- var(A), not(var(B)), not(var(C)), A is C - B.
rel(sum, [A, B], C) :- not(var(A)), var(B), not(var(C)), B is C - A.
rel(sum, [A, B], C) :- not(var(A)), not(var(B)), var(C), C is A + B.
rel(sum, [A, B], C) :- not(var(A)), not(var(B)), not(var(C)), C is A + B.
=<
is defined as follows: b =< a iff a ≤ b (note that the
direction there is confusing!), or
rel(=<, A, B) :- A =< B.
or a.=< = b
whenever a
≤ b
. (Note that this kind of hairifies
the informally defined grammar of the query language.)
Finally, I’ll decree that a numeric constant such as 3 will be treated as a
relation that maps anything to that constant: 3 3 x or x.3 = 3
for
any numeric constant 3 and any value x:
rel(N, _, N) :- number(N).
and define a primitive universal identity, or equality, relation i
:
x.i = x.
or
rel(i, X, X).
(maybe this should be called self
?) and define first
and second
predicates for destructuring ordered pairs:
(a, b).first = a.
(a, b).second = b.
or
rel(first, [A, _], A).
rel(second, [_, B], B).
Here is the definition of the basic operators and primitive relations, aggregated from the above sections:
:- multifile rel/3. % because there are relational facts to add later
rel(compose(S, R), C, B) :- rel(S, C, A), rel(R, A, B).
rel(converse(R), B, A) :- rel(R, A, B).
rel(intersect(R, S), B, A) :- rel(R, B, A), rel(S, B, A).
rel(product([]), _, []).
rel(product([R|Rs]), B, [RB|RsB]) :-
rel(R, B, RB), rel(product(Rs), B, RsB).
rel(sum, [A, B], C) :- var(A), not(var(B)), not(var(C)), A is C - B.
rel(sum, [A, B], C) :- not(var(A)), var(B), not(var(C)), B is C - A.
rel(sum, [A, B], C) :- not(var(A)), not(var(B)), var(C), C is A + B.
rel(sum, [A, B], C) :- not(var(A)), not(var(B)), not(var(C)), C is A + B.
rel(=<, A, B) :- A =< B.
rel(N, _, N) :- number(N).
rel(i, X, X).
rel(first, [A, _], A).
rel(second, [_, B], B).
And here is a simple application, aggregated from the examples in the above sections:
% Note that in Prolog we get union for free!
rel(parent, A, B) :- rel(father, A, B).
rel(parent, A, B) :- rel(mother, A, B).
rel(parents, B, A) :- rel(product([father, mother]), B, A).
rel(sibling, B, A) :- rel(compose(parents, converse(parents)), B, A).
rel(child, B, A) :- rel(converse(parent), B, A).
rel(atLeastHalfSibling, B, A) :- rel(compose(parent, child), B, A).
rel(olderSibling, B, A) :- rel(intersect(sibling, older), B, A).
rel(firstName, barackObama, "Barack").
rel(father, sashaObama, barackObama).
rel(mother, sashaObama, michelleObama).
rel(older, sashaObama, maliaAnnObama).
rel(father, maliaAnnObama, barackObama).
rel(mother, maliaAnnObama, michelleObama).
rel(father, barackObama, barackObamaSr).
rel(mother, barackObama, annDunham).
rel(father, mayaSoetoroNg, loloSoetoro).
rel(mother, mayaSoetoroNg, annDunham).
Here are some queries:
rel(parent, maliaAnnObama, X).
rel(parents, maliaAnnObama, X).
rel(sibling, maliaAnnObama, X). % note: by our definition she's her own sib
rel(atLeastHalfSibling, maliaAnnObama, X).
rel(olderSibling, sashaObama, X).
rel(olderSibling, maliaAnnObama, X).
rel(atLeastHalfSibling, barackObama, X).
rel(sibling, barackObama, X).
rel(compose(parent, firstName), sashaObama, X).
% note that i; parent; parent parent includes untoward pairings.
rel(product([i, parent, compose(parent, parent)]), sashaObama, X).
% to solve the problem, (i; parent) (first; second; second parent)
rel(compose(product([i, parent]),
product([first, second, compose(second, parent)])),
sashaObama, X).
As I said earlier, though, this only handles finite relations
correctly; if you start messing around with sum
and =<
you’re
likely to come up with queries that are perfectly legitimate but that
this naïve implementation cannot handle.
I’ve left out union, transitive closure, and any form of negation, whether infinite negation or constructive negation by means of subtraction. Also, it seems likely that aggregation operations (e.g. average, max, min) would add a lot of power, and they would be simple to add.
Division is missing but I’m not sure what it’s good for.
Suppose we decompose the visits
and square
relations into nine
binary relations: visits_page
, visits_column
, visits_row
,
visits_route
, square_page
, square_column
, square_row
,
square_x
, and square_y
. These are functions from some abstract
set of records to concrete values like "16"
, "c"
, and "41"
.
However, we don’t need to restrict ourselves to functions; we can go
beyond SQL (and, in a way, outside of first normal form) and use
many-to-many relations directly. So instead of dealing with the
distinct visits
and square
tables, we will write our solution in
terms of x
, y
, and route
relations from (page, column, row)
triples to coordinates and route numbers. The route
relation is
many-to-many — that is, it may map the same location to any number of
route numbers. The others are still functions.
The first thing to do is to define these three relations in terms of the original ones. This step would be unnecessary with a database design constructed with binary relations in mind, but it’s useful to see that this conversion step can be accomplished very easily inside this algebra:
squarekey = ~(square_page; square_column; square_row).
x = squarekey square_x.
y = squarekey square_y.
route = ~(visits_page; visits_column; visits_row) visits_route.
So squarekey
is a relation from (page, column, row) triples to
“rows” of the square
relation; composing it with square_x
and
square_y
gives us relations from those triples to X and Y
coordinates.
route
is analogous for routes.
Now let’s define a “numerically-near” relation near
:
succ = (i; 1) sum.
near = succ ~=<, ~succ =<.
If you expand that last one out into Prolog:
rel(near, A, B) :- rel(succ, A, Z), B =< Z, rel(succ, X, A), X =< B.
In other words, B near A iff B ≤ A+1 ∧ A-1 ≤ B.
That’s enough to write neighbor
:
neighbor = x near ~x, y near ~y.
Now we would like to write route_between
, but I will call it connected_by
:
connected_by = first route, second route.
Now we can abstract the general pattern of the query into a
connected_to_neighbor_by
:
connected_to_neighbor_by = (first; second neighbor) connected_by.
So the whole program is:
squarekey = ~(square_page; square_column; square_row).
x = squarekey square_x.
y = squarekey square_y.
route = ~(visits_page; visits_column; visits_row) visits_route.
succ = (i; 1) sum.
near = succ ~=<, ~succ =<.
neighbor = x near ~x, y near ~y.
connected_by = first route, second route.
connected_to_neighbor_by = (first; second neighbor) connected_by.
It’s slightly shorter than the Prolog version. If we assume that the original data is in a suitable form of binary relations, instead of requiring us to restructure it into binary relations, we get:
succ = (i; 1) sum.
near = succ ~=<, ~succ =<.
neighbor = x near ~x, y near ~y.
connected_by = first route, second route.
connected_to_neighbor_by = (first; second neighbor) connected_by.
which is much shorter than the Prolog version.
I think it is probably better not to use positional arguments; not
only do they complicate the base language with first
, second
(third
, etc.) relations, those relations are infinite, and queries
using them tend to be a little opaque. The cost is that the
relational product operation needs names to attach to the ranges of
its operand relations. With this change, the program looks like this:
squarekey = ~{page: square_page; column: square_column; row: square_row}.
x = squarekey square_x.
y = squarekey square_y.
route = ~{page: visits_page; column: visits_column; row: visits_row}
visits_route.
succ = {addend1: i; addend2: 1} sum.
near = succ ~=<, ~succ =<.
neighbor = x near ~x, y near ~y.
connected_by = origin route, destination route.
connected_to_neighbor_by = {origin: origin;
destination: destination neighbor} connected_by.
My original query expressed in the relational algebra looks like this, more or less:
route_between ← σ_(start.route = end.route)(ρ_start(visits) × ρ_end(visits))
squarepairs ← ρ_a(square) × ρ_b(square)
neighbor ← σ_(a.x-1 ≤ b.x ≤ a.x+1 ∧ a.y-1 ≤ b.y ≤ a.y+1)(squarepairs)
rtn ← σ_(end.page = a.page ∧ end.row = a.row ∧ end.column = a.column)( \
route_between × neighbor)
out ← σ_(start.page = '9' ∧ start.column = 'b' ∧ start.row = '4' ∧
b.page = '16' ∧ b.column = 'c' ∧ b.row = '4')(rtn)
Π_route(out)
I think that in this form, this algebra (if that’s what it is) on binary relations is equivalently powerful to the select/project/cartesian-product SPC relational algebra, and substantially briefer. (There’s no way to do unions, so to equal the SPCU algebra you need an additional operation.)
“Project” is carried out by combining the columns you want using the
relational product operation. For example, if you have some row
objects that have many-to-one relations a
, b
, c
, and d
defined
on them, and you want a corresponding set of tuples with just the a
,
b
, and d
values, you can do this:
a; b; d
“Select” is a little hairy because it takes arbitrary Boolean
expressions; but generally they take the form of equality tests
between columns, or between columns and constants. This is done with
the intersection operator combined with the inversion of some columns.
To select rows where a
= 2 and b
= 3, you apply this relation to
any input, or something:
2 ~a, 3 ~b
This gives you a relation from a source domain of everything to a
target domain consisting of just the rows you want. If you then want
a relation from those rows to, say, column c
, you can filter down
the source domain to just the rows themselves using the i
relation,
and then compose that with your existing c
:
(2 ~a, 3 ~b, i) c
To select rows where a == b
, you also need to use the identity
relation. b ~a
gives you all the pairs of rows where one row’s a
matches the other row’s b
; intersecting that with i
gives you just
the cases where the two rows happen to be the same row:
b ~a, i
These can be intersected to form more complex join conditions:
2 ~a, 3 ~b, b ~c, i
A Cartesian product can be constructed by inverting some constant and composing it with some relation that returns the right set of rows, then joining that with the same constant mapped to some other table:
~37 ~square_page; ~37 ~visits_page
This produces a mapping from the constant 37 to all pairs of rows from
square
and visits
.
With the exception of tuple handling, the composition, intersection,
and inversion operations are trivial to translate into the
relational algebra on a bunch of two-column relations. Here I use s
for the source column and t
for the target column.
a b → Π_(a.s, b.t) σ_(a.t = b.s) a × b
a, b → Π_(a.s, a.t) σ_(a.s = b.s, a.t = b.t) a × b
~a → ρ(s→t, t→s) a
In the absence of binary relations containing tuples with varying numbers of members, you can deal with the tuple handling by making the relations have more than two columns. For example, you can handle a two-relation product this way:
a; b → σ_(a.s = b.s) a × b
Then you have to keep track of which columns represent the "source" and "target" of your new, wider relation, in order to apply the trivial translations above for composition, intersection, and inversion.
Adding set subtraction to either the relational algebra or this algebra of binary relations means that the other one cannot model it until the corresponding operation is added; and similarly for union, although if you allow unions of unlike data, such as tuples of different lengths, you will have trouble with the translation to the relational algebra.
XPath. XQL. UnQL. Tree regular expressions. Django’s notation for traversing foreign-key relationships. My “object-oriented equational rewrite rules” thing. Backus’ FP.
The book An Algebra Of Programming has a whole bunch of stuff about binary relations in it. I should read it!
Claudia Mónica Necco (Argentine D.N.I. Nº 16.051.159) is working on reasoning in terms of binary relations to prove correctness properties of programs. She has published a paper on using them for extended static checking of programs that interact with databases.
“A relational data language with simplified binary relation handling capability”, Kambayashi, Tanaka, and Yajima, VLDB 1977? I don’t have a copy, and it looks like nobody has ever cited it, but it sounds very similar indeed!
RDF, SPARQL, and cwm are based on binary relations. But SPARQL (“emerging as the de-facto RDF query language,” says Wikipedia) seems to be dramatically more verbose than relational query langauges rather than less. cwm is hardly better; in N3 we have:
@forAll :child, :parent. {:child gc:parent :parent. :parent :gender :M} log:implies {:child gc:father :parent} .
rather than:
gc:father = gc:parent, "M" ~gender.
There’s a lovely comparison of RDF query languages from 2004, covering RQL, SeRQL, TRIPLE, RDQL, N3, and Versa (but not SPARQL!). It uses as its example query a query that this calculus of binary relations cannot express directly, because it asks if the RDF graph contains an edge of any type whatsoever to another node.
“Binary Relations Approach to building Object Database Model”, Andrew Girow, in Object Currents Vol. 1, Issue 11, SIGS Publications, NY, 1996? He doesn’t seem to have taken the idea very far.
“Implementation of a Graph Oriented Query Language: IUGQL” is Indiana University Tech Report 376, by Sarathy and Van Gucht, 1993. They took a somewhat similar approach, but their query language appears to suck (the first example is on p.27).
SnikiSniki is a simple end-user-oriented database built out of RDF-like triples.
From XQueryX, one of the XQuery Use Cases.
Sample XQuery query:
<bib>
{
for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
where $b/publisher = "Addison-Wesley" and $b/@year > 1991
return
<book year="{ $b/@year }">
{ $b/title }
</book>
}
</bib>
(I’m omitting the 150-line monstrosity that is the XQueryX version.)
I think my version would look something like this:
books = "http://bstore1.example.com/bib.xml" GET <bib> <book>.
wanted = (books, "Addison-Wesley" ~$ ~<publisher>, 1991 < ~# ~@year).
{<bib>: {<book>: wanted {@year: @year; $: <title>}}}
This makes the following assumptions about the already-defined names:
<foo>
from parent elements to child elements whose tag is that
tag “foo”.
$
is a relation from XML elements to their textual contents.
#
is a relation from strings to numbers parsed from them.
There is one crucial difference from the XQuery results: the results are set-oriented, rather than a sequential thing like an XML document. The results are here represented using the same relations as in the input, and so could be serialized as XML, but they are more general than any particular XML serialization of their contents.
And there are more use cases where that came from.
In 1977, Yahiko KOMBAYASHI, Katsumi TANAKA, and Shuzo YAJIMA published a paper in VLDB entitled, “A relational data language with simplified binary relation handling capability”. They describe a query language they have been developing that has special features for handling binary relations, including composition (“concatenation”), transitive closure, and intersection (“boolean operators”), but not including converse and relational product.
Unfortunately, the query language they present owes a lot to SQL (called, at the time, SEQUEL); it is not an expression language, and its data model is very much the standard N-ary relational data model.
Here is the first sample RLB query from their paper:
E1 Find all titles(TI) of documents written by
author(AU)’A’.
GET DOCUMENT.TI
WHERE DOCUMENT.DID# = AUTHOR.DID#
AND AUTHOR.AU = ’A’.
This is essentially the same level of verbosity as SQL. The AUTHOR
relation is somewhat misleadingly named; it is essentially a join
table created to support the many-to-many relationship between
documents and authors, which is why it has a DID#
column. (However,
in Kombayashi’s schema, there is no actual table of authors to join
with.)
In the language presented in the present note, a literal translation of the above produces:
'A' ~au did# ~did# ti.
However, in the world of binary relations, there’s no need for a
separate join table with au
and did#
attributes; the “author”
attribute of a document can simply be a many-to-many relation
directly. This simplifies the query to:
'A' ~author title.
(I have taken the liberty of using longer attribute names.)
Here's a definition of a new binary relation from the same paper:
E3 Define a binary relation between document IDs,
each of which cites at least one document
commonly.
RANGE OF X,Y ARE REFERENCE
DEFINE BINARY SIBLING
DID1# = X.DID#, DID2# = Y.DID#
WHERE X.DIDF# = Y.DIDF#
CONSTRAINT SEMIEQ
This defines a new binary relation called SIBLING
. In the binary
relation language defined in the present note, this can be written
more briefly:
sibling = ~did# didf# ~didf# did#.
One procedural reading of this is as follows: given a document ID
number, first find the document it is the did#
of; then find the
didf#
s of that document (which I suppose are the IDs of the
documents it cites); then find all the documents that have one of those
among their didf#
s; then find the did#
s of those documents.
This lacks the explicit information about the types of X
and Y
in
Kombayashi’s query; in a statically typed implementation of this
language, you could infer this from the types of the did#
and
didf#
relations (which would have to be compatible), and in fact
would have to do so in the type-checking process; but without some
kind of type annotation, you wouldn’t have a way to verify that the
inferred type was what you expected.
It also lacks the constraint given in that the new relation be a “semi-equivalence” relation, that is to say that it is reflexive and symmetric, but not necessarily transitive. Proving that it is symmetric is pretty simple:
sibling = ~did# didf# ~didf# did#. (definition of sibling)
sibling = ~~(~did# didf# ~didf# did#). (x = ~~x)
sibling = ~(~did# ~~didf# ~didf# ~~did#). (~(x y) = ~y ~x)
sibling = ~(~did# didf# ~didf# did#). (x = ~~x)
sibling = ~sibling. (definition of sibling)
Proving that it is reflexive is somewhat more difficult, since it’s only reflexive over the range of document IDs that actually correspond to documents that reference at least one other document, and our notation doesn’t include a way to express even that one binary relation is a subset of another.
It isn't clear to me whether the constraint and range declarations in Kombayashi are intended for optimization or for error-checking — that is, whether the query processor was expected to assume that they were correct or to verify that they were correct. Perhaps I will ask Dr. Tanaka.
In any case, this relation, between documents rather than document-IDs, is probably more useful:
sibling2 = didf# ~didf#.
I developed a lot of these ideas in discussion with Darius Bacon.
Many thanks to Stefan Ljungstrand (ski on Freenode) for his help and explanations of many matters related to logic, binary relations, category theory, and notation.
Written by Kragen Javier Sitaker.