neon: a functional database, git for structured data

From: Amirouche Boubekki <amirouche(at)hypermove(dot)net>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: neon: a functional database, git for structured data
Date: 2018-02-25 17:14:30
Message-ID: 49ebfb078a1b415b9ebcc580a185e7b5@hypermove.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

I think, I found a way to create a performant versioned triple store.

Neon achieves versioning *almost* without copy of a set of tuples
that looks like:

(graph, subject, predicate, object)

I described the implementation in GNU Guile using wiredtiger
in the attached (poorly rendered) pdf document.

The project is AGPLv3. for the time being, the repository
is private, send me your github handle if you want to lurk
around.

Basically, the triples are attached to unsigned 64 bits integers
called 'transaction' that draws a direct-acyclic-graph named
'branch history'. That is, neon stores tuples that looks like
the following:

(graph, subject, predicate, object, transaction)

transaction are mapped in another table. That other table
stores the following tuples

(transaction) → (parent1, parent2, sha-256)

This builds DAG where vertex can have at most two
incoming edges but as many outgoings edges. outgoings
edges represent branches; and vertices have in general
a single incoming edge. Vertices with two incomings edges
are merge transactions.

Given a 'transaction' it's possible to know the added
and deleted triples.

Based on the 'branch history', a mapping is computed
associating 'transaction' to another unsigned 64 bits integer
that denotes its 'history significance'.

The order procedure use a topological sort that preserves
'history significance'.

To compute the value of given (graph, subject, predicate)
I use a code equivalent to the following:

SELECT graph, subject, predicate, object, HMAX(trasaction, 'master')
FROM store
WHERE graph='wikipedia', subject='Database', predicate='See Also'
GROUP BY graph, subject, predicate, object

Where HMAX poke at the 'branch history' mapping to compute
the latest transaction.

Special transactions called 'merge' introduce the necessary
adds and deletes that reconcile history between two branch.
The tuples introduced by a 'merge' transaction will have a
bigger 'history significance' shadowing the conflicting tuples.
There is a bit of copying in case of updates and deletes.

The above mapping is implemented with an immutable datastructure
called simply 'history' which can be shared between readers and
writers.

Because it would be intractable to compute the history
otherwise, it's not possible for two transactions to
have the same parent while being in the same branch. Otherwise
anonymous branches are forbidden (unlike mercurial).

There is write lock on a branch history that serialize write
making the database 'single writer on a branch'. That said,
it's possible to create a branches concurrently.

A requested feature is to have a PostgreSQL backend.

Q: Is there a way to speed up the computation of HMAX? Maybe via
a specific index on transaction(parent1, parent2)?

Q: Does an index (or primary key?) speed up range queries over
multiple columns?

Feedback welcome!

--
Amirouche ~ amz3 ~ http://www.hyperdev.fr

Attachment Content-Type Size
neon-v0.0.0.pdf application/pdf 120.5 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2018-02-25 17:22:08 Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
Previous Message Magnus Hagander 2018-02-25 16:17:36 Re: Online enabling of checksums