Proposal: Adjacent B-Tree index

From: Dilshod Urazov <urazofficial(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Proposal: Adjacent B-Tree index
Date: 2024-02-19 05:50:18
Message-ID: CAHc0=ph-7ma=1x-mOc7B1jHkoU_8+is-1zY_ejFz-t-OWiXJUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

- Motivation

A regular B-tree index provides efficient mapping of key values to tuples
within a table. However, if you have two tables connected in some way, a
regular B-tree index may not be efficient enough. In this case, you would
need to create an index for each table. The purpose will become clearer if
we consider a simple example which is the main use-case as I see it.

- Example

We need to store a graph. So we create a table for nodes

CREATE TABLE Nodes (
id SERIAL PRIMARY KEY,
label VARCHAR(255)
);

and a table for edges

CREATE TABLE Edges (
label VARCHAR(255),
source INTEGER REFERENCES Nodes(id),
target INTEGER REFERENCES Nodes(id)
);

In order to efficiently traverse a graph we would have an index for
Nodes.id which created automatically in this case and an index for
Edges.source.

- Tweaked B-Tree

We could optimize cases like the former by modifying PostgreSQL btree index
to allow it to index 2 tables simultaneously.

Semantically it would be UNIQUE index for attribute x of table A and an
index for attribute y in table B. In the non-deduplicated case index tuple
pointing to a tuple in A should be marked by a flag. In the deduplicated
case first TID in an array can be interpreted as a link to A.
During the vacuum of A an index tuple pointing to a dead tuple in A should
be cleaned as well as all index tuples for the same key. We can reach this
by clearing all index tuples after the dead one until we come to index
tuple marked by a flag with different key. Or we can enforce deduplication
in such index.

In the example with a graph such index would provide PRIMARY KEY for Nodes
and the index for Edges.Source. The query

SELECT * FROM Nodes WHERE id = X;

will use this index and take into account only a TID in Nodes (so this
would be marked index tuple or first TID in a posting list). The query

SELECT * FROM Edges WHERE source = X;

conversely will ignore links to Nodes.

-- Syntax

I believe that
CREATE TABLE Nodes (
id SERIAL PRIMARY KEY ADJACENT,
label VARCHAR(255)
);
CREATE TABLE Edges (
label VARCHAR(255),
source INTEGER REFERENCES ADJACENT Nodes(id),
target INTEGER REFERENCES Nodes(id)
);

would suffice for this new semantics.
--
Dilshod Urazov

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2024-02-19 05:56:44 Re: About a recently-added message
Previous Message Kyotaro Horiguchi 2024-02-19 05:47:58 Re: Do away with zero-padding assumption before WALRead()