Re: Multi-Entry Indexing for GiST & SP-GiST

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Maxime Schoemans <maxime(dot)schoemans(at)enterprisedb(dot)com>
Cc: pgsql-hackers mailing list <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
Subject: Re: Multi-Entry Indexing for GiST & SP-GiST
Date: 2026-05-31 18:27:12
Message-ID: B62366D2-6CAA-4482-97BD-E396925CF197@yandex-team.ru
Lists: pgsql-hackers

> On 21 May 2026, at 22:34, Maxime Schoemans <maxime(dot)schoemans(at)enterprisedb(dot)com> wrote:
>
> patches attached

Hi Maxime,

I have been reading through the patch set. I will focus on the GiST side
here, since I know the SP-GiST internals far less well and would rather
discuss the architecture where I can actually be useful.

Skipping dedup for non-duplicated entries
------------------------------------------

On the scan path, once an opclass has extractValue, every leaf entry
goes through the TID hash even when the indexed value produced a single
sub-entry and therefore cannot collide. GiST scans are CPU-bound (we
examine every tuple on the page and run consistent on each), so this
probe lands on the hot path rather than being hidden behind I/O.

Since multi-entry is gated on a new, non-default opclass, no existing
index takes this path, so the leaf format for these opclasses is
effectively new and free to extend. INDEX_AM_RESERVED_BIT (0x2000 in
t_info) is reserved for exactly this kind of AM-specific use and is
currently unused by GiST. We could set it at insert/build time only when extractValue
returns nentries > 1, and skip the hash on scan for entries without the
bit; the hash then grows only with genuinely multiplied TIDs. I am not
proposing it as a must, just noting the format is new enough to allow it.
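
To make the idea concrete, here is a minimal standalone sketch, not code
from the patch. Every Sketch*/sketch_* name is hypothetical, the TID is
simplified to one integer, and a small linear array stands in for the
real TID hash. Only the bit value 0x2000 comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as INDEX_AM_RESERVED_BIT quoted above; redefined locally so
 * the sketch compiles without PostgreSQL headers. */
#define SKETCH_AM_RESERVED_BIT 0x2000

/* Hypothetical stand-in for a leaf index tuple header. */
typedef struct SketchLeafTuple
{
    uint32_t heap_tid;  /* simplified: a real TID is (block, offset) */
    uint16_t t_info;    /* size and flag bits */
} SketchLeafTuple;

/* Insert/build side: flag the tuple only when the value was split. */
static void
sketch_mark_leaf(SketchLeafTuple *tup, int nentries)
{
    assert(nentries >= 1);
    if (nentries > 1)
        tup->t_info |= SKETCH_AM_RESERVED_BIT;
}

/* Toy TID set standing in for the scan's dedup hash. */
#define SKETCH_SET_SLOTS 64

typedef struct SketchTidSet
{
    uint32_t tids[SKETCH_SET_SLOTS];
    int      ntids;
    int      nprobes;   /* counts how often the set was consulted */
} SketchTidSet;

/* Returns true if tid was already present; otherwise remembers it. */
static bool
sketch_seen_before(SketchTidSet *set, uint32_t tid)
{
    set->nprobes++;
    for (int i = 0; i < set->ntids; i++)
    {
        if (set->tids[i] == tid)
            return true;
    }
    assert(set->ntids < SKETCH_SET_SLOTS);
    set->tids[set->ntids++] = tid;
    return false;
}

/* Scan side: decide whether a matching leaf tuple should be returned. */
static bool
sketch_should_return(SketchTidSet *set, const SketchLeafTuple *tup)
{
    /* No flag: the value produced one sub-entry, so it cannot collide. */
    if ((tup->t_info & SKETCH_AM_RESERVED_BIT) == 0)
        return true;
    return !sketch_seen_before(set, tup->heap_tid);
}
```

In this sketch unflagged tuples never touch the set, so both the probe
count and the set size depend only on tuples that came from a split
value.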

One related concern: I am not a big fan of the single-key-column
restriction. Features like this should be orthogonal to the rest of the
AM, and "throws an error on more than one column" tends to calcify into a
permanent limitation rather than a temporary one.

BTW, the sorting build ignores extract_value, but that is not important at
the current stage.

extractValue == new compress
----------------------------

What strikes me in the catalog is that multirange_me_ops drops the
compress support proc (3) and adds extractValue (13), while multirange_ops
is the reverse. So extractValue already supplants compress here: it emits
leaf-typed values directly. Conceptually compress is just extractValue
constrained to nentries == 1, and the SP-GiST side already makes compress
optional when extractValue is present, which points at the same overlap.

Was unifying the two considered, rather than carrying two parallel
support procs? For example a single "produce leaf entries" entry point,
with a 1->1 shim over compress for the existing opclasses. That would
keep the insert/build path single rather than branching on whether
extractValue exists, and it would frame multi-entry as a generalization
of what compress already does rather than a parallel mechanism.
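
A rough standalone sketch of what such a single entry point could look
like, under my own assumptions: the types, the function-pointer
signatures and all sketch_* names are hypothetical and do not match the
real GiST support-proc calling convention. The point is only that the
1->1 case and the 1->N case can sit behind one call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical leaf key: a one-dimensional range. */
typedef struct SketchRange
{
    int lo;
    int hi;
} SketchRange;

#define SKETCH_MAX_ENTRIES 4

/* Hypothetical indexed value: a multirange with a few parts. */
typedef struct SketchMultirange
{
    int         nparts;
    SketchRange parts[SKETCH_MAX_ENTRIES];
} SketchMultirange;

/* compress-style proc: exactly one leaf key per value. */
typedef SketchRange (*sketch_compress_fn) (const SketchMultirange *val);

/* extractValue-style proc: writes leaf keys to out, returns the count. */
typedef int (*sketch_extract_fn) (const SketchMultirange *val,
                                  SketchRange *out);

typedef struct SketchOpclass
{
    sketch_compress_fn compress;    /* NULL if the opclass lacks it */
    sketch_extract_fn  extract;     /* NULL if the opclass lacks it */
} SketchOpclass;

/* Example compress: the bounding range of all parts. */
static SketchRange
sketch_compress_bbox(const SketchMultirange *val)
{
    assert(val->nparts >= 1);
    SketchRange r = val->parts[0];

    for (int i = 1; i < val->nparts; i++)
    {
        if (val->parts[i].lo < r.lo)
            r.lo = val->parts[i].lo;
        if (val->parts[i].hi > r.hi)
            r.hi = val->parts[i].hi;
    }
    return r;
}

/* Example extractValue: one leaf key per part. */
static int
sketch_extract_parts(const SketchMultirange *val, SketchRange *out)
{
    for (int i = 0; i < val->nparts; i++)
        out[i] = val->parts[i];
    return val->nparts;
}

/*
 * The single "produce leaf entries" entry point. Opclasses that only
 * have compress go through a 1->1 shim, so callers on the insert and
 * build paths never check which proc exists.
 */
static int
sketch_leaf_entries(const SketchOpclass *opc, const SketchMultirange *val,
                    SketchRange *out)
{
    if (opc->extract != NULL)
        return opc->extract(val, out);

    assert(opc->compress != NULL);
    out[0] = opc->compress(val);
    return 1;
}
```

The branch still exists, but it is confined to one helper (it could
equally be resolved once when the opclass is loaded); everything
downstream just loops over the returned entries.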

Is this useful to PostGIS?
--------------------------

The motivation that matters most to me is whether the real heavy users of
GiST will adopt this. Multiranges are a fairly narrow audience on their
own; the compelling case is multi-part geometries (MultiPolygon with
holes, routes, regions with exclaves), which is PostGIS territory.

I am adding Darafei and Paul to CC - it would be very helpful to
hear whether PostGIS would actually use extractValue in their GiST
opclasses, and whether the single-column restriction or the per-entry
dedup cost would be a problem in practice for them. If the GIS side is
on board, the feature is clearly worth it. If not, it is worth knowing
that when designing the AM-level machinery.

Best regards, Andrey Borodin.
