Quick Links

Re: Multi-Entry Indexing for GiST & SP-GiST

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Maxime Schoemans <maxime(dot)schoemans(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Multi-Entry Indexing for GiST & SP-GiST
Date:	2026-06-01 15:35:03
Message-ID:	CAEze2WjJ+Q8NEy3XnYj5VuRuOWEb9M0iwge_oOPkCaYnCGyHYw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, 20 May 2026 at 10:45, Maxime Schoemans
<maxime(dot)schoemans(at)enterprisedb(dot)com> wrote:
>
> Hi hackers,
>
> This patch set adds multi-entry support to the GiST and SP-GiST access
> methods, allowing a single heap tuple to produce multiple index entries --
> similar to how GIN decomposes values via extractValue, but within the
> R-tree (GiST) and quad-tree (SP-GiST) frameworks.

I think the attachments were lost in transmission.

> Motivation
> ----------
>
> The existing GiST multirange opclass compresses a multirange into its
> bounding union range, losing information about gaps. This makes operators
> like @> and <@ imprecise at the index level, requiring many false-positive
> rechecks. Multi-entry GiST instead stores each component range as a
> separate index entry, giving the R-tree much tighter bounds and
> significantly reducing rechecks for multiranges with gaps.

Cool! I'm not sure this is preferable over a specialized AM
(comparable to what GIN is to btree), but cool!

> Other design decisions:
>
[...]
> - Index-only scans are disabled on a multi-entry key column, since
> the stored sub-entries do not represent the original datum.

How is this done? Surely it's not through a planner check of GiST opclasses?

> - For SP-GiST, compress becomes optional when extractValue is present:
> extractValue produces the leaf-typed values directly, and leafType
> may differ from the input type (e.g., anymultirange -> anyrange).

I don't think it's generally safe to remove compression even when
extractValue is present.

> - Since leaf consistent functions see components, a separate opclass
> (multirange_me_ops) is provided alongside the existing multirange_ops.
>
> Patch overview
> --------------
>
> 0001 - GiST AM infrastructure (gist.h, gist_private.h, gist.c,
> gistbuild.c, gistget.c, gistscan.c, gistutil.c, gistvalidate.c): new
> extractValue support function (proc 13), multi-entry insert/build paths,
> TID deduplication during scans using simplehash.

Could you help me understand how this deduplication works? Wouldn't
this possibly use ~ unbounded memory (or however much memory is needed
to store up to 2^48 TIDs)?

GIN gets away with it because it uses a lossy Bitmap structure, and
because it traverses the TID space linearly, but we can't use lossy
checks for amgettuple() without losing correctness, and I don't think
GiST can do the same.

Kind regards,

Matthias van de Meent

In response to

Multi-Entry Indexing for GiST & SP-GiST at 2026-05-20 17:44:52 from Maxime Schoemans

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tomas Vondra	2026-06-01 15:35:16	Re: should we have a fast-path planning for OLTP starjoins?
Previous Message	Melanie Plageman	2026-06-01 15:05:28	Re: autovacuum launcher crash: assert in pgstat_count_io_op (IOOP_EXTEND on pg_database's VM)