| From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
|---|---|
| To: | Maxime Schoemans <maxime(dot)schoemans(at)enterprisedb(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Multi-Entry Indexing for GiST & SP-GiST |
| Date: | 2026-06-01 15:35:03 |
| Message-ID: | CAEze2WjJ+Q8NEy3XnYj5VuRuOWEb9M0iwge_oOPkCaYnCGyHYw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, 20 May 2026 at 10:45, Maxime Schoemans
<maxime(dot)schoemans(at)enterprisedb(dot)com> wrote:
>
> Hi hackers,
>
> This patch set adds multi-entry support to the GiST and SP-GiST access
> methods, allowing a single heap tuple to produce multiple index entries --
> similar to how GIN decomposes values via extractValue, but within the
> R-tree (GiST) and quad-tree (SP-GiST) frameworks.
I think the attachments were lost in transmission.
> Motivation
> ----------
>
> The existing GiST multirange opclass compresses a multirange into its
> bounding union range, losing information about gaps. This makes operators
> like @> and <@ imprecise at the index level, requiring many false-positive
> rechecks. Multi-entry GiST instead stores each component range as a
> separate index entry, giving the R-tree much tighter bounds and
> significantly reducing rechecks for multiranges with gaps.
Cool! I'm not sure this is preferable over a specialized AM
(comparable to what GIN is to btree), but cool!
> Other design decisions:
>
[...]
> - Index-only scans are disabled on a multi-entry key column, since
> the stored sub-entries do not represent the original datum.
How is this done? Surely it's not through a planner check of GiST opclasses?
> - For SP-GiST, compress becomes optional when extractValue is present:
> extractValue produces the leaf-typed values directly, and leafType
> may differ from the input type (e.g., anymultirange -> anyrange).
I don't think it's generally safe to remove compression even when
extractValue is present.
> - Since leaf consistent functions see components, a separate opclass
> (multirange_me_ops) is provided alongside the existing multirange_ops.
>
> Patch overview
> --------------
>
> 0001 - GiST AM infrastructure (gist.h, gist_private.h, gist.c,
> gistbuild.c, gistget.c, gistscan.c, gistutil.c, gistvalidate.c): new
> extractValue support function (proc 13), multi-entry insert/build paths,
> TID deduplication during scans using simplehash.
Could you help me understand how this deduplication works? Wouldn't
this possibly use ~ unbounded memory (or however much memory is needed
to store up to 2^48 TIDs)?
GIN gets away with it because it uses a lossy Bitmap structure, and
because it traverses the TID space linearly, but we can't use lossy
checks for amgettuple() without losing correctness, and I don't think
GiST can do the same.
Kind regards,
Matthias van de Meent
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tomas Vondra | 2026-06-01 15:35:16 | Re: should we have a fast-path planning for OLTP starjoins? |
| Previous Message | Melanie Plageman | 2026-06-01 15:05:28 | Re: autovacuum launcher crash: assert in pgstat_count_io_op (IOOP_EXTEND on pg_database's VM) |