| From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
|---|---|
| To: | Maxime Schoemans <maxime(dot)schoemans(at)enterprisedb(dot)com> |
| Cc: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, pgsql-hackers mailing list <pgsql-hackers(at)lists(dot)postgresql(dot)org>, me(at)komzpa(dot)net, pramsey(at)cleverelephant(dot)ca |
| Subject: | Re: Multi-Entry Indexing for GiST & SP-GiST |
| Date: | 2026-06-19 08:26:31 |
| Message-ID: | CAEze2WisRoK4SdWtH855e-SVFt7OLg07tR9ggXB3wt2sakc7wA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, 5 Jun 2026 at 00:31, Maxime Schoemans
<maxime(dot)schoemans(at)enterprisedb(dot)com> wrote:
>
> Hi Andrey, hi Matthias,
>
> Thank you both for the reviews. A lot of the points overlap, so I'm
> answering both here and grouping by topic instead of by message. v2 is
> attached, with the changes I describe below. The first four patches are
> the proposal. The last two are a separate suggestion for the sorted build,
> which I'd be happy to drop if you'd rather leave it for later.
>
> -- Multi-entry in core vs. a specialized AM
>
> Matthias:
> > I'm not sure this is preferable over a specialized AM (comparable to
> > what GIN is to btree), but cool!
>
> The prototype I presented at PGConf.dev 2024 (MEST) was a separate forked
> AM, so I did start there. The trouble is that multi-entry as it stands adds
> no new on-disk structure. It reuses the existing GiST R-tree (and SP-GiST
> quad-tree) unchanged and just feeds it decomposed entries, so a separate AM
> would duplicate almost all of GiST/SP-GiST with little of its own.
I'm not trying to say that you need to fork the AM; that's not my point.
What I am trying to say is that by adding multi-entry, a lot of edge
cases need to be considered and added to the GIST AM definition
itself. A separate IndexAmRoutine which defines the multi-entry
variant would make more sense to me, even if they were to share most
of their internals and physical format definitions.
> GIN can
> justify its own AM because it adds posting lists on top of a btree.
It (critically) also contains its own internal expressions with a
planning/execution engine, which is much more complicated than much of
what the nbtree code needs to do.
> Multi-entry could grow something comparable one day (merging near-duplicate
> entries, with recheck), but that is a much larger change and nothing here
> needs it.
I'd say it's important to keep disk usage in mind with this type of
change. Do all those users who want this feature want it to use this
much more disk?
> extractValue as an optional support proc reuses all the existing
> machinery, is opt-in per opclass, and costs existing opclasses nothing.
>
> Moving it into core also helps adoption. Heavy GiST users like PostGIS are
> more likely to add an extractValue function to their opclasses than to
> depend on an out-of-tree AM.
Agreed, but that assumes this separate AM would be out-of-tree.
> -- TID deduplication and scan memory
>
> Matthias:
> > Could you help me understand how this deduplication works? Wouldn't
> > this possibly use ~ unbounded memory [...]? [...] we can't use lossy
> > checks for amgettuple() without losing correctness
>
> Andrey:
> > [...] set INDEX_AM_RESERVED_BIT [...] only when extractValue returns
> > nentries > 1, and skip the hash on scan for entries without the bit
>
> For bitmap scans there is no extra memory, the TID bitmap deduplicates on
> its own. The hash is only used for amgettuple (plain and KNN ordered
> scans), and you are right that it cannot go lossy without breaking
> amgettuple, so the GIN trick does not carry over.
>
> Two things keep it bounded. v2 adopts Andrey's reserved-bit idea, so the
> hash only ever holds TIDs that extractValue actually multiplied (the bit is
> set only when nentries > 1), and single-entry values skip it entirely. And
> the planner only chooses amgettuple when it expects few matching rows.
Even in KNN plans when the ordering operator is very expensive? Note
that "few" is a relative term; an index scan will still be selected if
a full tuplesort becomes more expensive than random IOs caused by KNN
index scans.
> For the worst case there are ways to bound it, like Andrey's suggestion of
> spilling to a unique tuplestore once the hash goes over work_mem.
That'd be at the cost of very expensive IOs for ~ every tuple scanned,
so I'm not sure that's such a great idea.
> v2 does
> not do this yet. Matthias, are you thinking along those lines, or did you
> have another idea in mind? And does it need handling before this goes in,
> or is it fine as a follow-up?
I think it should be handled in some way or another by the time this
patch gets committed. Exceeding work_mem isn't exactly a great
solution, and putting TIDs in a hashtable (even a simplehash) wastes
memory.
A nit about memory: If you swap the order of GISTTIDHashEntry's hash
and status fields, the size of the struct drops to 12B from its
current 16B; it's probably worth benchmarking to see if the lack of
power-of-two alignment is worth the increased memory efficiency.
> -- Index-only scans
>
> Matthias:
> > [Index-only scans are disabled on a multi-entry key column.] How is
> > this done? Surely it's not through a planner check of GiST opclasses?
>
> No planner check. It's amcanreturn (gistcanreturn), which returns false
> for a key column that has an extractValue proc, since the stored
> sub-entries don't represent the original datum. INCLUDE columns stay
> returnable.
Hmm, yes. I assume that INCLUDE columns' values are duplicated across
entries, too? That sounds rather storage-expensive, but if the user is
willing to pay that cost without complaints, who am I to judge...
Kind regards,
Matthias van de Meent
Databricks (https://www.databricks.com)
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tatsuo Ishii | 2026-06-19 08:57:28 | Re: Row pattern recognition |
| Previous Message | Shlok Kyal | 2026-06-19 07:48:34 | Re: [PATCH] Preserve replication origin OIDs in pg_upgrade |