Re: Reduce build times of pg_trgm GIN indexes

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce build times of pg_trgm GIN indexes
Date: 2026-03-02 12:17:56
Message-ID: a90ebbbd-0d77-49c7-b222-3dbffa4e3b14@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23.01.2026 11:18, David Geier wrote:
> Hi Matthias,
>
> On 21.01.2026 21:50, Matthias van de Meent wrote:
>> On Wed, 21 Jan 2026 at 16:45, David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:
>>>
>>> How do we usually go about such backwards-compatibility breaking
>>> changes?
>>
>> When it concerns a bug, we mention the change in the release notes
>> with a warning to reindex affected indexes to be sure no known
>> corruption remains. See e.g. the final entry in the PG18 release
>> notes' migration section here:
>> https://www.postgresql.org/docs/18/release-18.html#RELEASE-18-MIGRATION.
>>
>>> Could we have pg_upgrade reindex all GIN indexes? Would that be
>>> acceptable?
>>
>> No. We'd handle this like any other collation/opclass fixes; we ask
>> users to reindex their indexes in their own time after they've
>> upgraded their cluster. Note that in this case it concerns an issue
>> with just one GIN opclass, not all GIN indexes; so even if we were to
>> address this in pg_upgrade it wouldn't be a correct choice to reindex
>> every GIN index, as only a subset of those would be affected by this
>> issue.
>>
>> Generally speaking, pg_upgrade doesn't concern itself with the
>> validity of the data structures that are described by the catalogs
>> that it upgrades, it only concerns itself with that it correctly
>> transcribes the catalogs from one version to another, and that the
>> data files of the old cluster are transfered correctly without
>> changes.
>
> Thanks for the clarifications and the link to the release notes. That's
> very helpful. Then I know how to move on and will update the patch
> accordingly.

Attached are the patches rebased on latest master.

I've removed the ASCII fast-path patch 0006 as it turned out to be more
complicated to make work than expected.

I kept the radix sort patch because it gives a decent speedup but I
would like to focus for now on getting patches 0001 - 0004 merged.
They're all simple and, the way I see it, uncontroversial.

I remeasured the savings of 0001 - 0004, which comes on top of the
already committed patch that inlined the comparison function, which gave
another ~5%:

Data set | Patched (ms) | Master (ms) | Speedup
--------------------|--------------|--------------|----------
movies(plot) | 8,058 | 10,311 | 1.27x
lineitem(l_comment) | 223,233 | 256,986 | 1.19x

I've also registered the change at the commit fest, see
https://commitfest.postgresql.org/patch/6418/.

--
David Geier

Attachment Content-Type Size
v4-0005-Optimize-generate_trgm-with-radix-sort.patch text/x-patch 2.9 KB
v4-0004-Faster-qunique-comparator-in-generate_trgm.patch text/x-patch 2.0 KB
v4-0003-Make-btint4cmp-branchless.patch text/x-patch 1.0 KB
v4-0002-Optimize-generate_trgm-with-sort_template.h.patch text/x-patch 2.6 KB
v4-0001-Optimize-sort-and-deduplication-in-ginExtractEntr.patch text/x-patch 7.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2026-03-02 12:34:41 Re: Eliminating SPI / SQL from some RI triggers - take 3
Previous Message Oleg Tselebrovskiy 2026-03-02 12:17:13 psql's 001_basic.pl test could fail on very slow machines