| From: | David Geier <geidav(dot)pg(at)gmail(dot)com> |
|---|---|
| To: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
| Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Reduce build times of pg_trgm GIN indexes |
| Date: | 2026-03-02 12:17:56 |
| Message-ID: | a90ebbbd-0d77-49c7-b222-3dbffa4e3b14@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 23.01.2026 11:18, David Geier wrote:
> Hi Matthias,
>
> On 21.01.2026 21:50, Matthias van de Meent wrote:
>> On Wed, 21 Jan 2026 at 16:45, David Geier <geidav(dot)pg(at)gmail(dot)com> wrote:
>>>
>>> How do we usually go about such backwards-compatibility breaking
>>> changes?
>>
>> When it concerns a bug, we mention the change in the release notes
>> with a warning to reindex affected indexes to be sure no known
>> corruption remains. See e.g. the final entry in the PG18 release
>> notes' migration section here:
>> https://www.postgresql.org/docs/18/release-18.html#RELEASE-18-MIGRATION.
>>
>>> Could we have pg_upgrade reindex all GIN indexes? Would that be
>>> acceptable?
>>
>> No. We'd handle this like any other collation/opclass fixes; we ask
>> users to reindex their indexes in their own time after they've
>> upgraded their cluster. Note that in this case it concerns an issue
>> with just one GIN opclass, not all GIN indexes; so even if we were to
>> address this in pg_upgrade it wouldn't be a correct choice to reindex
>> every GIN index, as only a subset of those would be affected by this
>> issue.
>>
>> Generally speaking, pg_upgrade doesn't concern itself with the
>> validity of the data structures that are described by the catalogs
>> that it upgrades, it only concerns itself with that it correctly
>> transcribes the catalogs from one version to another, and that the
>> data files of the old cluster are transfered correctly without
>> changes.
>
> Thanks for the clarifications and the link to the release notes. That's
> very helpful. Then I know how to move on and will update the patch
> accordingly.
Attached are the patches rebased on latest master.
I've removed the ASCII fast-path patch 0006 as it turned out to be more
complicated to make work than expected.
I kept the radix sort patch because it gives a decent speedup but I
would like to focus for now on getting patches 0001 - 0004 merged.
They're all simple and, the way I see it, uncontroversial.
I remeasured the savings of 0001 - 0004, which comes on top of the
already committed patch that inlined the comparison function, which gave
another ~5%:
Data set | Patched (ms) | Master (ms) | Speedup
--------------------|--------------|--------------|----------
movies(plot) | 8,058 | 10,311 | 1.27x
lineitem(l_comment) | 223,233 | 256,986 | 1.19x
I've also registered the change at the commit fest, see
https://commitfest.postgresql.org/patch/6418/.
--
David Geier
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0005-Optimize-generate_trgm-with-radix-sort.patch | text/x-patch | 2.9 KB |
| v4-0004-Faster-qunique-comparator-in-generate_trgm.patch | text/x-patch | 2.0 KB |
| v4-0003-Make-btint4cmp-branchless.patch | text/x-patch | 1.0 KB |
| v4-0002-Optimize-generate_trgm-with-sort_template.h.patch | text/x-patch | 2.6 KB |
| v4-0001-Optimize-sort-and-deduplication-in-ginExtractEntr.patch | text/x-patch | 7.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tomas Vondra | 2026-03-02 12:34:41 | Re: Eliminating SPI / SQL from some RI triggers - take 3 |
| Previous Message | Oleg Tselebrovskiy | 2026-03-02 12:17:13 | psql's 001_basic.pl test could fail on very slow machines |