Re: Reduce build times of pg_trgm GIN indexes

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce build times of pg_trgm GIN indexes
Date: 2026-04-14 14:24:10
Message-ID: 5650bf75-dcb8-446d-8cba-e626eb44594b@gmail.com
Lists: pgsql-hackers

On 13.04.2026 17:06, David Geier wrote:
>> I squashed 0002 and 0004 into one commit, and did some more refactoring:
>> I created a trigram_qsort() helper function that calls the signed or
>> unsigned variant, so that that logic doesn't need to be duplicated in
>> the callers. For symmetry, I also added a trigram_qunique() helper
>> function which just calls qunique() with the new, faster CMPTRGM_EQ
>> comparator. Pushed these as commit 9f3755ea07.
>
> Thanks for committing these patches.

Attached are the remaining patches (previously 0003 and 0005) rebased on
latest master. Currently, there's no radix sort variant for the unsigned
char case. Do we care about this case or is it fine if that case runs
slower?
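
For readers who have not opened the patch, here is a minimal sketch of the
general idea. It is not the code from the attached patch. It assumes a
trigram is three bytes compared lexicographically, and the names trgm3 and
trgm3_radix_sort are made up for illustration. It is an LSD radix sort with
one stable counting pass per byte. The flip parameter shows one possible way
to cover both char signedness cases: 0x00 orders bytes as unsigned char,
0x80 orders them as signed char.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 3-byte trigram; not the pg_trgm type. */
typedef struct
{
	unsigned char b[3];
} trgm3;

/*
 * LSD radix sort: one stable counting pass per byte, last byte first,
 * so the final order is lexicographic on b[0], b[1], b[2].
 *
 * flip = 0x00 orders bytes as unsigned char.
 * flip = 0x80 orders bytes as signed char (0x80..0xFF sort first).
 * tmp must have room for n elements.
 */
static void
trgm3_radix_sort(trgm3 *a, trgm3 *tmp, size_t n, unsigned char flip)
{
	for (int pos = 2; pos >= 0; pos--)
	{
		size_t		count[256] = {0};
		size_t		offset = 0;

		/* Histogram of the current byte. */
		for (size_t i = 0; i < n; i++)
			count[a[i].b[pos] ^ flip]++;

		/* Exclusive prefix sum: count[k] becomes the start of bucket k. */
		for (int k = 0; k < 256; k++)
		{
			size_t		c = count[k];

			count[k] = offset;
			offset += c;
		}

		/* Stable scatter into tmp, then copy back for the next pass. */
		for (size_t i = 0; i < n; i++)
			tmp[count[a[i].b[pos] ^ flip]++] = a[i];

		memcpy(a, tmp, n * sizeof(trgm3));
	}
}
```

The three passes are linear in the number of trigrams, at the cost of a
scratch array of the same size as the input.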

The following perf profiles show that trigram_qsort() goes from ~34%
down to ~7% with the radix sort optimization. The optimized run also
includes the btint4cmp() optimization. Without that the result would be
even better.
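
As an illustration of what a branchless three-way integer comparison can
look like, here is a small standalone sketch. The function name
cmp_int32_branchless is hypothetical and this is not necessarily how the
attached patch implements it. The two comparisons each yield 0 or 1, so
their difference is -1, 0 or 1 without an if/else chain. Whether the
generated code is actually free of conditional jumps depends on the compiler
and target.

```c
#include <stdint.h>

/*
 * Three-way comparison without if/else: returns -1, 0 or 1.
 * Subtracting the operands directly (a - b) is avoided because it can
 * overflow for operands of opposite sign.
 */
static inline int
cmp_int32_branchless(int32_t a, int32_t b)
{
	return (a > b) - (a < b);
}
```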

With that change in place, we could move on and tackle optimizing

1. generate_trgm_only() (26.20% on master, 41.52% patched), e.g. by
adding an ASCII fast path
2. ginInsertBAEntries() (32.72% on master, 36.78% patched), e.g. by
replacing the RB-tree with a radix sort as well
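
To make the first item more concrete, here is a rough sketch of what an
ASCII fast path could look like. This is purely illustrative: the helper
names all_ascii and ascii_trigrams are hypothetical, and the sketch assumes
the word has already been lowercased and padded. If no byte has its high bit
set, every character is one byte wide, so each trigram is simply three
consecutive bytes and no multibyte handling is needed. Otherwise the caller
falls back to the existing path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if none of the len bytes has its high bit set. */
static bool
all_ascii(const unsigned char *s, size_t len)
{
	unsigned char acc = 0;

	for (size_t i = 0; i < len; i++)
		acc |= s[i];
	return (acc & 0x80) == 0;
}

/*
 * Hypothetical fast path for a pure-ASCII, already padded word.
 * Writes len - 2 trigrams (3 bytes each) to out and returns their count.
 * Returns -1 if the word is too short or contains non-ASCII bytes, so
 * that the caller can fall back to the multibyte-aware code.
 * out must have room for 3 * (len - 2) bytes.
 */
static long
ascii_trigrams(const unsigned char *word, size_t len, unsigned char *out)
{
	if (len < 3 || !all_ascii(word, len))
		return -1;

	for (size_t i = 0; i + 3 <= len; i++)
		memcpy(out + 3 * i, word + i, 3);

	return (long) (len - 2);
}
```

For example, the padded word "  cat " (two leading spaces, one trailing
space) yields the four trigrams "  c", " ca", "cat" and "at ".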

master

- heapam_index_build_range_scan
  - 99.40% ginBuildCallback
    - ginHeapTupleBulkInsert
      - 66.55% ginExtractEntries
        - 65.29% FunctionCall3Coll
          - gin_extract_value_trgm
            - 62.80% generate_trgm
              + 34.33% trigram_qsort (inlined)
              + 26.20% generate_trgm_only
              + 2.23% trigram_qunique (inlined)
            + 1.74% detoast_attr
        + 1.19% qsort_arg_entries
      + 32.72% ginInsertBAEntries

patched

- heapam_index_build_range_scan
  - 99.42% ginBuildCallback
    - 95.95% ginHeapTupleBulkInsert
      - 59.11% ginExtractEntries
        - 56.93% FunctionCall3Coll
          - gin_extract_value_trgm
            - 52.19% generate_trgm
              + 41.52% generate_trgm_only
              + 7.14% trigram_qsort (inlined)
              + 3.53% trigram_qunique (inlined)
            + 4.08% detoast_attr
        + 2.13% qsort_arg_entries
      + 36.78% ginInsertBAEntries

--
David Geier

Attachment                                            Content-Type  Size
v6-0002-Optimize-generate_trgm-with-radix-sort.patch  text/x-patch  2.2 KB
v6-0001-Make-btint4cmp-branchless.patch               text/x-patch  1.0 KB
