| From: | David Geier <geidav(dot)pg(at)gmail(dot)com> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Reduce build times of pg_trgm GIN indexes |
| Date: | 2026-04-14 14:24:10 |
| Message-ID: | 5650bf75-dcb8-446d-8cba-e626eb44594b@gmail.com |
| Lists: | pgsql-hackers |
On 13.04.2026 17:06, David Geier wrote:
>> I squashed 0002 and 0004 into one commit, and did some more refactoring:
>> I created a trigram_qsort() helper function that calls the signed or
>> unsigned variant, so that that logic doesn't need to be duplicated in
>> the callers. For symmetry, I also added a trigram_qunique() helper
>> function which just calls qunique() with the new, faster CMPTRGM_EQ
>> comparator. Pushed these as commit 9f3755ea07.
>
> Thanks for committing these patches.
Attached are the remaining patches (previously 0003 and 0005), rebased on
the latest master. Currently, there's no radix sort variant for the
unsigned char case. Do we care about this case, or is it fine if it runs
slower?
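
To make the signed/unsigned question concrete, here is a minimal sketch of an LSD radix sort over 3-byte keys. It is not the code from the attached patch; the `Trigram` type, `bucket_of()`, and `trigram_radix_sort()` are hypothetical names used only for illustration. The sketch assumes that the only difference between the two variants is how a byte maps to a bucket: unsigned order uses the byte as-is, while signed-char order flips the top bit so that bytes 0x80..0xFF sort first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 3-byte key type for this sketch; not the pg_trgm definition. */
typedef struct
{
    unsigned char b[3];
} Trigram;

/*
 * Map a byte to its bucket. Unsigned ordering uses the byte unchanged.
 * For signed-char ordering, flipping the top bit places 0x80..0xFF
 * (negative as signed char) before 0x00..0x7F.
 */
static inline unsigned
bucket_of(unsigned char c, bool signed_order)
{
    return signed_order ? (unsigned) (c ^ 0x80) : (unsigned) c;
}

/*
 * LSD radix sort: three stable counting-sort passes, starting with the
 * least significant byte b[2]. 'tmp' must have room for n elements.
 */
static void
trigram_radix_sort(Trigram *a, Trigram *tmp, size_t n, bool signed_order)
{
    for (int pos = 2; pos >= 0; pos--)
    {
        size_t count[256] = {0};
        size_t offset = 0;

        /* histogram of the byte at this position */
        for (size_t i = 0; i < n; i++)
            count[bucket_of(a[i].b[pos], signed_order)]++;

        /* turn counts into starting offsets (exclusive prefix sum) */
        for (int k = 0; k < 256; k++)
        {
            size_t c = count[k];

            count[k] = offset;
            offset += c;
        }

        /* stable scatter into tmp, then copy back */
        for (size_t i = 0; i < n; i++)
            tmp[count[bucket_of(a[i].b[pos], signed_order)]++] = a[i];

        memcpy(a, tmp, n * sizeof(Trigram));
    }
}
```

Under these assumptions, supporting both orderings costs one XOR per byte lookup rather than a second sort routine.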
The following perf profiles show that trigram_qsort() goes from ~34%
down to ~7% with the radix sort optimization. The optimized run also
includes the btint4cmp() optimization. Without that the result would be
even better.
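
For readers unfamiliar with the idea behind a branchless comparator, the following sketch contrasts a branching three-way integer comparison with one common branchless formulation. The function names are hypothetical, and the attached btint4cmp() patch may be written differently.

```c
#include <stdint.h>

/* Branching form: up to two conditional jumps per call. */
static inline int
cmp_int32_branchy(int32_t a, int32_t b)
{
    if (a > b)
        return 1;
    else if (a == b)
        return 0;
    else
        return -1;
}

/*
 * Branchless form: each comparison evaluates to 0 or 1, so the
 * difference is -1, 0 or 1. Unlike "a - b", this cannot overflow.
 */
static inline int
cmp_int32_branchless(int32_t a, int32_t b)
{
    return (a > b) - (a < b);
}
```

Both functions return the same result for every input pair; the second form avoids data-dependent branches, which tend to be mispredicted when a sort compares keys in no particular order.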
With that change we could move on and tackle optimizing
1. generate_trgm_only() (41.52% in the patched profile), e.g. by using
an ASCII fast-path
2. ginInsertBAEntries() (32.72% on master, 36.78% patched), e.g. by
replacing the RB-tree with a radix sort as well
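
As a rough illustration of what an ASCII fast-path could hinge on, the sketch below checks whether a byte string contains only 7-bit characters. It assumes the fast path would mean skipping per-character multibyte handling when every character is a single byte; `is_all_ascii()` is a hypothetical helper, not an existing pg_trgm function, and nothing here is taken from a patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if none of the 'len' bytes at 's' has its high bit set.
 * OR-ing all bytes together avoids a branch per byte; a single test at
 * the end decides the result.
 */
static bool
is_all_ascii(const char *s, size_t len)
{
    unsigned char acc = 0;

    for (size_t i = 0; i < len; i++)
        acc |= (unsigned char) s[i];

    return (acc & 0x80) == 0;
}
```

A caller could run this check once per input string and fall back to the general multibyte code whenever it returns false.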
master
- heapam_index_build_range_scan
  - 99.40% ginBuildCallback
    - ginHeapTupleBulkInsert
      - 66.55% ginExtractEntries
        - 65.29% FunctionCall3Coll
          - gin_extract_value_trgm
            - 62.80% generate_trgm
              + 34.33% trigram_qsort (inlined)
              + 26.20% generate_trgm_only
              + 2.23% trigram_qunique (inlined)
            + 1.74% detoast_attr
        + 1.19% qsort_arg_entries
      + 32.72% ginInsertBAEntries
patched
- heapam_index_build_range_scan
  - 99.42% ginBuildCallback
    - 95.95% ginHeapTupleBulkInsert
      - 59.11% ginExtractEntries
        - 56.93% FunctionCall3Coll
          - gin_extract_value_trgm
            - 52.19% generate_trgm
              + 41.52% generate_trgm_only
              + 7.14% trigram_qsort (inlined)
              + 3.53% trigram_qunique (inlined)
            + 4.08% detoast_attr
        + 2.13% qsort_arg_entries
      + 36.78% ginInsertBAEntries
--
David Geier
| Attachment | Content-Type | Size |
|---|---|---|
| v6-0002-Optimize-generate_trgm-with-radix-sort.patch | text/x-patch | 2.2 KB |
| v6-0001-Make-btint4cmp-branchless.patch | text/x-patch | 1.0 KB |