From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
Cc: | Kirill Reshke <reshkekirill(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Parallel CREATE INDEX for GIN indexes |
Date: | 2025-01-07 11:59:03 |
Message-ID: | 71b63260-d201-4a0c-94a7-214dcfce5099@vondra.me |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 1/6/25 20:13, Matthias van de Meent wrote:
> ...
>>
>> Thanks. Attached is a rebased patch series fixing those issues, and one
>> issue I found in an AssertCheckGinBuffer, which was calling the other
>> assert (AssertCheckItemPointers) even for empty buffers. I think this
>> part might need some more work, so that it's clear what the various
>> asserts assume (or rather to allow just calling AssertCheckGinBuffer
>> everywhere, with some flags).
>
> Thanks for the rebase.
>
>> 0001
>
> In gininsert.c, I think I'd prefer GinBuildShared over GinShared.
> While current GIN infrastructure doesn't do parallel index scans (and
> I can't think of an easy way to parallelize it) I think this it's
> better to make clear that this isn't related to index scan.
>
Agreed, renamed to GinBuildShared.
>> + * mutex protects all fields before heapdesc.
>
> This comment is still inaccurate.
>
Hmm, yeah. But this comment originates from btree, so maybe it's wrong
there (and in BRIN too)? I believe it refers to the descriptors stored
after the struct, i.e. it means "all fields after the mutex".
>> + /* FIXME likely duplicate with indtuples */
>
> I think this doesn't have to be duplicate, as we can distinguish
> between number of heap tuples and the number of GIN (key, TID) pairs
> loaded. This distinction doesn't really exist anywhere else, though,
> so to expose this to users we may need changes in
> pg_stat_progress_create_index.
>
> While I haven't checked if that distinction is being made in the code,
> I think it would be a useful distinction to have.
>
I haven't done anything about this, but I'm not sure adding the number
of GIN tuples to pg_stat_progress_create_index would be very useful. We
don't know the total number of entries, so it can't show the progress.
>> GinBufferInit
>
> This seems to depend on the btree operator classes to get sortsupport
> functions, bypassing the GIN compare support function (support
> function 1) and adding a dependency on the btree opclasses for
> indexable types. This can cause "bad" ordering, or failure to build
> the index when the parallel path is chosen and no default btree
> opclass is defined for the type. I think it'd be better if we allowed
> users to specify which sortsupport function to use, or at least use
> the correct compare function when it's defined on the attribute's
> operator class.
>
Good point! I fixed this by copying the logic from initGinState.
>> include/access/gin_tuple.h
>> + OffsetNumber attrnum; /* attnum of index key */
>
> I think this would best be AttrNumber-typed? Looks like I didn't
> notice or fix that in 0009.
>
You're probably right, but I see the GIN code uses OffsetNumber for
attrnum in a number of places. I wonder why is that. I don't think it
can be harmful, because we can't have GIN on system columns, right?
>> My plan is to eventually commit the first couple patches, possibly up
>> 0007 or even 0009.
>
> Sounds good. I'll see if I have some time to do some cleanup on my
> patches (0008 and 0009), as they need some better polish on the
> comments and commit messages.
>
Thanks!
regards
--
Tomas Vondra
Attachment | Content-Type | Size |
---|---|---|
v20250107-0001-Allow-parallel-create-for-GIN-indexes.patch | text/x-patch | 62.4 KB |
v20250107-0002-Use-mergesort-in-the-leader-process.patch | text/x-patch | 12.6 KB |
v20250107-0003-Remove-the-explicit-pg_qsort-in-workers.patch | text/x-patch | 10.7 KB |
v20250107-0004-Compress-TID-lists-before-writing-tuples-t.patch | text/x-patch | 7.9 KB |
v20250107-0005-Collect-and-print-compression-stats.patch | text/x-patch | 5.1 KB |
v20250107-0006-Enforce-memory-limit-when-combining-tuples.patch | text/x-patch | 14.0 KB |
v20250107-0007-Detect-wrap-around-in-parallel-callback.patch | text/x-patch | 7.5 KB |
v20250107-0008-Use-a-single-GIN-tuplesort.patch | text/x-patch | 33.6 KB |
v20250107-0009-Reduce-the-size-of-GinTuple-by-12-bytes.patch | text/x-patch | 5.5 KB |
v20250107-0010-WIP-parallel-inserts-into-GIN-index.patch | text/x-patch | 18.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Zhijie Hou (Fujitsu) | 2025-01-07 12:33:56 | RE: Conflict detection for update_deleted in logical replication |
Previous Message | Matthias van de Meent | 2025-01-07 11:56:03 | Re: Further _bt_first simplifications for parallel index scans |