| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Antonin Houska <ah(at)cybertec(dot)at> |
| Cc: | alvherre(at)kurilemu(dot)de, Kirill Reshke <reshkekirill(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Fix REPACK with WITHOUT OVERLAPS replica identity indexes |
| Date: | 2026-05-13 08:35:30 |
| Message-ID: | 69CA7636-0359-42B6-B971-16A036698187@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On May 13, 2026, at 00:48, Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Álvaro Herrera <alvherre(at)kurilemu(dot)de> wrote:
>
>> On 2026-May-11, Chao Li wrote:
>>
>>>> On May 10, 2026, at 06:38, Álvaro Herrera <alvherre(at)kurilemu(dot)de> wrote:
>>
>>>> I think it would be a good idea to make identity_key_equal() not deform
>>>> all attributes, but instead only up to the last one it needs for the key
>>>> comparisons.
>>>
>>> That’s true. Please see v3.
>>
>> Thanks. I did one further small change, namely to determine these last
>> attnums just once per run rather than once per tuple. Pushed now.
>
> I appreciate that REPACK can handle more cases now! However, I found a problem
> (or at least a question) when rebasing the improvements for the next
> release(s). (It's related to splitting the table scan into multiple block
> ranges and use one snapshot per range, details are not too important here, )
> Assertion failure in the new code made me think if other than B-tree indexes
> should be allowed in the USING INDEX clause of REPACK.
>
> AFAICS, only B-tree indexes (and some special ones that don't appear in the
> core) provide ordering information - see get_relation_info():
>
> /*
> * Fetch the ordering information for the index, if any.
> */
> if (info->relam == BTREE_AM_OID)
> {
> ...
> info->sortopfamily = info->opfamily;
> ...
> }
> else if (amroutine->amcanorder)
> {
> /*
> * Otherwise, identify the corresponding btree opfamilies
> * by trying to map this index's "<" operators into btree.
> * Since "<" uniquely defines the behavior of a sort
> * order, this is a sufficient test.
> ...
> }
> else
> {
> ...
> info->sortopfamily = NULL;
> ...
> }
>
>
> Therefore, index scan shouldn't be possible for GIST index - see
> build_index_paths():
>
> index_is_ordered = (index->sortopfamily != NULL);
>
>
> So I'm not sure if clustering makes sense here. What makes me confused is that
> GIST has IndexAmRoutine.amclusterable=true. As it has amcanorder=false at the
> same time, I suspect it might be just a thinko. However, if we simply set
> amclusterable to false, it can break upgrade to PG 19 for users who already
> "clustered" some table by a GIST index (for mysterious reasons). (BTW, do we
> need the amclusterable field at all?)
>
> REPACK currently rejects explicit sort if non-B-tree index is specified (due
> to lack of ordering information), but it still scans the index rather than
> the heap - see copy_table_data() and heapam_relation_copy_for_cluster().
>
> Does this seem worth fixing now? Or maybe at least worth some comments (unless
> I'm completely wrong)?
After some investigation, I think I see the mismatch:
* get_relation_info(): non-ordered GiST cannot provide sort order. That is expected.
* copy_table_data() only uses plan_cluster_use_sort() for btree. For any other clusterable index, it sets use_sort = false and does a raw index scan.
* The docs say REPACK can re-sort using index scan “if the index is a b-tree” or seqscan+sort, which does not describe what the code actually does for GiST.
I am not sure whether we should change the behavior in PG19. Alvaro may have a better idea about that. But I agree that we can at least clarify the code comment and documentation. The attached patch attempts to do that.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| Attachment | Content-Type | Size |
|---|---|---|
| repack_comment.diff | application/octet-stream | 4.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Zhenwei Shang | 2026-05-13 09:32:09 | Re: Fix SPLIT PARTITION bound-overlap bug and other improvements |
| Previous Message | Zhenwei Shang | 2026-05-13 07:27:24 | Re: [PATCH] Fix column name escaping in postgres_fdw stats import |