Re: ALTER tbl rewrite loses CLUSTER ON index

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER tbl rewrite loses CLUSTER ON index
Date: 2020-04-02 06:14:21
Message-ID: 20200402061421.GA112468@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 18, 2020 at 11:48:37AM +0900, Michael Paquier wrote:
> Anyway, Tom, Alvaro, are you planning to look at what is proposed on
> this thread? I don't want to step on your toes if that's the case and
> it seems to me that the approach taken by the patch is sound, using as
> basic fix the addition of an AT_ClusterOn sub-command to the list of
> commands to execute when rebuilding the table, ensuring that any
> follow-up CLUSTER command will use the correct index.

Hearing nothing, I have been looking at the patches sent upthread, and
did some modifications as per the attached for 0001. The logic looked
fine to me and it is unchanged as you are taking care of normal
indexes as well as constraint indexes. Please note that I have
tweaked some comments, and removed what was on top of
RememberConstraintForRebuilding() as that was just a duplicate.
Regarding the tests, I was annoyed by the fact that the logic was not
manipulating two indexes at the same time on the relation rewritten
with a normal and a constraint index, and cross-checking both at the
same time to see which one is clustered after each rewrite is good for
consistency.

Now, regarding patch 0002, note that you have a problem for this part:
- tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexOid));
- if (!HeapTupleIsValid(tuple)) /* probably can't happen */
- {
- relation_close(OldHeap, AccessExclusiveLock);
- pgstat_progress_end_command();
- return;
- }
- indexForm = (Form_pg_index) GETSTRUCT(tuple);
- if (!indexForm->indisclustered)
+ if (!get_index_isclustered(indexOid))
{
- ReleaseSysCache(tuple);
relation_close(OldHeap, AccessExclusiveLock);
pgstat_progress_end_command();
return;
}
- ReleaseSysCache(tuple);

On an invalid tuple for pg_index, the new code would issue an error,
while the old code would just return. And it seems to me that this
can lead to problems because the parent relation is processed and
locked at the beginning of cluster_rel(), *after* we know the index
OID to work on. The refactoring is fine for the other two areas
though, so I think that there is still value to apply
get_index_isclustered() within mark_index_clustered() and cluster(),
and I would suggest to keep 0002 to that.

Justin, what do you think?
--
Michael

Attachment Content-Type Size
v8-0001-Preserve-clustered-indexes-after-rewrites-of-ALTE.patch text/x-diff 8.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-04-02 06:21:50 Re: potential stuck lock in SaveSlotToPath()
Previous Message David Rowley 2020-04-02 06:11:48 Re: Berserk Autovacuum (let's save next Mandrill)