Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Date: 2019-04-25 21:56:24
Message-ID: 20190425215624.t724lzphoinzgnrb@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-04-25 17:12:33 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > My point was that given the current coding the code in
> > ATExecSetTableSpace() would make changes to the *old* relfilenode, after
> > having already copied the contents to the new relfilenode. Which means
> > that if ATExecSetTableSpace() is ever used on pg_class or one of it's
> > indexes, it'd just loose those changes, afaict.
>
> Hmm.

I think there's no a live bug in because we a) require
allow_system_table_mods to modify system tables, and then b) have
another check

/*
* We cannot support moving mapped relations into different tablespaces.
* (In particular this eliminates all shared catalogs.)
*/
if (RelationIsMapped(rel))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot move system relation \"%s\"",
RelationGetRelationName(rel))));

that triggers even when allow_system_table_mods is off.

> There's another reason why we'd like the relcache contents to only change
> at CCI, which is that if we get a relcache invalidation somewhere before
> we get to the CCI, relcache.c would proceed to reload it based on the
> *current* catalog contents (ie, pre-update, thanks to the magic of MVCC),
> so that the entry would revert back to its previous state until we did get
> to CCI. I wonder whether there's any demonstrable bug there. Though
> you'd think the CLOBBER_CACHE_ALWAYS animals would've found it if so.

I think we basically assume that there's nothing triggering relcache
invals here inbetween updating the relcache entry, and doing the actual
catalog modification. That looks like it's currently somewhat OK in the
places we've talked about so far.

This made me look at cluster.c and damn, I'd forgotten about that.
Look at the following code in copy_table_data():

/*
* When doing swap by content, any toast pointers written into NewHeap
* must use the old toast table's OID, because that's where the toast
* data will eventually be found. Set this up by setting rd_toastoid.
* This also tells toast_save_datum() to preserve the toast value
* OIDs, which we want so as not to invalidate toast pointers in
* system catalog caches, and to avoid making multiple copies of a
* single toast value.
*
* Note that we must hold NewHeap open until we are done writing data,
* since the relcache will not guarantee to remember this setting once
* the relation is closed. Also, this technique depends on the fact
* that no one will try to read from the NewHeap until after we've
* finished writing it and swapping the rels --- otherwise they could
* follow the toast pointers to the wrong place. (It would actually
* work for values copied over from the old toast table, but not for
* any values that we toast which were previously not toasted.)
*/
NewHeap->rd_toastoid = OldHeap->rd_rel->reltoastrelid;
}
else
*pSwapToastByContent = false;

which then goes on to do things like a full blown sort or index
scan. Sure, that's for the old relation, but that's so ugly. It works
because RelationClearRelation() copies over rd_toastoid :/.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-04-25 22:43:15 Re: Pluggable Storage - Andres's take
Previous Message Tomas Vondra 2019-04-25 21:49:09 Re: TRACE_SORT defined by default