Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Date: 2019-04-26 14:51:36
Message-ID: 7882.1556290296@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2019-04-25 17:12:33 -0400, Tom Lane wrote:
>> Andres Freund <andres(at)anarazel(dot)de> writes:
>>> I was wondering if we should just pass in the pg_class tuple as an "out"
>>> argument, instead of pointers to relfilnode/relfrozenxid/relminmxid.

>> Yeah, possibly. The whole business with xids is perhaps heapam specific,
>> so decoupling this function's signature from them would be good.

> I've left that out in the attached. Currently VACUUM FULL / CLUSTER also
> needs to handle those, and the callback for transactional rewrite
> (table_relation_copy_for_cluster()), also returns those as output
> parameter. I think I can see a way how we could clean up the relevant
> cluster.c code, but until that's done, I don't see much point in a
> different interface (I'll probably write apatch .

OK, we can leave that for later. I suppose there's little hope that
v12's version of the tableam API can be chiseled onto stone tablets yet.

> Questions:
> - Should we move the the CommandCounterIncrement() from
> RelationSetNewRelfilenode() to the callers? That'd allow them to do
> other things to the new relation (e.g. fill it), before making the
> changes visible. Don't think it'd currently help, but it seems like it
> could make code more robust in the future.

No, I don't think so. The intermediate state where the relcache entry
is inconsistent with the on-disk state is not something we want to
be propagating all over the place. As for robustness, I'd be more
worried about somebody forgetting the CCI than about possibly being
able to squeeze additional updates into the same CCI cycle.

> - Should we introduce an assertion into CatalogIndexInsert()'s
> HeapTupleIsHeapOnly() path, that asserts that all the relevant indexes
> aren't ReindexIsProcessingIndex()? Otherwise it seems way too easy to
> re-introduce bugs like this one. Dirty hack for that included.

Good idea, but I think I'd try to keep the code the same in a non-assert
build, that is more like

+#ifndef USE_ASSERT_CHECKING
/* HOT update does not require index inserts */
if (HeapTupleIsHeapOnly(heapTuple))
return;
+#endif

/* required setup here ... */

+#ifdef USE_ASSERT_CHECKING
+ /* HOT update does not require index inserts, but check we could have */
+ if (HeapTupleIsHeapOnly(heapTuple))
+ {
+ /* checking here */
+ return;
+ }
+#endif

> - Wonder if we shouldn't introduce something akin to
> SetReindexProcessing() for table rewrites (e.g. VACUUM FULL), to
> prevent the related error of inserting/updating a catalog table that's
> currently being rewritten.

Not terribly excited about that, but if you are, maybe a follow-on
patch could do that.

> Taking this as a WIP, what do you think?

Seems generally about right. One note is that in reindex_index,
the right fix is to push the intermediate code to above the PG_TRY:
there's no reason to start the TRY block any sooner than the
SetReindexProcessing call.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-04-26 14:53:35 Re: "Routine Reindexing" docs should be updated to reference REINDEX CONCURRENTLY
Previous Message Rafia Sabih 2019-04-26 14:51:26 Re: Execute INSERT during a parallel operation