Re: Improving REINDEX for system indexes (long)

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving REINDEX for system indexes (long)
Date: 2003-09-27 22:05:13
Message-ID: 200309272205.h8RM5Dp16195@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers pgsql-patches


Tom, would you summarize what REINDEX currently _doesn't_ do?

---------------------------------------------------------------------------

Tom Lane wrote:
> I've been looking at the issues involved in reindexing system tables,
> and I now have what I think is a fairly defensible set of proposals.
>
> We should whenever possible use the same reindexing technique used by
> CLUSTER: assign a new relfilenode number, build the new index in that
> file, and apply an ordinary heap_update operation to update the index's
> pg_class row with the new relfilenode value. This technique is fully
> crash-safe and transaction-safe (meaning it can be rolled back even after
> completion, in case of failure later in the same transaction). However,
> there are several pitfalls for applying it to system indexes:
>
> 1. There is a problem for a shared index, because we have no way to
> update pg_class.relfilenode in all the other databases besides ours.
> I see no good way around this problem.
>
> 2. There is a problem for a nailed-in-cache index, because the relcache
> isn't prepared to cope with relfilenode updates for such indexes.
> However, that is fixable.
>
> 3. There is a problem for an index on pg_class itself: doing heap_update
> on a pg_class row must make new index entries. We have to be careful
> that the new row does appear in the updated index, while not making
> entries in old-and-possibly-broken indexes. This is doable.
>
> 4. There is a problem with updating indexes that might be used for catalog
> fetches executed during the index_build process: if we try to use the
> partially-rebuilt index for such a fetch we will fail. In the case of
> disaster recovery the problem doesn't exist because we'll have
> IsIgnoringSystemIndexes true, but for an ordinary on-line indexing
> operation there's definitely a risk. Presently the code relies on
> "deactivating indexes" to prevent this, but I think it can be done more
> simply, because we only need to suppress access to the target index
> locally in the reindexing backend. (Other backends will be locked out by
> means of an AccessExclusiveLock on the system catalog in question.)
>
> In short then, we can fix things so that the new-relfilenode path can be
> taken in all cases except shared indexes. For shared indexes, we have
> little choice but to truncate and rebuild the index in-place. (There is
> then no need to update its pg_class row at all.) This is of course
> horribly crash-unsafe.
>
> The code presently uses a "deactivated indexes" flag (namely, setting
> pg_class.relhasindex false) to provide some protection against a crash
> midway through an in-place reindex. However, this concept is really quite
> useless for a shared index, because only one database could contain the
> deactivation flag. Accesses from any other database would still try to
> use the broken index.
>
> Based on this analysis, I feel that the "deactivated indexes" mechanism
> is of no further value and should be retired. We should instead simply
> acknowledge that reindexing shared indexes is dangerous. I propose that
> it be allowed only from a standalone backend. Specifically I think that
> REINDEX INDEX and REINDEX TABLE should error out if pointed to a shared
> index or table, unless running in a standalone backend; REINDEX DATABASE
> should skip over the shared indexes unless running in a standalone
> backend. (I'm not convinced that either -O or -P needs to be insisted on
> by REINDEX, btw, but we can discuss that separately.) We'll have to warn
> users not to try to restart the database when reindexing of a shared table
> wasn't successfully completed.
>
> Details to back up the above claims:
>
> Part of the code needed to fix the relcache restriction on nailed indexes
> is already there, but ifdef'd out; that's the part that re-reads the
> index's pg_class row after receiving SI inval for it. There are some
> cases not yet handled though. In the first place, if the nailed index
> being modified is pg_class_oid_index, ScanPgRelation will try to use that
> same index to load the updated row, and will fail because it'll be trying
> to use the old relfilenode. We can easily force a seqscan in that case,
> however. A more subtle problem is that if an SI buffer overflow occurs,
> RelationCacheInvalidate walks through the relation cache in a random
> order. I believe that the correct update order has to be first
> pg_class_oid_index, then the other nailed indexes, and finally all other
> relation cache entries. pg_class_oid_index has to be updated first (with
> the aforementioned seqscan), since it will be used by ScanPgRelation to
> reread the pg_class rows for the other nailed indexes. Only when we are
> sure the nailed indexes are up-to-date can we safely rebuild other
> relcache entries.
>
> Assigning a new relfilenode to indexes of pg_class is not a big deal when
> we don't have an index-corruption problem. We simply have to make the new
> heap_updated row before we start index_build (which indeed the code does
> already); both old and new rows will be indexed in the new index, and all
> is well. However, if we suspect index corruption then it is important not
> to try to make entries into the old indexes, because we might crash trying
> to update a corrupt index. I propose the following behavior:
>
> REINDEX INDEX: no special action; this is not the thing to use
> when index corruption is suspected anyway.
>
> REINDEX TABLE: while operating on pg_class, arrange for the
> relcache's list of known indexes of pg_class to contain only the
> indexes already reindexed. In this way, only those indexes will
> be updated by CatalogUpdateIndexes(), and so no untrustworthy
> indexes will be touched while making new pg_class rows.
>
> REINDEX DATABASE: take care to process pg_class first.
>
> To avoid catalog accesses via an index that's being rebuilt, we can simply
> generalize the SetReindexProcessing() state to include the OID of the
> index currently being rebuilt. Places that presently make checks for
> IsIgnoringSystemIndexes can check this as well and fall back to seqscans
> when the targeted index would have been used.
>
> One other point: IsIgnoringSystemIndexes seems presently to be taken to
> mean not only that we don't *read* the system indexes, but that we don't
> *write* them either. I think this is horribly dangerous; it effectively
> means that the only thing you can safely do in a backend started with -P
> is REINDEX. If you make any other changes to the system catalogs then
> you've re-corrupted the indexes. Also, you don't have any protection
> against duplicate entries getting made, since the unique indexes are
> disabled. It would be a lot safer to define IsIgnoringSystemIndexes as
> preventing the system indexes from being used for lookups, while still
> updating the indexes on any catalog modification. This will not affect
> the safety of REINDEX and will make other operations much less prone to
> pilot error.
>
> Comments?
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2003-09-27 22:10:01 pgsql-server/src/bin/pg_dump pg_dump.c
Previous Message Hiroshi Inoue 2003-09-27 22:04:58 Re: pgsql-server/src/backend catalog/index.c comma ...

Browse pgsql-hackers by date

  From Date Subject
Next Message Hiroshi Inoue 2003-09-27 22:19:22 Re: 2-phase commit
Previous Message Hiroshi Inoue 2003-09-27 22:04:58 Re: pgsql-server/src/backend catalog/index.c comma ...

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2003-09-27 22:05:52 Re: bug fix: TupleDescGetAttInMetadata/BuildTupleFromCStrings
Previous Message Hiroshi Inoue 2003-09-27 22:04:58 Re: pgsql-server/src/backend catalog/index.c comma ...