Re: Improving REINDEX for system indexes (long)

From: "Hiroshi Inoue" <inoue(at)tpf(dot)co(dot)jp>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Improving REINDEX for system indexes (long)
Date: 2003-09-21 19:56:35
Message-ID: 000d01c3807a$763228b0$3d283ddb@PbgX
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers pgsql-patches

First it should have been discussed before your commitment or at least
it should be discussed after reversing your change.

I require you to explain me why you committed the change
with no discussion and little investigation.

I also noticed that your change for catalog/index.c

Revision 1.200 / (download) - annotate - [select for diffs] , Mon Sep 23
00:42:48 2002 UTC (11 months, 4 weeks ago) by tgl
Changes since 1.199: +124 -161 lines
Diff to previous 1.199
Get rid of bogus use of heap_mark4update in reindex operations (cf.
recent bug report). Fix processing of nailed-in-cache indexes;
it appears that REINDEX DATABASE has been broken for months :-(.

removed an #ifndef ENABLE_REINDEX_NAILED_RELATIONS
trial stuff. I'm very surprised to see it now.

regards,
Hiroshi Inoue

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>
> I've been looking at the issues involved in reindexing system tables,
> and I now have what I think is a fairly defensible set of proposals.
>
> We should whenever possible use the same reindexing technique used by
> CLUSTER: assign a new relfilenode number, build the new index in that
> file, and apply an ordinary heap_update operation to update
> the index's
> pg_class row with the new relfilenode value. This technique is fully
> crash-safe and transaction-safe (meaning it can be rolled
> back even after
> completion, in case of failure later in the same
> transaction). However,
> there are several pitfalls for applying it to system indexes:
>
> 1. There is a problem for a shared index, because we have no way to
> update pg_class.relfilenode in all the other databases besides ours.
> I see no good way around this problem.
>
> 2. There is a problem for a nailed-in-cache index, because
> the relcache
> isn't prepared to cope with relfilenode updates for such indexes.
> However, that is fixable.
>
> 3. There is a problem for an index on pg_class itself: doing
> heap_update
> on a pg_class row must make new index entries. We have to be careful
> that the new row does appear in the updated index, while not making
> entries in old-and-possibly-broken indexes. This is doable.
>
> 4. There is a problem with updating indexes that might be
> used for catalog
> fetches executed during the index_build process: if we try to use the
> partially-rebuilt index for such a fetch we will fail. In the case of
> disaster recovery the problem doesn't exist because we'll have
> IsIgnoringSystemIndexes true, but for an ordinary on-line indexing
> operation there's definitely a risk. Presently the code relies on
> "deactivating indexes" to prevent this, but I think it can be
> done more
> simply, because we only need to suppress access to the target index
> locally in the reindexing backend. (Other backends will be
> locked out by
> means of an AccessExclusiveLock on the system catalog in question.)
>
> In short then, we can fix things so that the new-relfilenode
> path can be
> taken in all cases except shared indexes. For shared indexes, we have
> little choice but to truncate and rebuild the index in-place.
> (There is
> then no need to update its pg_class row at all.) This is of course
> horribly crash-unsafe.
>
> The code presently uses a "deactivated indexes" flag (namely, setting
> pg_class.relhasindex false) to provide some protection against a crash
> midway through an in-place reindex. However, this concept is
> really quite
> useless for a shared index, because only one database could
> contain the
> deactivation flag. Accesses from any other database would
> still try to
> use the broken index.
>
> Based on this analysis, I feel that the "deactivated indexes"
> mechanism
> is of no further value and should be retired. We should
> instead simply
> acknowledge that reindexing shared indexes is dangerous. I
> propose that
> it be allowed only from a standalone backend. Specifically I
> think that
> REINDEX INDEX and REINDEX TABLE should error out if pointed
> to a shared
> index or table, unless running in a standalone backend;
> REINDEX DATABASE
> should skip over the shared indexes unless running in a standalone
> backend. (I'm not convinced that either -O or -P needs to be
> insisted on
> by REINDEX, btw, but we can discuss that separately.) We'll
> have to warn
> users not to try to restart the database when reindexing of a
> shared table
> wasn't successfully completed.
>
> Details to back up the above claims:
>
> Part of the code needed to fix the relcache restriction on
> nailed indexes
> is already there, but ifdef'd out; that's the part that re-reads the
> index's pg_class row after receiving SI inval for it. There are some
> cases not yet handled though. In the first place, if the nailed index
> being modified is pg_class_oid_index, ScanPgRelation will try
> to use that
> same index to load the updated row, and will fail because
> it'll be trying
> to use the old relfilenode. We can easily force a seqscan in
> that case,
> however. A more subtle problem is that if an SI buffer
> overflow occurs,
> RelationCacheInvalidate walks through the relation cache in a random
> order. I believe that the correct update order has to be first
> pg_class_oid_index, then the other nailed indexes, and
> finally all other
> relation cache entries. pg_class_oid_index has to be updated
> first (with
> the aforementioned seqscan), since it will be used by
> ScanPgRelation to
> reread the pg_class rows for the other nailed indexes. Only
> when we are
> sure the nailed indexes are up-to-date can we safely rebuild other
> relcache entries.
>
> Assigning a new relfilenode to indexes of pg_class is not a
> big deal when
> we don't have an index-corruption problem. We simply have to
> make the new
> heap_updated row before we start index_build (which indeed
> the code does
> already); both old and new rows will be indexed in the new
> index, and all
> is well. However, if we suspect index corruption then it is
> important not
> to try to make entries into the old indexes, because we might
> crash trying
> to update a corrupt index. I propose the following behavior:
>
> REINDEX INDEX: no special action; this is not the thing to use
> when index corruption is suspected anyway.
>
> REINDEX TABLE: while operating on pg_class, arrange for the
> relcache's list of known indexes of pg_class to contain only the
> indexes already reindexed. In this way, only those indexes will
> be updated by CatalogUpdateIndexes(), and so no untrustworthy
> indexes will be touched while making new pg_class rows.
>
> REINDEX DATABASE: take care to process pg_class first.
>
> To avoid catalog accesses via an index that's being rebuilt,
> we can simply
> generalize the SetReindexProcessing() state to include the OID of the
> index currently being rebuilt. Places that presently make checks for
> IsIgnoringSystemIndexes can check this as well and fall back
> to seqscans
> when the targeted index would have been used.
>
> One other point: IsIgnoringSystemIndexes seems presently to
> be taken to
> mean not only that we don't *read* the system indexes, but
> that we don't
> *write* them either. I think this is horribly dangerous; it
> effectively
> means that the only thing you can safely do in a backend
> started with -P
> is REINDEX. If you make any other changes to the system catalogs then
> you've re-corrupted the indexes. Also, you don't have any protection
> against duplicate entries getting made, since the unique indexes are
> disabled. It would be a lot safer to define
> IsIgnoringSystemIndexes as
> preventing the system indexes from being used for lookups, while still
> updating the indexes on any catalog modification. This will
> not affect
> the safety of REINDEX and will make other operations much
> less prone to
> pilot error.
>
> Comments?
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2003-09-21 20:04:34 Re: Improving REINDEX for system indexes (long)
Previous Message Tom Lane 2003-09-21 18:55:01 Improving REINDEX for system indexes (long)

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2003-09-21 20:04:34 Re: Improving REINDEX for system indexes (long)
Previous Message Tom Lane 2003-09-21 19:11:46 Re: Can't Build 7.3.4 on OS X

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2003-09-21 20:04:34 Re: Improving REINDEX for system indexes (long)
Previous Message Tom Lane 2003-09-21 18:55:01 Improving REINDEX for system indexes (long)