Invalid indexes should not consume update overhead

From: Tomasz Ostrowski <tometzky+pg(at)ato(dot)waw(dot)pl>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Invalid indexes should not consume update overhead
Date: 2016-07-13 11:10:45
Message-ID: 8b57a306-7951-8442-df16-8c581d61b4e6@ato.waw.pl
Lists: pgsql-bugs

When an index is built concurrently and the build fails, the index is
left in an invalid state. It's basically unusable for anything, but,
according to the documentation:

> If a problem arises while scanning the table, such as a deadlock or
> a uniqueness violation in a unique index, the CREATE INDEX command
> will fail but leave behind an "invalid" index. This index will be
> ignored for querying purposes because it might be incomplete; however
> it *will* *still* *consume* *update* *overhead*. The psql \d command
> will report such an index as INVALID

I think this update overhead is simply wasted - there's no way to make
use of it, as the only way to make the index usable again is to reindex
it or drop and recreate it.
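
For illustration, here's roughly what this looks like today (table and
index names are made up):

  -- A failed concurrent build, e.g. on a uniqueness violation, leaves
  -- the index behind, and \d reports it as INVALID:
  CREATE UNIQUE INDEX CONCURRENTLY items_sku_key ON items (sku);
  -- ERROR:  could not create unique index "items_sku_key"

  -- The only ways to make it usable again:
  REINDEX INDEX items_sku_key;   -- rebuilds it, but blocks writes
  -- ...or drop it and start the concurrent build over:
  DROP INDEX items_sku_key;
  CREATE UNIQUE INDEX CONCURRENTLY items_sku_key ON items (sku);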

On the other hand, if invalid indexes had no update overhead, they could
actually be useful. Please consider the following situation (taken from
the real world; a SQL sketch of the whole workflow follows the list):

- I have a very large table with a large number of indexes on a server
with a large number of CPUs.
- I need to add a new not-null column, and I need to do it in a fairly
small maintenance window.
- I tried a simple "alter table tablename add column columnname int not
null default 0", but it did not finish within 24 hours - it updates
those tens of indexes in a single process, using 100% of one CPU -
unacceptable.
- I mark all the indexes as invalid (currently I have to save their
definitions and drop them instead).
- This time adding the column takes an hour.
- I vacuum full or cluster the table, as it has by now bloated to at
least 200% - this is also fast, as no indexes are updated.
- I reindex (currently recreate) all the indexes, but instead of
building them on a single CPU I use all available CPUs (I can get 40 of
them cheaply on Amazon AWS RDS if I need them only for a few hours).
This also finishes in an hour.
- The world is saved and everybody celebrates.
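
In SQL the workflow looks roughly like this (all names are
illustrative):

  -- 1. Save the index definitions, then drop the indexes (today the
  --    definitions have to be kept somewhere outside the database):
  SELECT pg_get_indexdef(indexrelid)
    FROM pg_index
    WHERE indrelid = 'bigtable'::regclass;
  DROP INDEX bigtable_col1_idx, bigtable_col2_idx;  -- , ...

  -- 2. Add the column; with no indexes to maintain this is fast:
  ALTER TABLE bigtable ADD COLUMN newcol int NOT NULL DEFAULT 0;

  -- 3. Compact the now-bloated table (also fast without indexes):
  VACUUM FULL bigtable;

  -- 4. Replay the saved definitions, one statement per session, so
  --    the builds run on all CPUs in parallel:
  CREATE INDEX bigtable_col1_idx ON bigtable (col1);  -- session 1
  CREATE INDEX bigtable_col2_idx ON bigtable (col2);  -- session 2
  -- ...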

Dropping and recreating indexes is not a very safe operation - their
definitions need to be saved somewhere outside the database, and if
anything goes wrong it would not be easy to restore them. If it were
possible to just set them invalid (a feature request in passing), and if
invalid indexes added no cost to updates, this would be much easier and
safer.
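
For example (purely hypothetical syntax, nothing like it exists today):

  ALTER INDEX bigtable_col1_idx SET INVALID;  -- hypothetical: stop maintaining it
  ALTER TABLE bigtable ADD COLUMN newcol int NOT NULL DEFAULT 0;
  REINDEX INDEX bigtable_col1_idx;  -- rebuild it; the index is valid again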

--
Tomasz "Tometzky" Ostrowski
