Re: Invalid indexes should not consume update overhead

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomasz Ostrowski <tometzky+pg(at)ato(dot)waw(dot)pl>
Cc: PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Invalid indexes should not consume update overhead
Date: 2016-07-16 12:55:08
Message-ID: CAA4eK1JNmxL=kKB210GfA14q4T4yAC_U6XPo6yhi9VuBhxEPog@mail.gmail.com
Lists: pgsql-bugs

On Wed, Jul 13, 2016 at 4:40 PM, Tomasz Ostrowski
<tometzky+pg(at)ato(dot)waw(dot)pl> wrote:
> When an index is built concurrently and the build fails, the index is left
> in an invalid state. It's basically unusable for anything, but, according
> to the documentation:
>
>> If a problem arises while scanning the table, such as a deadlock or
>> a uniqueness violation in a unique index, the CREATE INDEX command
>> will fail but leave behind an "invalid" index. This index will be
>> ignored for querying purposes because it might be incomplete; however
>> it *will* *still* *consume* *update* *overhead*. The psql \d command
>> will report such an index as INVALID
>
>
> I think this update overhead is actually wasted - there's no way to make
> use of it, as the only way to make the index usable again is to reindex it
> or drop and recreate it.
>
> On the other hand, if invalid indexes had no update overhead, they might
> actually be useful. Please consider the following situation (taken from the
> real world):
>
> - I have a very large table with a large number of indexes on a server with
> large number of CPUs.
> - I need to add a new not null column and I need to do this in a fairly
> small maintenance window.
> - I tried to simply "alter table tablename add column columnname int not
> null default 0", but it did not finish in 24 hours - it updates these tens
> of indexes in a single process, using 100% of a single CPU - unacceptable.
> - I mark all the indexes as invalid (currently I'm just saving and dropping
> them).
> - This time adding the column takes an hour.
> - I vacuum full or cluster the table, as it has now bloated to at least
> 200% - this is also fast, as indexes are not updated.
> - I'm reindexing (currently recreating) all indexes, but instead of
> calculating them on a single CPU I use all available CPUs (I can have like
> 40 of them on Amazon AWS RDS cheaply if I need them only for a few hours).
> This also ends in an hour.
> - The world is saved and everybody celebrates.
>
> Dropping and recreating indexes is not a very safe operation - their
> definitions need to be saved somewhere outside the database, and if anything
> goes wrong it would not be easy to restore them. If it were possible to just
> set them invalid (feature request in passing), and if invalid indexes did
> not add cost on updates, then this would be much easier and safer.
>
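
For illustration, here is a minimal SQL sketch of the save/drop/recreate
workflow described in the quoted steps. The table, index, and column names
(big_table, big_table_a_idx, big_table_b_idx, columnname) are hypothetical
placeholders:

    -- 1. Save the definition of every index on the table so it can be
    --    recreated later; keep the output somewhere safe.
    SELECT pg_get_indexdef(indexrelid)
    FROM   pg_index
    WHERE  indrelid = 'big_table'::regclass;

    -- 2. Drop the indexes.
    DROP INDEX big_table_a_idx, big_table_b_idx;

    -- 3. Add the new column; with no indexes left, only the table itself
    --    has to be written.
    ALTER TABLE big_table ADD COLUMN columnname int NOT NULL DEFAULT 0;

    -- 4. Remove the resulting bloat while the indexes are still gone.
    VACUUM FULL big_table;

    -- 5. Recreate the indexes from the DDL saved in step 1, running the
    --    CREATE INDEX statements in several parallel sessions.
    CREATE INDEX big_table_a_idx ON big_table (a);
    CREATE INDEX big_table_b_idx ON big_table (b);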

How can you consider marking an invalid index as valid without
reindexing it? It is quite possible that the table has been updated in
the meantime.
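
For reference, an index left invalid by a failed CREATE INDEX CONCURRENTLY
is marked with indisvalid = false in pg_index, and rebuilding it is the only
supported way to make it valid again. A minimal sketch (the index name is a
hypothetical placeholder):

    -- List invalid indexes.
    SELECT c.relname AS invalid_index
    FROM   pg_index i
    JOIN   pg_class c ON c.oid = i.indexrelid
    WHERE  NOT i.indisvalid;

    -- Rebuild one to make it valid again.
    REINDEX INDEX big_table_a_idx;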

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
