Re: REINDEX CONCURRENTLY 2.0

From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REINDEX CONCURRENTLY 2.0
Date: 2017-08-31 16:22:59
Message-ID: 3db0a2cf-2319-403d-d3ae-28a9cb52af1a@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have attached a new, rebased version of the batch with most of Banck's
and some of your feedback incorporated. Thanks for the good feedback!

On 03/31/2017 08:27 AM, Michael Paquier wrote> When running REINDEX
SCHEMA CONCURRENTLY public on the regression
> database I am bumping into a bunch of these warnings:
> WARNING: 01000: snapshot 0x7fa5e6000040 still active
> LOCATION: AtEOXact_Snapshot, snapmgr.c:1123
> WARNING: 01000: snapshot 0x7fa5e6000040 still active
> LOCATION: AtEOXact_Snapshot, snapmgr.c:1123

I failed to reproduce this. Do you have a reproducible test case?

> + * Reset attcacheoff for a TupleDesc
> + */
> +void
> +ResetTupleDescCache(TupleDesc tupdesc)
> +{
> + int i;
> +
> + for (i = 0; i < tupdesc->natts; i++)
> + tupdesc->attrs[i]->attcacheoff = -1;
> +}
> I think that it would be better to merge that with TupleDescInitEntry
> to be sure that the initialization of a TupleDesc's attribute goes
> through only one code path.

Sorry, but I am not sure I understand your suggestion. I do not like the
ResetTupleDescCache function so all suggestions are welcome.
> -REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM
> } <replaceable class="PARAMETER">name</replaceable>
> +REINDEX [ ( VERBOSE ) ] { INDEX | TABLE | SCHEMA | DATABASE | SYSTEM
> } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable>
> I am taking the war path with such a sentence... But what about adding
> CONCURRENTLY to the list of options in parenthesis instead?

I have thought some about this myself and I do not care strongly either way.

> - Explore the use of SQL-level interfaces to mark an index as inactive
> at creation.
> - Remove work done in changeDependencyForAll, and replace it by
> something similar to what tablecmds.c does. There is I think here some
> place for refactoring if that's not with CREATE TABLE LIKE. This
> requires to the same work of creation, renaming and drop of the old
> triggers and constraints.

I am no fan of the current code duplication and how fragile it is, but I
think these cases are sufficiently different to prevent meaningful code
reuse. But it could just be me who is unfamiliar with that part of the code.

> - Do a per-index rebuild and not a per-relation rebuild for concurrent
> indexing. Doing a per-relation reindex has the disadvantage that many
> objects need to be created at the same time, and in the case of
> REINDEX CONCURRENTLY time of the operation is not what matters, it is
> how intrusive the operation is. Relations with many indexes would also
> result in much object locks taken at each step.

I am still leaning towards my current tradeoff since waiting for all
queries to stop using an index can take a lot of time and if you only
have to do that once per table it would be a huge benefit under some
workloads, and you can still reindex each index separately if you need to.

Andreas

Attachment Content-Type Size
reindex-concurrently-v3.patch text/x-patch 91.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-08-31 17:46:07 Re: Assorted leaks and weirdness in parallel execution
Previous Message Bossart, Nathan 2017-08-31 15:25:58 Re: [Proposal] Allow users to specify multiple tables in VACUUM commands