Re: Support for REINDEX CONCURRENTLY

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-10-06 02:57:24
Message-ID: CAB7nPqRJq6UAjL3ORUW5cCN5+NYud-NoGnk_T5h7C2c_sJrMUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 6, 2012 at 8:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> > On Sat, Oct 6, 2012 at 6:14 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > OK. That is a different approach and would limit strictly the amount of
> > code necessary for the feature, but I feel that it breaks the nature of
> > CONCURRENTLY which should run without any exclusive locks.
>
> Hm? The whole point is that the CONCURRENTLY commands don't require
> exclusive locks. Only the SWAP command would.
>
Yes, but my point is that it is more user-friendly to have such a
functionality with a single command.
By having something without locks, you could use the concurrent APIs to
perform a REINDEX automatically in autovacuum for example.
Also, the possibility to perform concurrent operations entirely without
exclusive locks is not a problem limited to REINDEX, there would be for
sure similar problems if CLUSTER CONCURRENTLY or ALTER TABLE CONCURRENTLY
are wanted.

>
> > Until now all the approaches investigated (switch of relfilenode, switch
> of
> > index OID) need to have an exclusive lock because we try to maintain
> index
> > OID as consistent. In the patch I submitted, the new index created has a
> > different OID than the old index, and simply switches names. So after the
> > REINDEX CONCURRENTLY the OID of index on the table is different, but seen
> > from user the name is the same. Is it acceptable to consider that a
> reindex
> > concurrently could change the OID of the index rebuild?
>
> That is not going to work without ex-lock somewhere. If you change the
> index's OID then you will have to change pg_constraint and pg_depend
> entries referencing it, and that creates race condition hazards for
> other processes looking at those catalogs. I'm not convinced that you
> can even do a rename safely without ex-lock. Basically, any DDL update
> on an active index is going to be dangerous and probably impossible
> without lock, IMO.
>
In the current version of the patch, at the beginning of process a new
index is created. It is a twin of the index it has to replace, meaning that
it copies the dependencies of old index and creates twin entries of the old
index even in pg_depend and pg_constraint also if necessary. So the old
index and the new index have exactly the same data in catalog, they are
completely decoupled, and you do not need to worry about the OID
replacements and the visibility consequences.
Knowing that both indexes are completely separate entities, isn't this
enough to change the new index as the old one with a low-level lock? In the
case of my patch only the names are simply exchanged and make the user
unaware of what is happening in background. This behaves similarly to
pg_reorg, explaining why the OIDs of tables reorganized are changed after
being pg_reorg'ed.

To answer your question, I don't think anyone would object to the
> index's OID changing if the operation were safe otherwise. But I don't
> think that allowing that gets us to a safe solution.
>
OK thanks.
--
Michael Paquier
http://michael.otacoo.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomonari Katsumata 2012-10-06 03:16:38 error message is not proper for analyze on standby.
Previous Message John R Pierce 2012-10-06 02:14:00 Re: Bad Data back Door