Re: Should I implement DROP INDEX CONCURRENTLY?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should I implement DROP INDEX CONCURRENTLY?
Date: 2012-01-21 01:53:33
Message-ID: CA+TgmoahY-RJFBAe2x5MDGrXmAMLJAD48fBLnnbM9GD1PX34eQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> Can I just check with you that the only review comment is a one line
>>> change? Seems better to make any additional review comments in one go.
>>
>> No, I haven't done a full review yet - I just noticed that right at
>> the outset and wanted to be sure that got addressed.  I can look it
>> over more carefully, and test it.
>
> Corrected your point, and another found during retest.
>
> Works as advertised, docs corrected.

This comment needs updating:

/*
* To drop an index safely, we must grab exclusive lock on its parent
* table. Exclusive lock on the index alone is insufficient because
* another backend might be about to execute a query on the parent table.
* If it relies on a previously cached list of index OIDs, then it could
* attempt to access the just-dropped index. We must therefore take a
* table lock strong enough to prevent all queries on the table from
* proceeding until we commit and send out a shared-cache-inval notice
* that will make them update their index lists.
*/

Speaking of that comment, why does a concurrent index drop need any
lock at all on the parent table? If, as the current comment text
says, the point of locking the parent table is to make sure that no
new backends can create relcache entries until our transaction
commits, then it's not needed here, or at least not for that purpose.
We're going to out-wait anyone who has the index open *after* we've
committed our changes to the pg_index entry, so there shouldn't be a
race condition there. There may very well be a reason why we need a
lock on the parent, or there may not, but that's not it.

On the flip side, it strikes me that there's considerable danger that
ShareUpdateExclusiveLock on the index might not be strong enough. In
particular, it would fall down if there's anyone who takes an
AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds
to access the index data despite its being marked !indisready and
!indisvalid. There are certainly instances of this in contrib, such
as in pgstatindex(), which I'm pretty sure will abort with a nastygram
if somebody truncates away the file under it. That might not be
enough reason to reject the patch, but I do think it would be the only
place in the system where we allow something like that to happen. I'm
not sure whether there are any similar risks in core - I am a bit
worried about get_actual_variable_range() and gincostestimate(), but I
can't tell for sure whether there is any actual problem there, or
elsewhere. I wonder if we should only do the *first* phase of the
drop with ShareUpdateExcusliveLock, and then after outwaiting
everyone, take an AccessExclusiveLock on the index (but not the table)
to do the rest of the work. If that doesn't allow the drop to proceed
concurrently, then we go find the places that block against it and
teach them not to lock/use/examine indexes that are !indisready. That
way, the worst case is that D-I-C is less concurrent than we'd like,
rather than making random other things fall over - specifically,
anything that count on our traditional guarantee that AccessShareLock
is enough to keep the file from disappearing.

In the department of minor nitpicks, the syntax synopsis you've added
looks wrong:

DROP INDEX
[ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
[, ...] [ CASCADE | RESTRICT ]
CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>

That implies that you may write if exists, followed by 1 or more
names, followed by cascade or restrict. After that you must write
CONCURRENTLY, followed by exactly one name. I think what you want is:

DROP INDEX [ IF EXISTS ] <replaceable
class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
DROP INDEX CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>

...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ]
CONCURRENTLY. It could also be very useful to be able to concurrently
drop multiple indexes at once, since that would require waiting out
all concurrent transactions only once instead of once per drop.
Either way, I think we should have only one syntax, and then reject
combinations we can't handle in the code. So I think we should have
the parser accept:

DROP INDEX [ IF EXISTS ] [ CONCURRENTLY ] <replaceable
class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]

And then, if you try to use cascade, or try to drop multiple indexes
at once, we should throw an error saying that those options aren't
supported in combination with CONCURRENTLY, rather than rejecting them
as a syntax error at parse time. That gives a better error message
and will make it easier for anyone who wants to extend this in the
future - plus it will allow things like DROP INDEX CONCURRENTLY bob
RESTRICT, which ought to be legal even if CASCADE isn't supported.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2012-01-21 02:16:34 Re: REVIEW: pg_stat_statements with query tree based normalization
Previous Message Noah Misch 2012-01-21 01:33:30 Re: Measuring relation free space