Re: ATTACH/DETACH PARTITION CONCURRENTLY

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Sergei Kornilov <sk(at)zsrv(dot)org>, Amit Langote <langote_amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: ATTACH/DETACH PARTITION CONCURRENTLY
Date: 2018-12-21 16:35:24
Message-ID: CA+TgmoZAKengWkQVxff0EwcU8X=wP1MUu_bn2mSC42HUcsUbUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 20, 2018 at 4:38 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Lowering the lock level might also make something that was previously
> safe into something unsafe, because now there's no longer a guarantee
> that invalidation messages are received soon enough. With
> AccessExclusiveLock, we'll send invalidation messages before releasing
> the lock, and other processes will acquire the lock and then
> AcceptInvalidationMessages(). But with ShareUpdateExclusiveLock the
> locks can coexist, so now there might be trouble. I think this is an
> area where we need to do some more investigation.

So there are definitely problems here. With my patch:

create table tab (a int, b text) partition by range (a);
create table tab1 partition of tab for values from (0) to (10);
prepare t as select * from tab;
begin;
explain execute t; -- seq scan on tab1
execute t; -- no rows

Then, in another session:

alter table tab detach partition tab1;
insert into tab1 values (300, 'oops');

Back to the first session:

execute t; -- shows (300, 'oops')
explain execute t; -- still planning to scan tab1
commit;
explain execute t; -- now it got the memo, and plans to scan nothing
execute t; -- no rows

Well, that's not good. We're showing a value that was never within
the partition bounds of any partition of tab. The problem is that,
since we already have locks on all relevant objects, nothing triggers
the second 'explain execute' to process invalidation messages, so we
don't update the plan. Generally, any DDL with less than
AccessExclusiveLock has this issue. On another thread, I was
discussing with Tom and Peter the possibility of trying to rejigger
things so that we always AcceptInvalidationMessages() at least once
per command, but I think that just turns this into a race: if a
concurrent commit happens after 'explain execute t' decides not to
re-plan but before it begins executing, we have the same problem.

This example doesn't involve partition pruning, and in general I don't
think that the problem is confined to partition pruning. It's rather
that if there's no conflict between the lock that is needed to change
the set of partitions and the lock that is needed to run a query, then
there's no way to guarantee that the query runs with the same set of
partitions for which it was planned. Unless I'm mistaken, which I
might be, this is also a problem with your approach -- if you repeat
the same prepared query in the same transaction, the transaction
snapshot will be updated, and thus the PartitionDesc will be expanded
differently at execution time, but the plan will not have changed,
because invalidation messages have not been processed.

Anyway, I think the only fix here is likely to be making the executor
resilient against concurrent changes in the PartitionDesc. I don't
think there's going to be any easy way to compensate for added
partitions without re-planning, but maybe we could find a way to flag
detached partitions so that they return no rows without actually
touching the underlying relation.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark 2018-12-21 16:39:07 Query planner / Analyse statistics bad estimate rows=1 with maximum statistics 10000 on PostgreSQL 10.2
Previous Message Tom Lane 2018-12-21 16:21:14 Re: START/END line number for COPY FROM