Re: Delay locking partitions during query execution

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Delay locking partitions during query execution
Date: 2019-01-28 01:26:10
Message-ID: CAKJS1f8-6FQYXxP8BOQ2LbP2Rzko6ahAwOdiH15h4_+ct2rrqA@mail.gmail.com
Lists: pgsql-hackers

On Tue, 4 Dec 2018 at 00:42, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> Over here and along similar lines to the above, but this time I'd like
> to take this even further and change things so we don't lock *any*
> partitions during AcquireExecutorLocks() and instead just lock them
> when we first access them with ExecGetRangeTableRelation(). This
> improves the situation when many partitions get run-time pruned, as
> we'll never bother locking those at all since we'll never call
> ExecGetRangeTableRelation() on them. We'll of course still lock the
> partitioned table so that plan invalidation works correctly.
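
To illustrate the idea, locking would move out of AcquireExecutorLocks()
and into the executor's first access of each relation, roughly like this
(a simplified sketch only, not the actual patch):

/*
 * Simplified sketch: take each relation's lock lazily, on first access.
 * Partitions removed by run-time pruning are never accessed here, so
 * they are never locked at all.
 */
Relation
ExecGetRangeTableRelation(EState *estate, Index rti)
{
	Relation	rel = estate->es_relations[rti - 1];

	if (rel == NULL)
	{
		/* First access in this execution: open and lock the relation */
		RangeTblEntry *rte = exec_rt_fetch(rti, estate);

		rel = heap_open(rte->relid, rte->rellockmode);
		estate->es_relations[rti - 1] = rel;
	}
	return rel;
}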

I started looking at this patch again and I do see a problem with it.
Let me explain:

Currently during LockRelationOid() when we obtain a lock on a relation
we'll check for rel cache invalidation messages. This means that
during AcquireExecutorLocks(), as called from CheckCachedPlan(), we
could miss invalidation messages since we're no longer necessarily
locking the partitions.
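
For reference, the logic in lmgr.c's LockRelationOid() is roughly this
(abridged from the real function):

void
LockRelationOid(Oid relid, LOCKMODE lockmode)
{
	LOCKTAG		tag;
	LockAcquireResult res;

	SetLocktagRelationOid(&tag, relid);

	res = LockAcquire(&tag, lockmode, false, false);

	/*
	 * Now that we have the lock, accept any pending invalidation messages
	 * so that stale relcache entries are flushed before we use them.  If
	 * we never take the lock, we never run this check.
	 */
	if (res != LOCKACQUIRE_ALREADY_HELD)
		AcceptInvalidationMessages();
}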

I think this only creates problems when a partition's reloptions are
changed, since the cached plan may not be properly invalidated when
that happens. I might be wrong about that, but even if I'm not, in the
future there may well be more important reasons to ensure the plan is
invalidated properly.

The following shows the problem:

create table listp (a int) partition by list(a);
create table listp1 partition of listp for values in(1);
create table listp2 partition of listp for values in(2);
insert into listp select x from generate_series(1,2)x,
generate_series(1,100000);
analyze listp;

-- session 1
begin;
prepare q1 as select count(*) from listp;
explain (costs off, analyze, timing off, summary off) execute q1;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Finalize Aggregate (actual rows=1 loops=1)
   ->  Gather (actual rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate (actual rows=1 loops=3)
               ->  Parallel Append (actual rows=66667 loops=3)
                     ->  Parallel Seq Scan on listp2 (actual rows=33333 loops=3)
                     ->  Parallel Seq Scan on listp1 (actual rows=100000 loops=1)
(8 rows)
-- session 2
alter table listp1 set (parallel_workers=0);

-- session 1

explain (costs off, analyze, timing off, summary off) execute q1;
-- same as the previous plan, i.e. still does a parallel scan on listp1.

One way around this would be to always perform a relcache invalidation
on the partition's parent whenever we invalidate the partition itself.
We could perhaps propagate that all the way up to the top-level
partitioned table; that way we could just obtain a lock on the target
partitioned table during AcquireExecutorLocks(). I'm currently setting
the delaylock flag to false only for leaf partitions.
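
In rough code form, that might look something like the following (the
helper name and exact hook point are hypothetical; it would need to run
wherever we register a relcache invalidation for a partition):

#include "postgres.h"

#include "access/htup_details.h"
#include "catalog/partition.h"
#include "catalog/pg_class.h"
#include "utils/inval.h"
#include "utils/syscache.h"

/*
 * Hypothetical helper, not the patch as posted: on relcache invalidation
 * of a partition, also invalidate each ancestor up to the top-level
 * partitioned table, so a cached plan that only locks the partitioned
 * table still notices the change.
 */
static void
CacheInvalidatePartitionAncestors(Oid relid)
{
	for (;;)
	{
		HeapTuple	tuple;
		bool		ispartition;

		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
		if (!HeapTupleIsValid(tuple))
			break;
		ispartition = ((Form_pg_class) GETSTRUCT(tuple))->relispartition;
		ReleaseSysCache(tuple);

		if (!ispartition)
			break;				/* reached the top of the partition tree */

		relid = get_partition_parent(relid);
		CacheInvalidateRelcacheByRelid(relid);
	}
}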

It's not particularly great to have to invalidate the partitioned
table's relcache entry for this, but it's also not so great to lock all
partitions when run-time pruning might prune most of them away anyway.
I don't see a way to have both, but I'm happy to hear people's thoughts
about this.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
