Parallel Append can break run-time partition pruning

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, amitdkhan(dot)pg(at)gmail(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Parallel Append can break run-time partition pruning
Date: 2020-04-15 07:18:42
Message-ID: CAApHDvqSchs+ubdybcfFaSPB++EA7kqMaoqajtP0GtZvzOOR3g@mail.gmail.com
Lists: pgsql-hackers

I had a report from the wilds that run-time partition pruning was not
working in certain cases.

After some investigation and obtaining a mockup of the actual case, I
discovered that the problem was down to accumulate_append_subpath()
hitting the case where it does not pull up a child Parallel Append
whose first_partial_path is greater than zero, i.e. one that has
non-partial subpaths at the front of its subpath list.

What's actually happening is that the plan is left with a nested
Append, and in this particular case the top-level Append has only a
single subpath, so the code from 8edd0e794 (Suppress Append and
MergeAppend plan nodes that have a single child) pulls the nested
Append up to become the main Append. That breaks run-time pruning,
since we only attach the pruning information to the top-level Append.

The most simplified test case I can find to demonstrate this issue is:

create table list (a int, b int) partition by list(a);
create table list_12 partition of list for values in(1,2) partition by list(a);
create table list_12_1 partition of list_12 for values in(1);
create table list_12_2 partition of list_12 for values in(2);

insert into list select 2,0 from generate_Series(1,1000000) x;
vacuum analyze list;

explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;

-- force the 2nd subnode of the Append to be non-parallel.
alter table list_12_1 set (parallel_workers=0);

explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;

The results of this in master are:

postgres=# explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Gather (actual rows=0 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Result (actual rows=1 loops=1)
   ->  Parallel Append (actual rows=0 loops=3)
         ->  Parallel Seq Scan on list_12_2 list_2 (never executed)
               Filter: ((b > 0) AND (a = $0))
         ->  Parallel Seq Scan on list_12_1 list_1 (actual rows=0 loops=1)
               Filter: ((b > 0) AND (a = $0))
(11 rows)

postgres=# alter table list_12_1 set (parallel_workers=0);
ALTER TABLE
postgres=# explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Gather (actual rows=0 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Result (actual rows=1 loops=1)
   ->  Parallel Append (actual rows=0 loops=3)
         ->  Seq Scan on list_12_1 list_1 (actual rows=0 loops=1)
               Filter: ((b > 0) AND (a = $0))
         ->  Parallel Seq Scan on list_12_2 list_2 (actual rows=0 loops=3)
               Filter: ((b > 0) AND (a = $0))
               Rows Removed by Filter: 333333
(12 rows)

Notice that we don't get "(never executed)" for list_12_2 in the 2nd case.
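
Not in the original report, but as a sanity check that run-time
pruning itself is fine here once parallelism is out of the picture,
the same query can be re-run with parallelism disabled (still using
the test tables from above):

set max_parallel_workers_per_gather = 0;

explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;

reset max_parallel_workers_per_gather;

With that, I'd expect a plain top-level Append, which does get the
pruning information attached, and list_12_2 showing "(never executed)"
again.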

I'm a bit divided on what the correct fix is. One way to look at it is
that accumulate_append_subpath() simply isn't trying hard enough to
pull up the lower Append, in which case the Parallel Append code is to
blame. Alternatively, perhaps run-time pruning should be attaching
PartitionPruneInfo to more than just top-level Appends. Code-wise,
fixing the latter is about as simple as removing the "rel->reloptkind
== RELOPT_BASEREL &&" line from create_append_plan(). Certainly, if
the outer Append hadn't had just a single subpath, we wouldn't have
pulled up the lower-level Append at all, so perhaps we should be
run-time pruning lower-level Appends too.
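
For anyone who wants to poke at that last scenario, the following
(not from the original report, just a hypothetical variation) gives
the outer Append a second top-level subpath so that the single-child
suppression can no longer remove it; whether a lower-level Append then
survives in the plan depends on hitting the same
accumulate_append_subpath() case as above, so treat it as an
experiment rather than a guaranteed reproducer:

create table list_34 partition of list for values in (3,4);
insert into list select 3,0 from generate_series(1,1000000) x;
vacuum analyze list;

explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;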

What do other people think?

(copying in Robert and Amit K due to their work on Parallel Append,
Tom as I seem to remember him complaining about
accumulate_append_subpath() at some point and Amit L because...
partitioning...)

David
