From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, amitdkhan(dot)pg(at)gmail(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Parallel Append can break run-time partition pruning |
Date: | 2020-04-15 07:18:42 |
Message-ID: | CAApHDvqSchs+ubdybcfFaSPB++EA7kqMaoqajtP0GtZvzOOR3g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I had a report from the wilds that run-time partition pruning was not
working in certain cases.
After some investigation and obtaining the mockup of the actual case,
I discovered that the problem was down to accumulate_append_subpath()
hitting the case where it does not pullup a Parallel Append where the
first parallel node is > 0.
What's actually happening is that the plan is left with a nested
Append, and in this particular case, the top-level Append only has a
single subpath, to which the code for 8edd0e794 (Suppress Append and
MergeAppend plan nodes that have a single child) causes the nested
Append to be pulled up to become the main Append. This causes
run-time pruning to break since we only attach the pruning information
to the top-level Append.
The most simplified test case I can find to demonstrate this issue is:
create table list (a int, b int) partition by list(a);
create table list_12 partition of list for values in(1,2) partition by list(a);
create table list_12_1 partition of list_12 for values in(1);
create table list_12_2 partition of list_12 for values in(2);
insert into list select 2,0 from generate_Series(1,1000000) x;
vacuum analyze list;
explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;
-- force the 2nd subnode of the Append to be non-parallel.
alter table list_12_1 set (parallel_workers=0);
explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;
The results of this in master are:
postgres=# explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;
QUERY PLAN
---------------------------------------------------------------------------
Gather (actual rows=0 loops=1)
Workers Planned: 2
Params Evaluated: $0
Workers Launched: 2
InitPlan 1 (returns $0)
-> Result (actual rows=1 loops=1)
-> Parallel Append (actual rows=0 loops=3)
-> Parallel Seq Scan on list_12_2 list_2 (never executed)
Filter: ((b > 0) AND (a = $0))
-> Parallel Seq Scan on list_12_1 list_1 (actual rows=0 loops=1)
Filter: ((b > 0) AND (a = $0))
(11 rows)
postgres=# alter table list_12_1 set (parallel_workers=0);
ALTER TABLE
postgres=# explain (analyze on, costs off, timing off, summary off)
select * from list where a = (select 1) and b > 0;
QUERY PLAN
---------------------------------------------------------------------------
Gather (actual rows=0 loops=1)
Workers Planned: 2
Params Evaluated: $0
Workers Launched: 2
InitPlan 1 (returns $0)
-> Result (actual rows=1 loops=1)
-> Parallel Append (actual rows=0 loops=3)
-> Seq Scan on list_12_1 list_1 (actual rows=0 loops=1)
Filter: ((b > 0) AND (a = $0))
-> Parallel Seq Scan on list_12_2 list_2 (actual rows=0 loops=3)
Filter: ((b > 0) AND (a = $0))
Rows Removed by Filter: 333333
(12 rows)
Notice that we don't get "(never executed)" for list_12_2 in the 2nd case.
I'm a bit divided on what the correct fix is. If I blame Parallel
Append for not trying hard enough to pull up the lower Append in
accumulate_append_subpath(), then clearly the parallel append code is
to blame. However, perhaps run-time pruning should be tagging on
PartitionPruneInfo to more than top-level Appends. Fixing the latter
case, code-wise is about as simple as removing the "rel->reloptkind ==
RELOPT_BASEREL &&" line from create_append_plan(). Certainly, if the
outer Append hadn't been a single subpath Append, then we wouldn't
have pulled up the lower-level Append, so perhaps we should be
run-time pruning lower-level ones too.
What do other people think?
(copying in Robert and Amit K due to their work on Parallel Append,
Tom as I seem to remember him complaining about
accumulate_append_subpath() at some point and Amit L because...
partitioning...)
David
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2020-04-15 07:26:50 | Re: Race condition in SyncRepGetSyncStandbysPriority |
Previous Message | Andrey M. Borodin | 2020-04-15 07:13:57 | Allow pg_read_all_stats to read pg_stat_progress_* |