Quick Links

Re: [HACKERS] Runtime Partition Pruning

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Beena Emerson <memissemerson(at)gmail(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject:	Re: [HACKERS] Runtime Partition Pruning
Date:	2017-12-21 09:01:48
Message-ID:	CAKJS1f9xbyR6u5ixeYsDpSDaT_bvYy5DySrU+_aU7dQXiU9VQg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 19 December 2017 at 21:54, Beena Emerson <memissemerson(at)gmail(dot)com> wrote:
> PFA the updated patch.

Hi Beena,

Thanks for updating the patch. I've merged a chunk of your latest
patch with what I was working on and cleaned up some mistakes that I
had made in my WIP version.

I've now done the work to make this work with parallel append, which
wasn't so difficult. The 3 functions which choose the next subplan
just needed to be made aware that they need to pay attention to the
Bitmapset which lists which subplans need to be scanned. I did have to
write a bms_prev_member() function to get the next lower set bit in a
Bitmapset to make this easier. I think what I have is fairly clean.

I've also solved the problem we had discussed about only needing to
reselecting the subplans during ExecReScanAppend() when a parameter
has changed that is actually used in run-time pruning. If it's not
used, we now no longer wastefully reselect the same matching subplans.

The patch still does need more work in match_clauses_to_partkey() to
ensure we extract all paramids from the clauses. Right now, I'm only
grabbing the paramids from an OpExpr. There's a bunch of other places
Params could be hidden in there. Anything Amit is supporting for
Consts we need to grab the Param Ids for when it's a Param instead of
a Const.

There's also a few things about the patch which I didn't change around
too much as I didn't want change Amit's v15 patch too much. He working
on this still and I didn't want too many conflicts. Basically, I had
to make match_clauses_to_partkey() an external function so that I
could use it in createplan.c in create_append_plan() in order to
extract the Param Ids of any parameters in the parameterized path
clause. This function is currently in allpaths.c in Amit's patch, but
I'm feeling that it does not really belong there. I'll discuss with
Amit on the faster partition pruning thread.

The attached patch is pretty fresh. I've not given it a huge amount of
testing so far. I do know there's still work to do in order to make
cases like;

prepare q2 (int) as select * from p where a in(10000,20000,$1);
explain (costs off, analyze) execute q2 (45678);

work correctly. To fix that requires a bit more invasion into the
faster partition pruning v15 patch which this is based on. I don't
really want to touch that too much just yet.

Here are some examples of what the patch does:

-- Test 1 Setup
-- This shows the patch working with multiple partition levels.

drop table if exists pt;
create table pt (a int, b int) partition by range (a);

create table pt_a_neg partition of pt for values from (minvalue) to
(0) partition by range (b);
create table pt_a_pos partition of pt for values from (0) to
(maxvalue) partition by range (b);

create table pt_a_neg_b_neg partition of pt_a_neg for values from
(minvalue) to (0);
create table pt_a_neg_b_pos partition of pt_a_neg for values from (0)
to (maxvalue);

create table pt_a_pos_b_neg partition of pt_a_pos for values from
(minvalue) to (0);
create table pt_a_pos_b_pos partition of pt_a_pos for values from (0)
to (maxvalue);

insert into pt select x,x from generate_series(-1000,1000) x;

analyze pt;

prepare q1 (int, int) as select * from pt where a = $1 and b = $2;

-- Test 1 Result (first 5 custom plan executions removed)

postgres=# explain (costs off, analyze) execute q1 (-10,10);
QUERY PLAN
----------------------------------------------------------------------------
Append (actual time=0.007..0.007 rows=0 loops=1)
-> Seq Scan on pt_a_neg_b_neg (never executed)
Filter: ((a = $1) AND (b = $2))
-> Seq Scan on pt_a_neg_b_pos (actual time=0.007..0.007 rows=0 loops=1)
Filter: ((a = $1) AND (b = $2))
-> Seq Scan on pt_a_pos_b_neg (never executed)
Filter: ((a = $1) AND (b = $2))
-> Seq Scan on pt_a_pos_b_pos (never executed)
Filter: ((a = $1) AND (b = $2))
Planning time: 0.366 ms
Execution time: 0.077 ms
(11 rows)

postgres=# explain (costs off, analyze) execute q1 (-10,-10);
QUERY PLAN
----------------------------------------------------------------------------
Append (actual time=0.235..0.237 rows=1 loops=1)
-> Seq Scan on pt_a_neg_b_neg (actual time=0.234..0.236 rows=1 loops=1)
Filter: ((a = $1) AND (b = $2))
Rows Removed by Filter: 999
-> Seq Scan on pt_a_neg_b_pos (never executed)
Filter: ((a = $1) AND (b = $2))
-> Seq Scan on pt_a_pos_b_neg (never executed)
Filter: ((a = $1) AND (b = $2))
-> Seq Scan on pt_a_pos_b_pos (never executed)
Filter: ((a = $1) AND (b = $2))
Planning time: 0.025 ms
Execution time: 0.313 ms
(12 rows)

postgres=# explain (costs off, analyze) execute q1 (10,-10);
QUERY PLAN
----------------------------------------------------------------------------
Append (actual time=0.014..0.014 rows=0 loops=1)
-> Seq Scan on pt_a_neg_b_neg (never executed)
Filter: ((a = $1) AND (b = $2))
-> Seq Scan on pt_a_neg_b_pos (never executed)
Filter: ((a = $1) AND (b = $2))
-> Seq Scan on pt_a_pos_b_neg (actual time=0.013..0.013 rows=0 loops=1)
Filter: ((a = $1) AND (b = $2))
-> Seq Scan on pt_a_pos_b_pos (never executed)
Filter: ((a = $1) AND (b = $2))
Planning time: 0.025 ms
Execution time: 0.091 ms
(11 rows)

postgres=# explain (costs off, analyze) execute q1 (10,10);
QUERY PLAN
----------------------------------------------------------------------------
Append (actual time=0.032..0.222 rows=1 loops=1)
-> Seq Scan on pt_a_neg_b_neg (never executed)
Filter: ((a = $1) AND (b = $2))
-> Seq Scan on pt_a_neg_b_pos (never executed)
Filter: ((a = $1) AND (b = $2))
-> Seq Scan on pt_a_pos_b_neg (never executed)
Filter: ((a = $1) AND (b = $2))
-> Seq Scan on pt_a_pos_b_pos (actual time=0.031..0.221 rows=1 loops=1)
Filter: ((a = $1) AND (b = $2))
Rows Removed by Filter: 1000
Planning time: 0.026 ms
Execution time: 0.297 ms
(12 rows)

-- Test 2 Setup
-- This test shows the patch working with parameterized paths. Note
that only subplans which actually have a value matching a value from
the outer side of the join get scanned.

create table p (a int not null) partition by range(a);
select 'create table p'||x|| ' partition of p for values from ('||x *
10000 || ') to (' || (x+1)*10000 || ');' from generate_Series(0,9)x;
\gexec

insert into p select generate_Series(0,99999);

select 'create index on p'||x||' (a)' from generate_Series(0,9)x;
\gexec

create table t (a int not null);

insert into t values(4),(5);

analyze p,t;
set enable_mergejoin=0;
explain (costs off, analyze) select * from t inner join p p on p.a=t.a;
insert into t values(45678);
explain (costs off, analyze) select * from t inner join p p on p.a=t.a;

-- Test 2 Result
postgres=# explain (costs off, analyze) select * from t inner join p p
on p.a=t.a;
QUERY PLAN
----------------------------------------------------------------------------------------------
Nested Loop (actual time=0.058..0.069 rows=2 loops=1)
-> Seq Scan on t (actual time=0.011..0.012 rows=2 loops=1)
-> Append (actual time=0.020..0.021 rows=1 loops=2)
-> Index Only Scan using p0_a_idx on p0 p (actual
time=0.019..0.020 rows=1 loops=2)
Index Cond: (a = t.a)
Heap Fetches: 2
-> Index Only Scan using p1_a_idx on p1 p_1 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p2_a_idx on p2 p_2 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p3_a_idx on p3 p_3 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p4_a_idx on p4 p_4 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p5_a_idx on p5 p_5 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p6_a_idx on p6 p_6 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p7_a_idx on p7 p_7 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p8_a_idx on p8 p_8 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p9_a_idx on p9 p_9 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
Planning time: 1.080 ms
Execution time: 0.315 ms
(35 rows)

postgres=# insert into t values(45678); -- add a record to show that
p4 gets scanned.
INSERT 0 1
postgres=# explain (costs off, analyze) select * from t inner join p p
on p.a=t.a;
QUERY PLAN
------------------------------------------------------------------------------------------------
Nested Loop (actual time=0.041..0.106 rows=3 loops=1)
-> Seq Scan on t (actual time=0.014..0.014 rows=3 loops=1)
-> Append (actual time=0.024..0.025 rows=1 loops=3)
-> Index Only Scan using p0_a_idx on p0 p (actual
time=0.010..0.011 rows=1 loops=2)
Index Cond: (a = t.a)
Heap Fetches: 2
-> Index Only Scan using p1_a_idx on p1 p_1 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p2_a_idx on p2 p_2 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p3_a_idx on p3 p_3 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p4_a_idx on p4 p_4 (actual
time=0.049..0.050 rows=1 loops=1)
Index Cond: (a = t.a)
Heap Fetches: 1
-> Index Only Scan using p5_a_idx on p5 p_5 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p6_a_idx on p6 p_6 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p7_a_idx on p7 p_7 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p8_a_idx on p8 p_8 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
-> Index Only Scan using p9_a_idx on p9 p_9 (never executed)
Index Cond: (a = t.a)
Heap Fetches: 0
Planning time: 0.635 ms
Execution time: 0.233 ms
(35 rows)

There is, however, still a fundamental problem with the patch, or
maybe the idea in general (this was also pointed out by Amit Langote
in an off-list discussion):

The problem is down to the logic in choose_custom_plan() only choosing
a generic plan if the average cost of the generic plan is less than
the average custom plan cost. The problem is that the generic plan can
have many extra Append subnodes in comparison to the custom plan, all
of which are taken into account in the total plan cost, but these may
be pruned during execution. The logic in choose_custom_plan() has no
idea about this. I don't have any bright ideas on how to fix this
yet, as, suppose a PREPAREd statement like the following comes along:

PREPARE q3 (int, int) AS SELECT * FROM partitioned_table WHERE partkey
BETWEEN $1 AND $2;

the run-time pruning may prune it down no subplans, all subplans, or
any number in between. So we can't do anything like take the total
Append cost to be the highest costing of its subplans, and likely
using the average cost might not be a good idea either. It might work
sometimes, but likely won't be very stable. If this is not fixed then
choose_custom_plan() has a very low probability of choosing a generic
plan which has run-time partition pruning enabled, which in a way
defeats the purpose of this whole patch.

I'm a bit uncertain on the best way to resolve this. It needs to be
discussed here.

One more thing. The attached is not yet set up to work with
MergeAppend. It's likely just a small amount of additional work to
make this happen, so likely should be something that we do.

Anyway, I've attached the latest version of the patch. This is based
on Amit's v15 of faster-partition-pruning [1] which I found to cleanly
apply to f94eec490

[1] https://www.postgresql.org/message-id/06cde8a5-0ac7-dcf5-ad66-1ca781623e0c@lab.ntt.co.jp

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
runtime_prune_drowley_v2.patch	application/octet-stream	53.2 KB

In response to

Re: [HACKERS] Runtime Partition Pruning at 2017-12-19 08:54:26 from Beena Emerson

Responses

Re: [HACKERS] Runtime Partition Pruning at 2017-12-21 10:51:57 from Beena Emerson
Re: [HACKERS] Runtime Partition Pruning at 2017-12-21 12:56:37 from David Rowley
Re: [HACKERS] Runtime Partition Pruning at 2017-12-21 23:45:15 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2017-12-21 09:29:40	Re: pgsql: Add parallel-aware hash joins.
Previous Message	Andres Freund	2017-12-21 08:49:49	Re: [HACKERS] Parallel Hash take II