Re: [HACKERS] path toward faster partition pruning

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2018-03-03 00:32:46
Message-ID: CAKJS1f-3-P=mqeye-kvgkv=gLNOXU10Td6JGaPkcu4N7j_=Kyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3 March 2018 at 04:47, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Mar 2, 2018 at 6:21 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> 2. Removing redundant clauses from each list, that is,
>> remove_redundant_clauses() to produce lists with just one member per
>> operator strategy for each partition key,
>
> I don't see that there's a real need to perform step 2 at all. I
> mean, if you have x > $1 and x > $2 in the query, you can just compute
> the set of partitions for the first clause, compute the set of
> partitions for the second clause, and then intersect. That doesn't
> seem obviously worse than deciding which of $1 and $2 is greater and
> then pruning only based on whichever one is greater in this case.

This is an interesting idea. It may simplify the code quite a bit as
the clause reduction code is quite bulky. However, unless I'm
mistaken, for this to work, certain inputs will need significantly
more processing to determine the minimum set of matching partitions.

Let's look at the following perhaps unlikely case. (I picked an
extreme case to demonstrate why this may be an inferior method)

Given the table abc (...) partition by range (a,b,c), with the query:

select * from abc where a >= 1 and a >= 2 and a >= 3 and b >= 1 and b
>= 2 and b = 3 and c >= 1 and c >= 2 and c = 3;

We would likely still be parsing those clauses into some struct like
PartitionClauseInfo and would end up with some arrays or Lists with
the clauses segmented by partition key.

It appears to me, for your method to work we'd need to try every
combination of the clauses matching each partition key, which in this
case is 3 * 3 * 3 searches. Amit's current method is 1 search, after
the clause reduction which is 3 + 3 + 3 (O(N) per partition key)

I've tried to think of a more genuine poor performing case for this
with IN or NOT IN lists, but I can't quite see it since NOT IN will
only be supported by LIST partitioning, which can only have a single
partition key and IN would be OR conditions, each of which would be
evaluated in a different round of looping. Although I'm not ruling out
that my imagination is just not good enough.

With that considered, is it still a good idea to do it this way?

Or maybe I've misunderstood the idea completely?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-03 00:38:03 Re: JIT compiling with LLVM v11
Previous Message Andres Freund 2018-03-03 00:29:54 Re: JIT compiling with LLVM v11