Re: [HACKERS] path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2018-03-05 08:38:17
Message-ID: 2d655c47-e6ab-92af-b426-c1b0ae9c7ca3@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2018/03/03 0:47, Robert Haas wrote:
> On Fri, Mar 2, 2018 at 6:21 AM, Amit Langote wrote:
>> Given the patch's implementation, we'll have to make the structure of that
>> Node tree a bit more complex than a simple List. For one thing, the patch
>> handles OR clauses by performing pruning separately for each arm and them
>> combining partitions selected across OR arms using set union. By
>> "performing pruning" in the last sentence I meant following steps similar
>> to ones you wrote in your message:
>>
>> 1. Segregating pruning clauses into per-partition-key Lists, that is,
>> generate_partition_clauses() producing a PartitionClauseInfo,
>>
>> 2. Removing redundant clauses from each list, that is,
>> remove_redundant_clauses() to produce lists with just one member per
>> operator strategy for each partition key,
>>
>> 3. Extracting Datum values from the clauses to form equal/min/max tuples
>> and setting null or not null bits for individual keys, that is,
>> extract_bounding_datums() producing a PartScanKeyInfo, and
>>
>> 4. Finally pruning with those Datum tuples and null/not null info, that
>> is, get_partitions_for_keys().
>>
>> Steps 2-4 are dependent on clauses providing Datums, which all the clauses
>> may or may not do. Depending on whether or not, we'll have to defer those
>> steps to run time.
>
> I don't see that there's a real need to perform step 2 at all. I
> mean, if you have x > $1 and x > $2 in the query, you can just compute
> the set of partitions for the first clause, compute the set of
> partitions for the second clause, and then intersect. That doesn't
> seem obviously worse than deciding which of $1 and $2 is greater and
> then pruning only based on whichever one is greater in this case.

If we can accommodate this step with your "pruning program" idea below, I
think we should still try to keep the remove_redundant_clauses() step.

If we can keep this step, x > $1 and x > $1 will require 1 comparison + 1
bsearch over bounds, whereas without it, it'd be 2 bsearches over bounds +
1 Bitmapset operation. I think David expressed a similar concern about
the performance in his reply down-thread.

That too as long as we implement this step such that having it in the
pipeline doesn't restrict the representation that we need to have the
"pruning steps" in. Let's see.

>> * What do we encode into the Node tree attached to the plan? Clauses that
>> haven't gone through steps 2 and 3 (something like PartitionClauseInfo)
>> or the product of step 3 (something like PartScanKeyInfo)?
>>
>> * How do we account for OR clauses? Perhaps by having the aforementioned
>> Node trees nested inside the top-level one, wherein there will be one
>> nested node per arm of an OR clause.
>
> Suppose we define the notion of a pruning program. A pruning program
> can use any number of registers, which have integer numbers starting
> with 0 and counting upward as high as necessary. Each register holds
> a Bitmapset. The result of a pruning program is the value of register
> 0 when the program completes. A pruning program consists of a list of
> steps, each of which is either a PruningBaseStep or a
> PruningCombineStep. A PruningCombineStep modifies the contents of the
> target register based on the contents of a source register in one of
> the following three ways: (1) UNION -- all bits set in source become
> set in target; (2) INTERSECT -- all bits clear in source become clear
> in target; (3) DIFFERENCE -- all bits set in source become clear in
> target. A PruningBaseStep consists of a strategy (equality,
> less-than, etc.), an output register, and list of expressions --
> either as many as there are partition keys, or for range partitioning
> perhaps fewer; it prunes based on the strategy and the expressions and
> overwrites the output register with the partitions that would be
> selected.
>
> Example #1. Table is hash-partitioned on a and b. Given a query like
> SELECT * FROM tab WHERE a = 100 AND b = 233, we create a single-step
> program:
>
> 1. base-step (strategy =, register 0, expressions 100, 233)
>
> If there were an equality constraint on one of the two columns, we
> would not create a pruning program at all, because no pruning is
> possible.
>
> Example #2. Table is list-partitioned on a. Given a query like SELECT
> * FROM tab WHERE (a = $1 OR a = $2) AND a != $3, we create this
> program:
>
> 1. base-step (strategy =, register 0, expressions $1)
> 2. base-step (strategy =, register 1, expressions $2)
> 3. base-step (strategy !=, register 2, expressions $3)
> 4. combine-step (target-register 0, source-register 1, strategy union)
> 5. combine-step (target-register 0, source-register 2, strategy difference)
>
> (This is unoptimized -- one could do better by reversing steps 3 and 4
> and using reusing register 1 instead of needing register 2, but that
> kind of optimization is probably not too important.)
>
> Example #3. Table is range-partitioned on a and b. Given a query like
> SELECT * FROM tab WHERE (a = 40 AND b > $1) OR (a = $2 AND b = $3), we
> do this:
>
> 1. base-step (strategy >, register 0, expressions 40, $1)
> 2. base-step (strategy =, register 1, expressions $2, $3)
> 3. combine-step (target-register 0, source-register 1, strategy union)
>
> You might need a few extra gadgets here to make all of this work --
> e.g. another base-step strategy to handle ScalarArrayOpExpr; I'm just
> trying to convey the basic idea here. It's pretty easy to see how to
> store a program like this as a node tree: just create PruningBaseStep
> and PruningCombineStep nodes and stick them into a List. At execution
> time transform the List into an array and loop over it.
>
> Or possibly it would be better to have two lists, one of base steps
> without explicit register numbers, where step N always outputs to
> register N, and then a second list of combine steps. Then at
> execution time you could have an array of PruningBaseStep * and an
> array of PruningCombineStep * instead of a combined array of Node *,
> which might be quicker to process.
>
> But regardless of what you do exactly, I think you should try to come
> up with some kind of representation that is basically uniform,
> handling all the things you support in a similar fashion. The current
> patch has basically separate and somewhat ad-hoc representations for
> the regular case, the <> case, and the OR case, which I think is not
> ideal because you end up with more code and a certain amount of
> repeated logic.

Thanks for outlining this idea.

Given that the most important part in your outline seems to be the clean
Node structure to carry the information from one stage to another, that
will result by adopting the "partition pruning primitives" described
above, I will first try to implement them as a shell around the code that
I and David have debugged till now. Then, once everything seems to work,
I will start dropping unneeded structures and code that seems to be
duplicative, which I'm beginning to suspect there will be plenty of. I'll
post an update in a couple of days to report on how that works out.

Regards,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2018-03-05 08:41:26 Re: [PATCH] Verify Checksums during Basebackups
Previous Message Konstantin Knizhnik 2018-03-05 07:56:32 Re: Contention preventing locking