Re: path toward faster partition pruning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: path toward faster partition pruning
Date: 2017-09-26 14:33:04
Message-ID: CA+TgmoYcv_MghvhV8pL33D95G8KVLdZOxFGX5dNASVkXO8QuPw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 26, 2017 at 9:00 AM, Jesper Pedersen
<jesper(dot)pedersen(at)redhat(dot)com> wrote:
> Could you share your thoughts on the usage of PartitionAppendInfo's
> min_datum_idx / max_datum_idx ? Especially in relation to hash partitions.

This brings up something that I've kind of been thinking about. There
are sort of four cases when it comes to partition pruning:

1. There is exactly one matching partition. For example, this happens
when there is an equality constraint on every partition column.

2. There are multiple matching partitions which are consecutive. For
example, there is a single level of range partitioning with no default
partition and the single partitioning column is constrained by <, >, <=, or >=.

3. There are multiple matching partitions which are not consecutive.
This case is probably rare, but it can happen if there is a default
partition, if there are list partitions with multiple bounds that are
interleaved (e.g. p1 allows (1, 4), p2 allows (2), p3 allows (3, 5),
and the query allows values >= 4 and <= 5), if the query involves OR
conditions, or if there are multiple levels of partitioning (e.g.
partition by a, subpartition by b, put a range constraint on a and an
equality constraint on b).

4. There are no matching partitions.

One of the goals of this algorithm is to be fast. The obvious way to
cater to case (3) is to iterate through all partitions and test
whether each one works, returning a Bitmapset, but that is O(n).
Admittedly, it might be O(n) with a pretty small constant factor, but
it still seems like exactly the sort of thing that we want to avoid
given the desire to scale to higher partition counts.
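
For contrast, the brute-force version would look roughly like the sketch
below. partition_matches_quals() is a hypothetical placeholder here, not an
existing function; only the Bitmapset call is real.

/*
 * Brute-force pruning sketch: test every partition and collect the
 * survivors in a Bitmapset.  partition_matches_quals() is a hypothetical
 * placeholder; bms_add_member() is the real bitmapset routine.
 */
static Bitmapset *
prune_by_scanning_all_partitions(int nparts, List *quals)
{
    Bitmapset  *result = NULL;
    int         i;

    for (i = 0; i < nparts; i++)
    {
        /* one qual check per partition, matching or not: O(n) overall */
        if (partition_matches_quals(i, quals))
            result = bms_add_member(result, i);
    }
    return result;
}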

I propose that we create a structure that looks like this:

struct foo {
    int         min_partition;      /* first partition index that must be scanned */
    int         max_partition;      /* last partition index that must be scanned */
    Bitmapset  *extra_partitions;   /* any additional, non-consecutive partitions */
};

This indicates that all partitions from min_partition to max_partition
need to be scanned, and in addition any partitions in extra_partitions
need to be scanned. Assuming that we only consider cases where all
partition keys or a leading subset of the partition keys are
constrained, we'll generally be able to get by with just setting
min_partition and max_partition, but extra_partitions can be used to
handle default partitions and interleaved list bounds. For equality
on all partitioning columns, we can do a single bsearch of the bounds
to identify the target partition at a given partitioning level, and
the same thing works for a single range-bound. If there are two
range-bounds (< and > or <= and >= or whatever) we need to bsearch
twice. The default partition, if any and if matched, must also be
included. When there are multiple levels of partitioning things get a
bit more complex -- if someone wants to knock out a partition that
breaks up the range, we might need to shrink the main range to cover
part of it and kick the other indexes out to extra_partitions.
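
To make the single-level range case concrete, here's a rough sketch under
simplifying assumptions: the bounds are plain integers in a sorted array of
per-partition upper bounds rather than the real PartitionBoundInfo
structures, the quals are "key >= lo AND key <= hi", and the default
partition is left out.

/*
 * Simplified sketch of the two-range-bound case for a single level of
 * range partitioning.  Partition i is assumed to accept values v with
 * bounds[i-1] <= v < bounds[i]; this is a stand-in for the real
 * PartitionBoundInfo machinery, not the actual representation.
 */
static int
first_partition_covering(const int *bounds, int nparts, int value)
{
    int         low = 0;
    int         high = nparts;

    /* binary search for the first upper bound greater than value */
    while (low < high)
    {
        int         mid = (low + high) / 2;

        if (bounds[mid] > value)
            high = mid;
        else
            low = mid + 1;
    }
    return low;                 /* nparts means "beyond the last partition" */
}

static void
prune_single_range_level(const int *bounds, int nparts,
                         int lo, int hi, struct foo *result)
{
    /* one bsearch per range bound, O(lg n) overall */
    result->min_partition = first_partition_covering(bounds, nparts, lo);
    result->max_partition = first_partition_covering(bounds, nparts, hi);
    if (result->max_partition >= nparts)
        result->max_partition = nparts - 1;     /* hi is past every bound */
    result->extra_partitions = NULL;            /* a matched default partition would go here */
}

If lo falls past every bound, min_partition ends up greater than
max_partition, which could serve as the representation for case (4).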

But the good thing is that in common cases with only O(lg n) effort we
can return O(1) data that describes what will be scanned. In cases
where that's not practical we expend more effort but still prune with
maximal effectiveness.
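
For illustration only, the returned structure might come out like this in
each of the four cases; the partition indexes are invented,
bms_make_singleton() is the existing Bitmapset constructor, and treating
min > max as "no partitions" is an assumed convention rather than part of
the proposal.

static void
illustrate_cases(void)
{
    /* Case 1: equality on every partition column; exactly one match. */
    struct foo  case1 = { 3, 3, NULL };

    /* Case 2: a consecutive run of partitions, say 2 through 5. */
    struct foo  case2 = { 2, 5, NULL };

    /* Case 3: the same run plus a matched default partition at index 9. */
    struct foo  case3 = { 2, 5, bms_make_singleton(9) };

    /* Case 4: no matching partitions; min > max is one possible convention. */
    struct foo  case4 = { 1, 0, NULL };
}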

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
