Re: [HACKERS] Secondary index access optimizations

From: Pantelis Theodosiou <ypercube(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Secondary index access optimizations
Date: 2017-12-04 17:44:31
Message-ID: CAE3TBxzTwF0GkC6_44F-LfdfV340-NSiTuW9vyHQs=j4CrYB8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 5, 2017 at 9:10 AM, Konstantin Knizhnik <
k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

>
>
> On 05.09.2017 04:02, Amit Langote wrote:
>
> Like Thomas, I'm not so sure about the whole predtest.c patch. The core
> logic in operator_predicate_proof() should be able to conclude that, say,
> k < 21 is implied by k <= 20, which you are trying to address with some
> special case code. If there is still problem you think need to be fixed
> here, a better place to look at would be somewhere around get_btree_test_op().
>
>
> Frankly speaking I also do not like this part of my patch.
> I will be pleased if you or somebody else can propose better solution.
> I do not understand how get_btree_test_op() can help here.
>
> Yes, k < 21 is implied by k <= 20. It is based on generic properties of <
> and <= operators.
> But I need to proof something different: having table partition constraint
> (k < 21) I want to remove predicate (k <= 20) from query.
> In other words, operator_predicate_proof() should be able to conclude
> that (k <= 20) is implied by (k < 21).
> But it is true only for integer types, not for floating point types. And
> Postgres operator definition
> doesn't provide some way to determine that user defined type is integer
> type: has integer values for which such conclusion is true.
>
> Why I think that it is important? Certainly, it is possible to rewrite
> query as (k < 21) and no changes in operator_predicate_proof() are needed.
> Assume the most natural use case: I have some positive integer key and I
> wan to range partition table by such key, for example with interval 10000.
> Currently standard PostgreSQL partitioning mechanism requires to specify
> intervals with open high boundary.
> So if I want first partition to contain interval [1,10000], second -
> [10001,20001],... I have to create partitions in such way:
>
> create table bt (k integer, v integer) partition by range (k);
> create table dt1 partition of bt for values from (1) to (10001);
> create table dt2 partition of bt for values from (10001) to (20001);
> ...
>
> If I want to write query inspecting data of the particular partition, then
> most likely I will use BETWEEN operator:
>
> SELECT * FROM t WHERE k BETWEEN 1 and 10000;
>
> But right now operator_predicate_proof() is not able to conclude that
> predicate (k BETWEEN 1 and 10000) transformed to (k >= 1 AND k <= 10000) is
> equivalent to (k >= 1 AND k < 10001)
> which is used as partition constraint.
>
> Another very popular use case (for example mentioned in PostgreSQL
> documentation of partitioning: https://www.postgresql.org/
> docs/10/static/ddl-partitioning.html)
> is using date as partition key:
>
> CREATE TABLE measurement (
> city_id int not null,
> logdate date not null,
> peaktemp int,
> unitsales int
> ) PARTITION BY RANGE (logdate);
>
>
> CREATE TABLE measurement_y2006m03 PARTITION OF measurement
> FOR VALUES FROM ('2006-03-01') TO ('2006-04-01')
>
>
> Assume that now I want to get measurements for March:
>
> There are three ways to write this query:
>
> select * from measurement where extract(month from logdate) = 3;
> select * from measurement where logdate between '2006-03-01' AND
> '2006-03-31';
> select * from measurement where logdate >= '2006-03-01' AND logdate <
> '2006-04-01';
>
> Right now only for the last query optimal query plan will be constructed.
>

Perhaps the relative pages (about partitioning and optimization) should
mention to avoid BETWEEN and using closed-open checks, as the last query.

Dates are a perfect example to demonstrate that BETWEEN shouldn't be used,
in my opinion. Dates (and timestamps) are not like integers as they are
often used with different levels of precisions, day, month, year, hour,
minute, second, etc. (month in your example). Constructing the correct
expressions for the different precisions can be a nightmare with BETWEEN
but very simple with >= and < (in the example: get start_date,
'2006-03-01', and add one month).

So, just my 2c, is it worth the trouble to implement this feature
(conversion of (k<21) to (k<=20) and vice versa) and how much work would it
need for all data types that are commonly used for partitioning?

> Unfortunately my patch is not covering date type.
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-12-04 17:47:40 Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com
Previous Message Robert Haas 2017-12-04 17:42:24 Re: Errands around AllocateDir()