From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Secondary index access optimizations
Date: 2018-01-10 15:07:38
Message-ID: 252ce46f-6b32-66b8-5f43-4af9fb70c349@postgrespro.ru
Lists: pgsql-hackers
On 09.01.2018 19:48, Antonin Houska wrote:
> Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
>> On 14.08.2017 19:33, Konstantin Knizhnik wrote:
>>
>> On 14.08.2017 12:37, Konstantin Knizhnik wrote:
>>
>> Hi hackers,
>>
>> I am trying to compare different ways of optimizing work with huge append-only tables in PostgreSQL, where the primary key is something like a timestamp and queries usually access the most recent data through some secondary keys. The size of a secondary index is one of the most
>> critical factors limiting insert/search performance. Since data is inserted in ascending timestamp order, access to the primary key is well localized and the accessed pages stay in memory. But if we create a secondary index on the whole table, access to it will require random
>> reads from the disk and significantly decrease performance.
>>
>> There are two well-known solutions to this problem:
>> 1. Table partitioning
>> 2. Partial indexes
>>
>> These are the approaches I want to compare. First of all, I want to check whether the optimizer is able to generate an efficient query execution plan covering different time intervals.
>> Unfortunately, in both cases the generated plan is not optimal.
>>
>> 1. Table partitioning:
>>
>> create table base (k integer primary key, v integer);
>> create table part1 (check (k between 1 and 10000)) inherits (base);
>> create table part2 (check (k between 10001 and 20000)) inherits (base);
>> create index pi1 on part1(v);
>> create index pi2 on part2(v);
>> insert into part1 values (generate_series(1,10000), random());
>> insert into part2 values (generate_series(10001,20000), random());
>> explain select * from base where k between 1 and 20000 and v = 100;
>> QUERY PLAN
>> -----------------------------------------------------------------------
>> Append (cost=0.00..15.65 rows=3 width=8)
>> -> Seq Scan on base (cost=0.00..0.00 rows=1 width=8)
>> Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))
>> -> Index Scan using pi1 on part1 (cost=0.29..8.31 rows=1 width=8)
>> Index Cond: (v = 100)
>> Filter: ((k >= 1) AND (k <= 20000))
>> -> Index Scan using pi2 on part2 (cost=0.29..7.34 rows=1 width=8)
>> Index Cond: (v = 100)
>> Filter: ((k >= 1) AND (k <= 20000))
>>
>> Questions:
>> - Is there some way to avoid the sequential scan of the parent table? Yes, it is empty and so the sequential scan will not take much time, but ... it still requires some additional work and so increases query execution time.
>> - Why does an index scan of a partition's index include a filter condition when the check constraint guarantees that all records of the partition match the search predicate?
>>
>> 2. Partial indexes:
>>
>> create table t (k integer primary key, v integer);
>> insert into t values (generate_series(1,20000),random());
>> create index i1 on t(v) where k between 1 and 10000;
>> create index i2 on t(v) where k between 10001 and 20000;
>> postgres=# explain select * from t where k between 1 and 10000 and v = 100;
>> QUERY PLAN
>> ------------------------------------------------------------
>> Index Scan using i1 on t (cost=0.29..7.28 rows=1 width=8)
>> Index Cond: (v = 100)
>> (2 rows)
>>
>> Here we get a perfect plan. Let's try to extend the search interval:
>>
>> postgres=# explain select * from t where k between 1 and 20000 and v = 100;
>> QUERY PLAN
>> ------------------------------------------------------------------
>> Index Scan using t_pkey on t (cost=0.29..760.43 rows=1 width=8)
>> Index Cond: ((k >= 1) AND (k <= 20000))
>> Filter: (v = 100)
>> (3 rows)
>>
>> Unfortunately, in this case Postgres is not able to apply the partial indexes.
>> This is what I expected to get:
>>
>> postgres=# explain select * from t where k between 1 and 10000 and v = 100 union all select * from t where k between 10001 and 20000 and v = 100;
>> QUERY PLAN
>> ----------------------------------------------------------------------
>> Append (cost=0.29..14.58 rows=2 width=8)
>> -> Index Scan using i1 on t (cost=0.29..7.28 rows=1 width=8)
>> Index Cond: (v = 100)
>> -> Index Scan using i2 on t t_1 (cost=0.29..7.28 rows=1 width=8)
>> Index Cond: (v = 100)
>>
>> I wonder whether there are any fundamental problems with supporting these two things in the optimizer:
>> 1. Removing a search condition on the primary key if it is fully implied by the derived table's check constraint.
>> 2. Appending index scans of several partial indexes if the specified interval is covered by their conditions.
>>
>> I wonder if someone familiar with this part of the optimizer can easily fix it.
>> Otherwise I am going to spend some time on solving these problems (if the community thinks such optimizations would be useful).
>>
>> Replying to myself: the following small patch removes redundant checks from index scans for derived tables:
>>
>> diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
>> index 939045d..1f7c9cf 100644
>> --- a/src/backend/optimizer/util/plancat.c
>> +++ b/src/backend/optimizer/util/plancat.c
>> @@ -1441,6 +1441,20 @@ relation_excluded_by_constraints(PlannerInfo *root,
>> if (predicate_refuted_by(safe_constraints, rel->baserestrictinfo, false))
>> return true;
>>
>> +    /*
>> +     * Remove from the restriction list the items implied by table constraints
>> +     * (assumes safe_restrictions and lc are declared at the top of the function)
>> +     */
>> +    safe_restrictions = NIL;
>> +    foreach(lc, rel->baserestrictinfo)
>> +    {
>> +        RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
>> +
>> +        if (!predicate_implied_by(list_make1(rinfo->clause), safe_constraints, false))
>> +            safe_restrictions = lappend(safe_restrictions, rinfo);
>> +    }
>> +    rel->baserestrictinfo = safe_restrictions;
>> +
>> return false;
>> }
>>
>> =================================================
>>
>> I am not sure whether this is the right place to adjust the restriction list, or whether such a transformation is always correct.
>> But for the queries I have tested, the produced plans are correct:
>>
>> postgres=# explain select * from base where k between 1 and 20000 and v = 100;
>> QUERY PLAN
>> -----------------------------------------------------------------------
>> Append (cost=0.00..15.64 rows=3 width=8)
>> -> Seq Scan on base (cost=0.00..0.00 rows=1 width=8)
>> Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))
>> -> Index Scan using pi1 on part1 (cost=0.29..8.30 rows=1 width=8)
>> Index Cond: (v = 100)
>> -> Index Scan using pi2 on part2 (cost=0.29..7.34 rows=1 width=8)
>> Index Cond: (v = 100)
>> (7 rows)
>>
>> postgres=# explain select * from base where k between 1 and 15000 and v = 100;
>> QUERY PLAN
>> -----------------------------------------------------------------------
>> Append (cost=0.00..15.64 rows=3 width=8)
>> -> Seq Scan on base (cost=0.00..0.00 rows=1 width=8)
>> Filter: ((k >= 1) AND (k <= 15000) AND (v = 100))
>> -> Index Scan using pi1 on part1 (cost=0.29..8.30 rows=1 width=8)
>> Index Cond: (v = 100)
>> -> Index Scan using pi2 on part2 (cost=0.29..7.34 rows=1 width=8)
>> Index Cond: (v = 100)
>> Filter: (k <= 15000)
>> (8 rows)
>>
>> There is one more problem with the predicate_implied_by function and standard Postgres partitions.
>> Right now partition constraints are specified using intervals with an open upper boundary:
>>
>> postgres=# create table bt (k integer, v integer) partition by range (k);
>> postgres=# create table dt1 partition of bt for values from (1) to (10001);
>> postgres=# create table dt2 partition of bt for values from (10001) to (20001);
>> postgres=# create index dti1 on dt1(v);
>> postgres=# create index dti2 on dt2(v);
>> postgres=# insert into bt values (generate_series(1,20000), random());
>>
>> postgres=# \d+ dt2
>>                                      Table "public.dt2"
>>  Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
>> --------+---------+-----------+----------+---------+---------+--------------+-------------
>>  k      | integer |           |          |         | plain   |              |
>>  v      | integer |           |          |         | plain   |              |
>> Partition of: bt FOR VALUES FROM (10001) TO (20001)
>> Partition constraint: ((k IS NOT NULL) AND (k >= 10001) AND (k < 20001))
>> Indexes:
>>     "dti2" btree (v)
>>
>> If I now run the query with the predicate "between 1 and 20000", I still see an extra check in the plan:
>>
>> postgres=# explain select * from bt where k between 1 and 20000 and v = 100;
>> QUERY PLAN
>> ----------------------------------------------------------------------
>> Append (cost=0.29..15.63 rows=2 width=8)
>> -> Index Scan using dti1 on dt1 (cost=0.29..8.30 rows=1 width=8)
>> Index Cond: (v = 100)
>> -> Index Scan using dti2 on dt2 (cost=0.29..7.33 rows=1 width=8)
>> Index Cond: (v = 100)
>> Filter: (k <= 20000)
>> (6 rows)
>>
>> This is because operator_predicate_proof is not able to understand that k < 20001 and k <= 20000 are equivalent for the integer type.
>> If I rewrite the query as (k >= 1 and k < 20001), then the plan is optimal:
>>
>> postgres=# explain select * from bt where k >= 1 and k < 20001 and v = 100;
>> QUERY PLAN
>> ----------------------------------------------------------------------
>> Append (cost=0.29..15.63 rows=2 width=8)
>> -> Index Scan using dti1 on dt1 (cost=0.29..8.30 rows=1 width=8)
>> Index Cond: (v = 100)
>> -> Index Scan using dti2 on dt2 (cost=0.29..7.33 rows=1 width=8)
>> Index Cond: (v = 100)
>> (5 rows)
>>
>> It is fixed by one more patch to predtest.c:
> Have you considered using the range types (internally in
> operator_predicate_proof()) instead of hard-coding operator OIDs? The range
> types do have the knowledge that k < 20001 and k <= 20000 are equivalent for
> integer type:
>
> postgres=# SELECT int4range '(, 20001)' = int4range '(, 20000]';
> ?column?
> ----------
> t
> (1 row)
>
>
It is a bright idea, but it is not quite clear to me how to implement it.
There are several built-in range types in Postgres: int4range,
int8range, numrange, tsrange, tstzrange, daterange.
Among them, int4range, int8range and daterange are discrete types
having a canonical function, for which these transformation rules are
applicable.
Right now I perform the checks for all these types. So the challenge is
to support user-defined range types with a canonical function.
As input, the operator_predicate_proof function gets the Oid of the
comparison operator and a Const * expression representing the literal value.
So I see the following generic way of checking equivalence of ranges:
1. Get the name of the operator. If it is '<=' or '>=' then the interval
is closed; if it is '<' or '>' then it is open.
2. Convert the Const to text (using the type's out function) and construct
an interval: '(,"$value"]' for '<=', '["$value",)' for '>=', '(,"$value")'
for '<' and '("$value",)' for '>'.
3. Find the range type from the type of the constant:
select * from pg_range where rngsubtype=?;
4. Try to cast the string constructed above to this range type (using the
type's in function).
5. Compare the two produced ranges; if they are equal, then
operator_predicate_proof should return true.
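
For example, steps 2-5 can be sketched in plain SQL (the C code would go
through the type's in/out functions instead; int4 here is just an
illustration, not the only case to handle):

postgres=# select rngtypid::regtype from pg_range where rngsubtype = 'int4'::regtype;
 rngtypid
-----------
 int4range
(1 row)

postgres=# -- '(,20001)' stands for "k < 20001" and '(,20000]' for "k <= 20000"
postgres=# select '(,20001)'::int4range = '(,20000]'::int4range;
 ?column?
----------
 t
(1 row)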
But I am not sure that these steps will work correctly for all possible
user-defined discrete range types:
1. Is it correct to assume that (col <= XXX) corresponds to the interval
'(,XXX]'?
Is there a better way to determine the kind of interval from the
operator than looking at the operator's name?
2. Must each range type accept this interval format?
For the built-in range types it is true, but is there any guarantee that
an arbitrary user-defined range type will recognize the string '(,"XXX"]' ?
3. Is it always possible to find a corresponding range type for the
specified literal type?
For example, there is no range type for int2 (see the example after
this list).
4. The range type's input function may throw an error, so we would have to
catch and ignore it.
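
Point 3 is easy to check: the step-3 lookup finds nothing for int2:

postgres=# select rngtypid::regtype from pg_range where rngsubtype = 'int2'::regtype;
 rngtypid
----------
(0 rows)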
So, I completely agree with you that using hard-coded operator/type
IDs is a bad smell.
But it is not clear whether it is possible to perform this check in a generic way.
At least I have doubts about my approach explained above...
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company