Re: [HACKERS] Secondary index access optimizations

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Secondary index access optimizations
Date: 2018-03-22 09:38:33
Message-ID: 9d3154bd-5e2d-0e0a-4975-ec76451bae90@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 21.03.2018 20:30, Konstantin Knizhnik wrote:
>
>
> On 01.03.2018 23:15, Andres Freund wrote:
>> Hi,
>>
>>
>> This patch seems like quite a good improvement. One thing I've not
>> really looked at but am slightly concerned in passing: Are there cases
>> where we now would do a lot of predicate pruning work even though the
>> overhead of just evaluating the qual is trivial, e.g. because there's
>> only one row due to a pkey? The predtest code is many things but
>> lightning fast is not one of them. Obviously that won't matter for
>> analytics queries, but I could see it hurt in OLTPish workloads...
>>
>> Greetings,
>>
>> Andres Freund
>
> Hi,
> I am sorry for delay with answer.
>
> I understand your concern and did the following experiment.
> I have initialized the database in the following way:
>
> create table base (k integer primary key, v integer);
> create table part1 (check (k between 1 and 10000)) inherits (base);
> create table part2 (check (k between 10001 and 20000)) inherits (base);
> create index pi1 on part1(v);
> create index pi2 on part2(v);
> insert into part1 values (generate series(1,10000), random()*100000);
> insert into part2 values (generate_series(10001,20000), random()*100000);
> vacuum analyze part1;
> vacuum analyze part2;
>
> Vanilla Postgres uses the following plan:
>
> explain select * from base where k between 1 and 20000 and v = 100;
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Append  (cost=0.00..16.63 rows=3 width=8)
>    ->  Seq Scan on base  (cost=0.00..0.00 rows=1 width=8)
>          Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))
>    ->  Index Scan using pi1 on part1  (cost=0.29..8.31 rows=1 width=8)
>          Index Cond: (v = 100)
>          Filter: ((k >= 1) AND (k <= 20000))
>    ->  Index Scan using pi2 on part2  (cost=0.29..8.31 rows=1 width=8)
>          Index Cond: (v = 100)
>          Filter: ((k >= 1) AND (k <= 20000))
> (9 rows)
>
> Execution of this query 100000 times gives is done in 12 seconds.
> With applied patch query plan is changed to:
>
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Append  (cost=0.00..16.62 rows=3 width=8)
>    ->  Seq Scan on base  (cost=0.00..0.00 rows=1 width=8)
>          Filter: ((k >= 1) AND (k <= 20000) AND (v = 100))
>    ->  Index Scan using pi1 on part1  (cost=0.29..8.30 rows=1 width=8)
>          Index Cond: (v = 100)
>    ->  Index Scan using pi2 on part2  (cost=0.29..8.30 rows=1 width=8)
>          Index Cond: (v = 100)
> (7 rows)
>
> Elapsed time of 100000 query executions is 13 seconds.
> So you was right that increased query optimization time exceeds
> advantage of extra checks elimination.
> But it is true only if we are not using prepare statements.
> With prepared statements results are the following:
>
> Vanilla:       0m3.915s
> This patch: 0m3.563s
>
> So improvement is not so large, but it exists.
> If somebody wants to repeat my experiments, I attached to this mail
> small shell script which I used to run query several times.
> Non-prepared query is launched using the following command:
>
> time ./repeat-query.sh "select * from base where k between 1 and 20000
> and v = 100" 100000
>
> and prepared query:
>
> time ./repeat-query.sh "execute select100" 100000 "prepare select100
> as select * from base where k between 1 and 20000 and v = 100"
>
> And once again I want to notice that using prepared statements can
> increase performance almost 3 times!
> As far as using prepared statements is not always possible I want to
> recall autoprepare patch waiting at the commitfest:
>
> https://www.postgresql.org/message-id/flat/8eed9c23-19ba-5404-7a9e-0584b836b3f3%40postgrespro(dot)ru#8eed9c23-19ba-5404-7a9e-0584b836b3f3(at)postgrespro(dot)ru
>
> --
> Konstantin Knizhnik
> Postgres Professional:http://www.postgrespro.com
> The Russian Postgres Company
Attached please find rebased version of the patch.
Also I do more testing, now using pgbench.
Scripts for initialization of the database and for custom script for
pgbench are attached.
Results at my computer are the following:

pgbench options
Vanilla
Patch
-c 1
9208
8289
-c 1 -M prepared
38503 41206
-c 10
39224
34040
-c 10 -M prepared
165465
172874

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
optimizer-11.patch text/x-patch 49.8 KB
select.pgbench text/plain 83 bytes
init.sql application/sql 440 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2018-03-22 09:56:11 Re: MCV lists for highly skewed distributions
Previous Message Fabien COELHO 2018-03-22 09:32:13 Re: pg_stat_statements HLD for futur developments