Few remarks on JIT , parallel query execution and columnar store...

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Few remarks on JIT , parallel query execution and columnar store...
Date: 2018-10-11 10:04:48
Message-ID: f57e183c-3cfe-fdd5-7be2-8e1456711067@postgrespro.ru
Lists: pgsql-hackers

Recently I had to estimate the performance of a select with multiple
search conditions with poor selectivity.
It is definitely a kind of OLAP query, and I was interested to
understand the role of the different PostgreSQL optimization options.

So the table is the following:

create table t(pk bigint primary key, val1 double precision,
val2 double precision, val3 double precision);
insert into t select s as pk, random() as val1, random() as val2,
random() as val3 from generate_series(1,10000000) s;

I ran the following query on a standard desktop with a quad-core CPU and
16 GB of RAM, with shared buffers adjusted to fit the whole database:

select count(*) from t where val1>=0.5 and val2<=0.5 and val3 between
0.2 and 0.6;
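
To make "poor selectivity" concrete, here is a rough estimate of how many rows this predicate should match. It is only a sketch: it assumes that the three value columns hold independent values uniformly distributed on [0, 1), which is my reading of the insert statement and not something measured.

```python
from fractions import Fraction

# Assumption: val1, val2 and val3 are independent and uniform on [0, 1).
p_val1 = Fraction(1, 2)                      # val1 >= 0.5
p_val2 = Fraction(1, 2)                      # val2 <= 0.5
p_val3 = Fraction(6, 10) - Fraction(2, 10)   # val3 between 0.2 and 0.6

# Independent conditions: the combined selectivity is the product.
selectivity = p_val1 * p_val2 * p_val3

rows = 10_000_000
expected_matches = int(rows * selectivity)

print(f"selectivity = {float(selectivity):.2f}")
print(f"expected matching rows = {expected_matches}")
```

Under these assumptions about one row in ten passes the filter, so the scan has to evaluate all three conditions for most rows and very few can be skipped cheaply.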

Results are the following:

JIT   Parallel workers   Time (msec)
off   0                  773
off   8                  216
on    0                  650
on    8                  254

So without parallelism JIT provides some speed improvement, but with
parallel execution the effect of JIT is negative.
Most likely this is because the JIT generation time (30 msec) is
comparable with the execution time.
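
The effect can be put into numbers with a few lines of arithmetic on the timings reported above. This is just a sketch: the dictionary layout and the helper name jit_speedup are mine, and the 30 msec figure is the JIT generation time quoted in the text, not a separately measured value.

```python
# Timings in msec for the 10-million-row table, keyed by (JIT, workers).
timings = {
    ("off", 0): 773,
    ("off", 8): 216,
    ("on", 0): 650,
    ("on", 8): 254,
}

def jit_speedup(workers):
    """Ratio of JIT-off time to JIT-on time; below 1 means JIT is slower."""
    return timings[("off", workers)] / timings[("on", workers)]

for workers in (0, 8):
    print(f"{workers} workers: JIT speedup = {jit_speedup(workers):.2f}x")

# Extra time spent in the parallel run when JIT is enabled.
parallel_penalty_msec = timings[("on", 8)] - timings[("off", 8)]
print(f"parallel JIT penalty = {parallel_penalty_msec} msec")

# Share of the parallel JIT-on run taken by ~30 msec of code generation.
jit_generation_msec = 30
share = jit_generation_msec / timings[("on", 8)]
print(f"JIT generation share of the parallel run = {share:.0%}")
```

The parallel penalty (38 msec) is of the same order as the quoted generation time, which is consistent with the explanation above.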

Conclusion: for a sequential scan of 10 million records JIT is not able
to provide a performance improvement.
Let's increase the number of records 10 times.
Now the results are the following:

JIT   Parallel workers   Time (msec)
off   0                  7848
off   8                  2063
on    0                  6301
on    8                  1648

So now JIT is faster for both sequential and parallel execution.
But it is not the fastest way to process this query with Postgres.
Let's try my extension VOPS (https://github.com/postgrespro/vops):

Parallel workers   Time (msec)
0                  1447
2                  623
4                  494
8                  491

So VOPS is more than 3 times faster than JIT, but it looks like it can
provide even better results for a larger number of records, because, as
you can see, increasing the number of workers from 2 to 4 improves
performance by only about 26% (623 vs. 494 msec), not by a factor of two.
It looks like the overhead of starting parallel workers is too large,
and for a query execution time below 1 second it has a noticeable impact
on total performance.
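
The scaling can be checked directly from the VOPS timings. The sketch below only reuses the numbers reported above; the variable and function names are mine, and the comparison takes the best JIT result from the larger data set (JIT on, 8 workers).

```python
# VOPS timings in msec, keyed by the number of parallel workers.
vops = {0: 1447, 2: 623, 4: 494, 8: 491}

# Best stock Postgres result on the larger data set: JIT on, 8 workers.
best_jit_msec = 1648

def speedup(slow_msec, fast_msec):
    """How many times faster the second timing is than the first."""
    return slow_msec / fast_msec

# Gain from each increase in the number of workers.
for fewer, more in [(0, 2), (2, 4), (4, 8)]:
    gain = speedup(vops[fewer], vops[more])
    print(f"{fewer} -> {more} workers: {gain:.2f}x")

vops_vs_jit = speedup(best_jit_msec, vops[8])
print(f"VOPS (8 workers) vs JIT (8 workers): {vops_vs_jit:.2f}x")
```

Going from 4 to 8 workers gives almost no gain on a quad-core machine, and the step from 2 to 4 workers is far from doubling the speed.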

In another prototype DBMS of mine, with vertical data representation and
multithreaded execution, this query takes 195 msec.
So there is still scope for improvement :)

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
