Re: Parallel Index Scans

From: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Index Scans
Date: 2016-12-12 04:05:32
Message-ID: CAOGQiiOneen9WEppO6V_myKpQ97CrjBQJ0Pv7ri0rxmMYvLcTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,
On evaluating parallel index scans on TPC-H benchmark queries, I came
across some interesting results.

For scale factor 20, queries 4, 6 and 14 are giving significant performance
improvements with parallel index:
Q | Head | PI
4 | 14 | 11
6 | 27 | 9
14 | 20 | 12

To confirm that the proposed patch is scalable I tested it on 300 scale
factor, there some queries switched to bitmap index scan instead of
parallel index, but there were other queries giving significant improvement
in performance:
Q | Head | PI
4 | 207 | 168
14 | 2662 | 1576
15 | 847 | 190

All the performance numbers given above are in seconds. The experimental
setup used in this exercise is as follows:
Server parameter settings:
work_mem = 64 MB,
max_parallel_workers_per_gather = 4,
random_page_cost = seq_page_cost = 0.1 = parallel_tuple_cost,
shared_buffers = 1 GB

Logical schema: Some additional indexes were created to ensure the use of
indexes,
on lineitem table -- l_shipdate, l_returnflag, l_shipmode,
on orders table -- o_comment, o_orderdate, and
on customer table -- c_mktsegment.

Machine used: IBM Power, 4 socket machine, 512 GB RAM

Main observations about the utility and power of this patch includes
availability of appropriate indexes, giving suitable value of
random_page_cost based on the RAM and DB sizes. E.g. in these
experimentation I ensured warm cache environment, hence giving a higher
value to random_page_cost than seq_page_cost does not makes much sense and
it would inhibit the use of indexes. Also, the value of this parameter
needs to be calibrated based on the underlying hardware, there is a recent
work in this direction that gives a mechanism to do this calibration
offline, also they experimented with Postgresql parameters [1].

Please find the attached file for have a look on these results in detail.
The file pi_perf_tpch.ods gives the performance numbers and the graphs for
both the scale factors. Attached zip folder gives the explain analyse
output for these queries on both head as well as with parallel index patch.

[1] http://pages.cs.wisc.edu/~wentaowu/papers/prediction-full.pdf

On Mon, Dec 5, 2016 at 10:36 AM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
wrote:

>
>
> On Sat, Nov 26, 2016 at 10:35 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>
>> On Sat, Oct 22, 2016 at 9:07 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> > On Fri, Oct 21, 2016 at 10:55 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
>> wrote:
>>
>> I have rebased the patch (parallel_index_scan_v2) based on latest
>> commit e8ac886c (condition variables). I have removed the usage of
>> ConditionVariablePrepareToSleep as that is is no longer mandatory. I
>> have also updated docs for wait event introduced by this patch (thanks
>> to Dilip for noticing it). There is no change in
>> parallel_index_opt_exec_support patch, but just attaching here for
>> easier reference.
>>
>>
> Moved to next CF with "needs review" status.
>
> Regards,
> Hari Babu
> Fujitsu Australia
>

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/

Attachment Content-Type Size
PI_perf_tpch.ods application/vnd.oasis.opendocument.spreadsheet 40.8 KB
PI_plans.zip application/zip 44.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2016-12-12 04:05:36 Re: PATCH: two slab-like memory allocators
Previous Message Michael Paquier 2016-12-12 03:58:35 Re: jacana hung after failing to acquire random number