Re: Parallel Index Scans

From: Rahila Syed <rahilasyed90(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>
Subject: Re: Parallel Index Scans
Date: 2017-01-31 12:23:50
Message-ID: CAH2L28u-HkqXXr8bWMWQQ4TFma3azyfXZUio1yjgcbNMJrXVvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

>Agreed, that it makes sense to consider only the number of pages to
>scan for computation of parallel workers. I think for index scan we
>should consider both index and heap pages that need to be scanned
>(costing of index scan consider both index and heap pages). I thin
>where considering heap pages matter more is when the finally selected
>rows are scattered across heap pages or we need to apply a filter on
>rows after fetching from the heap. OTOH, we can consider just pages
>in the index as that is where mainly the parallelism works
IMO, considering just index pages will give a better estimate of work to be
done
in parallel. As the amount of work/number of pages divided amongst workers
is irrespective of
the number of heap pages scanned.

Thank you,
Rahila Syed

On Mon, Jan 30, 2017 at 2:52 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Sat, Jan 28, 2017 at 1:34 AM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> > On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >> parallel_index_opt_exec_support_v6 - Removed the usage of
> >> GatherSupportsBackwardScan. Expanded the comments in
> >> ExecReScanIndexScan.
> >
> > I looked through this and in general it looks reasonable to me.
> > However, I did notice one thing that I think is wrong. In the
> > parallel bitmap heap scan patch, the second argument to
> > compute_parallel_worker() is the number of pages that the parallel
> > scan is expected to fetch from the heap. In this patch, it's the
> > total number of pages in the index. The former seems to me to be
> > better, because the point of having a threshold relation size for
> > parallelism is that we don't want to use a lot of workers to scan a
> > small number of pages -- the distribution of work won't be even, and
> > the potential savings are limited. If we've got a big index but are
> > using a very selective qual to pull out only one or a small number of
> > rows on a single page or a small handful of pages, we shouldn't
> > generate a parallel path for that.
> >
>
> Agreed, that it makes sense to consider only the number of pages to
> scan for computation of parallel workers. I think for index scan we
> should consider both index and heap pages that need to be scanned
> (costing of index scan consider both index and heap pages). I thin
> where considering heap pages matter more is when the finally selected
> rows are scattered across heap pages or we need to apply a filter on
> rows after fetching from the heap. OTOH, we can consider just pages
> in the index as that is where mainly the parallelism works. In the
> attached patch (parallel_index_opt_exec_support_v7.patch), I have
> considered both index and heap pages, let me know if you think some
> other way is better. I have also prepared a separate independent
> patch (compute_index_pages_v1) on HEAD to compute index pages which
> can be used by parallel index scan. There is no change in
> parallel_index_scan (parallel btree scan) patch, so I am not attaching
> its new version.
>
> > Now, against that theory, the GUC that controls the behavior of
> > compute_parallel_worker() is called min_parallel_relation_size, which
> > might make you think that the decision is supposed to be based on the
> > whole size of some relation. But I think that just means we need to
> > rename the GUC to something like min_parallel_scan_size. Possibly we
> > also ought consider reducing the default value somewhat, because it
> > seems like both sequential and index scans can benefit even when
> > scanning less than 8MB.
> >
>
> Agreed, but let's consider it separately.
>
>
> The order in which patches needs to be applied:
> compute_index_pages_v1.patch, parallel_index_scan_v6.patch[1],
> parallel_index_opt_exec_support_v7.patch
>
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1J%
> 3DLSBpDx7i_izGJxGVUryqPe-2SKT02De-PrQvywiMxw%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2017-01-31 12:39:53 Re: macaddr 64 bit (EUI-64) datatype support
Previous Message Daniel Verite 2017-01-31 12:21:49 Re: One-shot expanded output in psql using \G