Re: Parallel Index Scans

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Rahila Syed <rahilasyed90(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>
Subject: Re: Parallel Index Scans
Date: 2017-02-10 18:24:53
Message-ID: CA+TgmobZCELGShzM2T-0VGu-DQWYoydNzY=QowkJjftjENP1Rw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 9, 2017 at 1:33 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> compute_index_pages_v2.patch - This function extracts the computation
> of index pages to be scanned in a separate function and used it in
> existing code. You will notice that I have pulled up the logic of
> conversion of clauses to indexquals from create_index_path to
> build_index_paths as that is required to compute the number of index
> and heap pages to be scanned by scan in patch
> parallel_index_opt_exec_support_v8.patch. This doesn't impact any
> existing functionality.

This design presupposes that every AM that might ever want to do
parallel scans is happy with genericcostestimate()'s method of
computing the number of index pages that will be fetched. However,
that might not be true for every possible AM. In fact, it's already
not true for BRIN, which always reads the whole index. Now, since
we're only concerned with btree at the moment, nothing would be
immediately broken by this approach but it seems we're just setting
ourselves up for future pain.

I have what I think might be a better idea: let's make
amcostestimate() responsible for returning a suggested parallel degree
for the path via an additional out parameter. cost_index() can then
reduce that value if it seems like not enough heap pages will be
fetched to justify the return value, or it can override it completely
if parallel_degree is set for the relation. Then we don't need to run
this logic twice to compute the same value, and we don't violate the
AM abstraction layer.

BTW, the changes to add_partial_path() aren't needed, because an
IndexPath only gets reused if you stick a Bitmap{Heap,And,Or}Path on
top of it, and that won't be the case with this or any other pending
patch. If we get the ability to have a Parallel Bitmap Heap Scan that
takes a parallel index scan rather than a standard index scan as
input, then we'll need something like this. But for now it's probably
best to just update the comments and remove the Assert().

I think you can also leave out the documentation changes from these
patches. I'll do some general rewriting of the parallel query section
once we know exactly what capabilities we'll have in this release; I
think that will work better than trying to update them a bit at a time
for each patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-02-10 18:26:02 Re: removing tsearch2
Previous Message Andres Freund 2017-02-10 18:22:34 Re: WIP: About CMake v2