Re: Parallel Index Scans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>
Subject: Re: Parallel Index Scans
Date: 2016-12-27 14:33:26
Message-ID: CAA4eK1KthrAvNjmB2cWuUHz+p3ZTTtbD7o2KUw49PC8HK4r1Wg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 23, 2016 at 6:42 PM, Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
> 22.12.2016 07:19, Amit Kapila:
>>
>> On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova
>> <lubennikovaav(at)gmail(dot)com> wrote:
>>>
>>> The following review has been posted through the commitfest application:
>>> make installcheck-world: tested, passed
>>> Implements feature: tested, passed
>>> Spec compliant: tested, passed
>>> Documentation: tested, passed
>>>
>>> Hi, thank you for the patch.
>>> Results are very promising. Do you see any drawbacks of this feature or
>>> something that requires more testing?
>>>
>> I think you can focus on the handling of array scan keys for testing.
>> In general, one of my colleagues has shown interest in testing this
>> patch and I think he has tested as well but never posted his findings.
>> I will request him to share his findings and what kind of tests he has
>> done, if any.
>
>
>
> Check please code related to buffer locking and pinning once again.
> I got the warning. Here are the steps to reproduce it:
> Except "autovacuum = off" config is default.
>
> pgbench -i -s 100 test
> pgbench -c 10 -T 120 test
>
> SELECT count(aid) FROM pgbench_accounts
> WHERE aid > 1000 AND aid < 900000 AND bid > 800 AND bid < 900;
> WARNING: buffer refcount leak: [8297] (rel=base/12289/16459, blockNum=2469,
> flags=0x93800000, refcount=1 1)
> count
>

The similar problem has occurred while testing "parallel index only
scan" patch and Rafia has included the fix in her patch [1] which
ideally should be included in this patch, so I have copied the fix
from her patch. Apart from that, I observed that similar problem can
happen for backward scans, so fixed the same as well.

>
>>> 2. Don't you mind to rename 'amestimateparallelscan' to let's say
>>> 'amparallelscan_spacerequired'
>>> or something like this?
>>
>> Sure, I am open to other names, but IMHO, lets keep "estimate" in the
>> name to keep it consistent with other parallel stuff. Refer
>> execParallel.c to see how widely this word is used.
>>
>>> As far as I understand there is nothing to estimate, we know this size
>>> for sure. I guess that you've chosen this name because of
>>> 'heap_parallelscan_estimate'.
>>> But now it looks similar to 'amestimate' which refers to indexscan cost
>>> for optimizer.
>>> That leads to the next question.
>>>
>> Do you mean 'amcostestimate'? If you want we can rename it
>> amparallelscanestimate to be consistent with amcostestimate.
>
>
> I think that 'amparallelscanestimate' seems less ambiguous than
> amestimateparallelscan.
> But it's up to you. There are enough comments to understand the purpose of
> this field.
>

Okay, then lets leave as it is, because we have aminitparallelscan
which should also be renamed to amparallelscaninit if we rename
amestimateparallelscan.

>>
>>> And why do you call it pageStatus? What does it have to do with page?
>>>
>> During scan this tells us whether next page is available for scan.
>> Another option could be to name it as scanStatus, but not sure if that
>> is better. Do you think if we add a comment like "indicates whether
>> next page is available for scan" for this variable then it will be
>> clear?
>
>
> Yes, I think it describes the flag better.

Changed as per above suggestion.

>>>
>>> 5. Comment for _bt_parallel_seize() says:
>>> "False indicates that we have reached the end of scan for
>>> current scankeys and for that we return block number as P_NONE."
>>>
>>> What is the reason to check (blkno == P_NONE) after checking (status ==
>>> false)
>>> in _bt_first() (see code below)? If comment is correct
>>> we'll never reach _bt_parallel_done()
>>>
>>> + blkno = _bt_parallel_seize(scan, &status);
>>> + if (status == false)
>>> + {
>>> + BTScanPosInvalidate(so->currPos);
>>> + return false;
>>> + }
>>> + else if (blkno == P_NONE)
>>> + {
>>> + _bt_parallel_done(scan);
>>> + BTScanPosInvalidate(so->currPos);
>>> + return false;
>>> + }
>>>
>> The first time master backend or worker hits last page (calls this
>> API), it will return P_NONE and after that any worker tries to fetch
>> next page, it will return status as false. I think we can expand a
>> comment to explain it clearly. Let me know, if you need more
>> clarification, I can explain it in detail.
>>
> Got it,
> I think you can add this explanation to the comment for
> _bt_parallel_seize().
>

Expanded the comment as discussed.

>>> 6. To avoid code duplication, I would wrap this into the function
>>>
>>> + /* initialize moreLeft/moreRight appropriately for scan direction
>>> */
>>> + if (ScanDirectionIsForward(dir))
>>> + {
>>> + so->currPos.moreLeft = false;
>>> + so->currPos.moreRight = true;
>>> + }
>>> + else
>>> + {
>>> + so->currPos.moreLeft = true;
>>> + so->currPos.moreRight = false;
>>> + }
>>> + so->numKilled = 0; /* just paranoia */
>>> + so->markItemIndex = -1; /* ditto */
>>>
>> Okay, I think we can write a separate function (probably inline
>> function) for above.
>>

Added the above code in a separate inline function.

>>> And after that we can also get rid of _bt_parallel_readpage() which only
>>> bring another level of indirection to the code.
>>>
>> See, this function is responsible for multiple actions like
>> initializing moreLeft/moreRight positions, reading next page, dropping
>> the lock/pin. So replicating all these actions in the caller will
>> make the code in caller less readable as compared to now. Consider
>> this point and let me know your view on same.
>
>
> Thank you for clarification, now I agree with your implementation.
> I've just missed that we also handle lock in this function.
>

Okay.

> 7. Just a couple of typos I've noticed:
>
> * Below flags are used indicate the state of parallel scan.
> * Below flags are used TO indicate the state of parallel scan.
>
> * On success, release lock and pin on buffer on success.
> * On success release lock and pin on buffer.
>

Fixed.

> 8. I didn't find a description of the feature in documentation.
> Probably we need to add a paragraph to the "Parallel Query" chapter.
>

Added the description in parallel_index_opt_exec_support_v3.patch.

>
> Performance results with 2 parallel workers are about 1.5-3 times better,
> just like in your tests.
> So, no doubt, this feature will be useful.

Thanks for the tests.

> But I'm trying to find the worst cases for this feature. And I suppose we
> should test parallel index scans with
> concurrent insertions. More parallel readers we have, higher the
> concurrency.
> I doubt that it can significantly decrease performance, because number of
> parallel readers is not that big,
>

I am not sure if such a test is meaningful for this patch because
parallelism is generally used for large data reads and in such cases
there are generally not many concurrent writes.

Thanks for your valuable inputs.

[1] - https://www.postgresql.org/message-id/CAOGQiiNx4Ra9A-RyxjrgECownmVJ64EVpVgfN8ACR-MLupGnng%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
parallel_index_scan_v3.patch application/octet-stream 50.1 KB
parallel_index_opt_exec_support_v3.patch application/octet-stream 27.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-12-27 14:40:39 Re: Parallel Index Scans
Previous Message Stephen Frost 2016-12-27 14:26:21 Re: Reporting planning time with EXPLAIN