Re: Parallel Index Scans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>
Subject: Re: Parallel Index Scans
Date: 2016-12-22 04:19:20
Message-ID: CAA4eK1+gCK7Rz86J+c0i57Z0KT=zpoL0sew43sHpUkB+Prnfvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova
<lubennikovaav(at)gmail(dot)com> wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: tested, passed
> Documentation: tested, passed
>
> Hi, thank you for the patch.
> Results are very promising. Do you see any drawbacks of this feature or something that requires more testing?
>

I think you can focus on the handling of array scan keys for testing.
In general, one of my colleagues has shown interest in testing this
patch and I think he has tested as well but never posted his findings.
I will request him to share his findings and what kind of tests he has
done, if any.

> I'm willing to oo a review.

Thanks, that will be helpful.

> I saw the discussion about parameters in the thread above. And I agree that we'd better concentrate
> on the patch itself and add them later if necessary.
>
> 1. Can't we simply use "if (scan->parallel_scan != NULL)" instead of xs_temp_snap flag?
>
> + if (scan->xs_temp_snap)
> + UnregisterSnapshot(scan->xs_snapshot);
>

I agree with what Rober has told in his reply. We do same way for
heap, refer heap_endscan().

> I must say that I'm quite new with all this parallel stuff. If you give me a link,
> where to read about snapshots for parallel workers, my review will be more helpful.
>

You can read transam/README.parallel. Refer "State Sharing" portion
of README to learn more about it.

> Anyway, it would be great to have more comments about it in the code.
>

We are sharing snapshot to ensure that reads in both master backend
and worker backend can use the same snapshot. There is no harm in
adding comments, but I think it is better to be consistent with
similar heapam code. After reading README.parallel, if you still feel
that we should add more comments in the code, then we can definitely
do that.

> 2. Don't you mind to rename 'amestimateparallelscan' to let's say 'amparallelscan_spacerequired'
> or something like this?

Sure, I am open to other names, but IMHO, lets keep "estimate" in the
name to keep it consistent with other parallel stuff. Refer
execParallel.c to see how widely this word is used.

> As far as I understand there is nothing to estimate, we know this size
> for sure. I guess that you've chosen this name because of 'heap_parallelscan_estimate'.
> But now it looks similar to 'amestimate' which refers to indexscan cost for optimizer.
> That leads to the next question.
>

Do you mean 'amcostestimate'? If you want we can rename it
amparallelscanestimate to be consistent with amcostestimate.

> 3. Are there any changes in cost estimation?
>

Yes.

> I didn't find related changes in the patch.
> Parallel scan is expected to be faster and optimizer definitely should know that.
>

You can find the relavant changes in
parallel_index_opt_exec_support_v2.patch, refer cost_index().

> 4. + uint8 ps_pageStatus; /* state of scan, see below */
> There is no desciption below. I'd make the comment more helpful:
> /* state of scan. See possible flags values in nbtree.h */

makes sense. Will change.

> And why do you call it pageStatus? What does it have to do with page?
>

During scan this tells us whether next page is available for scan.
Another option could be to name it as scanStatus, but not sure if that
is better. Do you think if we add a comment like "indicates whether
next page is available for scan" for this variable then it will be
clear?

> 5. Comment for _bt_parallel_seize() says:
> "False indicates that we have reached the end of scan for
> current scankeys and for that we return block number as P_NONE."
>
> What is the reason to check (blkno == P_NONE) after checking (status == false)
> in _bt_first() (see code below)? If comment is correct
> we'll never reach _bt_parallel_done()
>
> + blkno = _bt_parallel_seize(scan, &status);
> + if (status == false)
> + {
> + BTScanPosInvalidate(so->currPos);
> + return false;
> + }
> + else if (blkno == P_NONE)
> + {
> + _bt_parallel_done(scan);
> + BTScanPosInvalidate(so->currPos);
> + return false;
> + }
>

The first time master backend or worker hits last page (calls this
API), it will return P_NONE and after that any worker tries to fetch
next page, it will return status as false. I think we can expand a
comment to explain it clearly. Let me know, if you need more
clarification, I can explain it in detail.

> 6. To avoid code duplication, I would wrap this into the function
>
> + /* initialize moreLeft/moreRight appropriately for scan direction */
> + if (ScanDirectionIsForward(dir))
> + {
> + so->currPos.moreLeft = false;
> + so->currPos.moreRight = true;
> + }
> + else
> + {
> + so->currPos.moreLeft = true;
> + so->currPos.moreRight = false;
> + }
> + so->numKilled = 0; /* just paranoia */
> + so->markItemIndex = -1; /* ditto */
>

Okay, I think we can write a separate function (probably inline
function) for above.

> And after that we can also get rid of _bt_parallel_readpage() which only
> bring another level of indirection to the code.
>

See, this function is responsible for multiple actions like
initializing moreLeft/moreRight positions, reading next page, dropping
the lock/pin. So replicating all these actions in the caller will
make the code in caller less readable as compared to now. Consider
this point and let me know your view on same.

> 7. Just a couple of typos I've noticed:
>
> * Below flags are used indicate the state of parallel scan.
> * Below flags are used TO indicate the state of parallel scan.
>
> * On success, release lock and pin on buffer on success.
> * On success release lock and pin on buffer.
>

Will fix.

> 8. I didn't find a description of the feature in documentation.
> Probably we need to add a paragraph to the "Parallel Query" chapter.
>

Yes, I am aware of that and I think it makes sense to add it now
rather than waiting until the end.

> I will send another review of performance until the end of the week.
>

Okay, you can refer Rafia's mail above for non-default settings she
has used in her performance tests with TPC-H.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2016-12-22 04:31:03 Re: multi-level partitions and partition-wise joins
Previous Message Amit Kapila 2016-12-22 02:34:07 Re: Hang in pldebugger after git commit : 98a64d0