Re: Parallel bitmap heap scan

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel bitmap heap scan
Date: 2016-11-22 03:35:36
Message-ID: CAFiTN-tZMvq-tGF7vkejTxkVTBOaTEvvcNPnXAjoLmKbde0NXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 18, 2016 at 9:59 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:

Thanks for the review..

> In pbms_is_leader() , I didn't clearly understand the significance of
> the for-loop. If it is a worker, it can call
> ConditionVariablePrepareToSleep() followed by
> ConditionVariableSleep(). Once it comes out of
> ConditionVariableSleep(), isn't it guaranteed that the leader has
> finished the bitmap ? If yes, then it looks like it is not necessary
> to again iterate and go back through the pbminfo->state checking.
> Also, with this, variable queuedSelf also might not be needed. But I
> might be missing something here.

I have taken this logic from example posted on conditional variable thread[1]

for (;;)
{
ConditionVariablePrepareToSleep(cv);
if (condition for which we are waiting is satisfied)
break;
ConditionVariableSleep();
}
ConditionVariableCancelSleep();

[1] https://www.postgresql.org/message-id/CA%2BTgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr%3DC56Xng%40mail.gmail.com

So it appears to me even if we come out of ConditionVariableSleep();
we need to verify our condition and then only break.

Not sure what happens if worker calls
> ConditionVariable[Prepare]Sleep() but leader has already called
> ConditionVariableBroadcast(). Does the for loop have something to do
> with this ? But this can happen even with the current for-loop, it
> seems.
If leader has already called ConditionVariableBroadcast, but after
ConditionVariablePrepareToSleep we will check the condition again
before calling ConditionVariableSleep. And condition check is under
SpinLockAcquire(&pbminfo->state_mutex);

However I think there is one problem in my code (I think you might be
pointing same), that after ConditionVariablePrepareToSleep, if
pbminfo->state is already PBM_FINISHED, I am not resetting needWait to
false, and which can lead to the problem.

>
>
>> #3. Bitmap processing (Iterate and process the pages).
>> In this phase each worker will iterate over page and chunk array and
>> select heap pages one by one. If prefetch is enable then there will be
>> two iterator. Since multiple worker are iterating over same page and
>> chunk array we need to have a shared iterator, so we grab a spin lock
>> and iterate within a lock, so that each worker get and different page
>> to process.
>
> tbm_iterate() call under SpinLock :
> For parallel tbm iteration, tbm_iterate() is called while SpinLock is
> held. Generally we try to keep code inside Spinlock call limited to a
> few lines, and that too without occurrence of a function call.
> Although tbm_iterate() code itself looks safe under a spinlock, I was
> checking if we can squeeze SpinlockAcquire() and SpinLockRelease()
> closer to each other. One thought is : in tbm_iterate(), acquire the
> SpinLock before the while loop that iterates over lossy chunks. Then,
> if both chunk and per-page data remain, release spinlock just before
> returning (the first return stmt). And then just before scanning
> bitmap of an exact page, i.e. just after "if (iterator->spageptr <
> tbm->npages)", save the page handle, increment iterator->spageptr,
> release Spinlock, and then use the saved page handle to iterate over
> the page bitmap.

Main reason to keep Spin lock out of this function to avoid changes
inside this function, and also this function takes local iterator as
input which don't have spin lock reference to it. But that can be
changed, we can pass shared iterator to it.

I will think about this logic and try to update in next version.

>
> prefetch_pages() call under Spinlock :
> Here again, prefetch_pages() is called while pbminfo->prefetch_mutex
> Spinlock is held. Effectively, heavy functions like PrefetchBuffer()
> would get called while under the Spinlock. These can even ereport().
> One option is to use mutex lock for this purpose. But I think that
> would slow things down. Moreover, the complete set of prefetch pages
> would be scanned by a single worker, and others might wait for this
> one. Instead, what I am thinking is: grab the pbminfo->prefetch_mutex
> Spinlock only while incrementing pbminfo->prefetch_pages. The rest
> part viz : iterating over the prefetch pages, and doing the
> PrefetchBuffer() need not be synchronised using this
> pgbinfo->prefetch_mutex Spinlock. pbms_parallel_iterate() already has
> its own iterator spinlock. Only thing is, workers may not do the
> actual PrefetchBuffer() sequentially. One of them might shoot ahead
> and prefetch 3-4 pages while the other is lagging with the
> sequentially lesser page number; but I believe this is fine, as long
> as they all prefetch all the required blocks.

I agree with your point, will try to fix this as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2016-11-22 03:42:14 Re: WIP: multivariate statistics / proof of concept
Previous Message Kyotaro HORIGUCHI 2016-11-22 03:35:21 Re: Re: Use procsignal_sigusr1_handler and RecoveryConflictInterrupt() from walsender?