Re: Page Scan Mode in Hash Index

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Page Scan Mode in Hash Index
Date: 2017-05-08 13:25:40
Message-ID: CAE9k0Pmn92Le0iOTU5b6087eRXzUnkq1PKcihxtokHJ9boXCBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Apr 4, 2017 at 3:56 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Sun, Apr 2, 2017 at 4:14 AM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>>
>> Please note that these patches needs to be applied on top of [1].
>>
>
> Few more review comments:
>
> 1.
> - page = BufferGetPage(so->hashso_curbuf);
> + blkno = so->currPos.currPage;
> + if (so->hashso_bucket_buf == so->currPos.buf)
> + {
> + buf = so->currPos.buf;
> + LockBuffer(buf, BUFFER_LOCK_SHARE);
> + page = BufferGetPage(buf);
> + }
>
> Here, you are assuming that only bucket page can be pinned, but I
> think that assumption is not right. When _hash_kill_items() is called
> before moving to next page, there could be a pin on the overflow page.
> You need some logic to check if the buffer is pinned, then just lock
> it. I think once you do that, it might not be convinient to release
> the pin at the end of this function.

Yes, there are few cases where we might have pin on overflow pages too
and we need to handle such cases in _hash_kill_items. I have taken
care of this in the attached v7 patch. Thanks.

>
> Add some comments on top of _hash_kill_items to explain the new
> changes or say some thing like "See _bt_killitems for details"

Added some more comments on the new changes.

>
> 2.
> + /*
> + * We save the LSN of the page as we read it, so that we know whether it
> + * safe to apply LP_DEAD hints to the page later. This allows us to drop
> + * the pin for MVCC scans, which allows vacuum to avoid blocking.
> + */
> + so->currPos.lsn = PageGetLSN(page);
> +
>
> The second part of above comment doesn't make sense because vacuum's
> will still be blocked because we hold the pin on primary bucket page.

That's right. It doesn't make sense because we won't allow vacuum to
start. I have removed it.

>
> 3.
> + {
> + /*
> + * No more matching tuples were found. return FALSE
> + * indicating the same. Also, remember the prev and
> + * next block number so that if fetching tuples using
> + * cursor we remember the page from where to start the
> + * scan.
> + */
> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
> + so->currPos.nextPage = (opaque)->hasho_nextblkno;
>
> You can't read opaque without having lock and by this time it has
> already been released.

I have corrected it. Please refer to the attached v7 patch.

Also, I think if you want to save position for
> cursor movement, then you need to save the position of last bucket
> when scan completes the overflow chain, however as you have written it
> will be always invalid block number. I think there is similar problem
> with prevblock number.

Did you mean last bucket or last page. If it is last page, then I am
already storing it.

>
> 4.
> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
> + OffsetNumber maxoff, ScanDirection dir)
> +{
> + HashScanOpaque so = (HashScanOpaque) scan->opaque;
> + IndexTuple itup;
> + int itemIndex;
> +
> + if (ScanDirectionIsForward(dir))
> + {
> + /* load items[] in ascending order */
> + itemIndex = 0;
> +
> + /* new page, relocate starting position by binary search */
> + offnum = _hash_binsearch(page, so->hashso_sk_hash);
>
> What is the need to find offset number when this function already has
> that as an input parameter?

It's not required. I have removed it.

>
> 5.
> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
> + OffsetNumber maxoff, ScanDirection dir)
>
> I think maxoff is not required to be passed a parameter to this
> function as you need it only for forward scan. You can compute it
> locally.

It is required for both forward and backward scan. However, I am not
passing it to _hash_load_qualified_items() now.

>
> 6.
> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
> + OffsetNumber maxoff, ScanDirection dir)
> {
> ..
> + if (ScanDirectionIsForward(dir))
> + {
> ..
> + while (offnum <= maxoff)
> {
> ..
> + if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
> + _hash_checkqual(scan, itup))
> + {
> + /* tuple is qualified, so remember it */
> + _hash_saveitem(so, itemIndex, offnum, itup);
> + itemIndex++;
> + }
> +
> + offnum = OffsetNumberNext(offnum);
> ..
>
> Why are you traversing the whole page even when there is no match?
> There is a similar problem in backward scan. I think this is blunder.

Fixed. Please check the attached
'0001-Rewrite-hash-index-scans-to-work-a-page-at-a-timev7.patch'

>
> 7.
> + if (so->currPos.buf == so->hashso_bucket_buf ||
> + so->currPos.buf == so->hashso_split_bucket_buf)
> + {
> + so->currPos.prevPage = InvalidBlockNumber;
> + LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
> + }
> + else
> + {
> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
> + _hash_relbuf(rel, so->currPos.buf);
> + }
> +
> + so->currPos.nextPage = (opaque)->hasho_nextblkno;
>
> What makes you think it is safe read opaque after releasing the lock?

Nothing makes me think to read opaque after releasing lock. It's a
mistake. I have corrected it. Please check attached v7 patch.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

Attachment Content-Type Size
0001-Rewrite-hash-index-scans-to-work-a-page-at-a-timev7.patch text/x-patch 34.4 KB
0002-Remove-redundant-function-_hash_step-and-some-of-the.patch text/x-patch 8.4 KB
0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index.patch text/x-patch 3.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-05-08 13:26:32 Re: Atomics for heap_parallelscan_nextpage()
Previous Message Daniele Varrazzo 2017-05-08 13:12:55 Re: Detecting schema changes during logical replication