Re: Page Scan Mode in Hash Index

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Page Scan Mode in Hash Index
Date: 2017-05-10 08:58:55
Message-ID: CAE9k0P=JzsP07sz+-1Ujnzfxn96JhEW+6dq4-HXsKa906CE9qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While doing the code coverage testing of v7 patch shared with - [1], I
found that there are few lines of code in _hash_next() that are
redundant and needs to be removed. I basically came to know this while
testing the scenario where a hash index scan starts when a split is in
progress. I have removed those lines and attached is the newer version
of patch.

[1] - https://www.postgresql.org/message-id/CAE9k0Pmn92Le0iOTU5b6087eRXzUnkq1PKcihxtokHJ9boXCBg%40mail.gmail.com

On Mon, May 8, 2017 at 6:55 PM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> Hi,
>
> On Tue, Apr 4, 2017 at 3:56 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Sun, Apr 2, 2017 at 4:14 AM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>>>
>>> Please note that these patches needs to be applied on top of [1].
>>>
>>
>> Few more review comments:
>>
>> 1.
>> - page = BufferGetPage(so->hashso_curbuf);
>> + blkno = so->currPos.currPage;
>> + if (so->hashso_bucket_buf == so->currPos.buf)
>> + {
>> + buf = so->currPos.buf;
>> + LockBuffer(buf, BUFFER_LOCK_SHARE);
>> + page = BufferGetPage(buf);
>> + }
>>
>> Here, you are assuming that only bucket page can be pinned, but I
>> think that assumption is not right. When _hash_kill_items() is called
>> before moving to next page, there could be a pin on the overflow page.
>> You need some logic to check if the buffer is pinned, then just lock
>> it. I think once you do that, it might not be convinient to release
>> the pin at the end of this function.
>
> Yes, there are few cases where we might have pin on overflow pages too
> and we need to handle such cases in _hash_kill_items. I have taken
> care of this in the attached v7 patch. Thanks.
>
>>
>> Add some comments on top of _hash_kill_items to explain the new
>> changes or say some thing like "See _bt_killitems for details"
>
> Added some more comments on the new changes.
>
>>
>> 2.
>> + /*
>> + * We save the LSN of the page as we read it, so that we know whether it
>> + * safe to apply LP_DEAD hints to the page later. This allows us to drop
>> + * the pin for MVCC scans, which allows vacuum to avoid blocking.
>> + */
>> + so->currPos.lsn = PageGetLSN(page);
>> +
>>
>> The second part of above comment doesn't make sense because vacuum's
>> will still be blocked because we hold the pin on primary bucket page.
>
> That's right. It doesn't make sense because we won't allow vacuum to
> start. I have removed it.
>
>>
>> 3.
>> + {
>> + /*
>> + * No more matching tuples were found. return FALSE
>> + * indicating the same. Also, remember the prev and
>> + * next block number so that if fetching tuples using
>> + * cursor we remember the page from where to start the
>> + * scan.
>> + */
>> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
>> + so->currPos.nextPage = (opaque)->hasho_nextblkno;
>>
>> You can't read opaque without having lock and by this time it has
>> already been released.
>
> I have corrected it. Please refer to the attached v7 patch.
>
> Also, I think if you want to save position for
>> cursor movement, then you need to save the position of last bucket
>> when scan completes the overflow chain, however as you have written it
>> will be always invalid block number. I think there is similar problem
>> with prevblock number.
>
> Did you mean last bucket or last page. If it is last page, then I am
> already storing it.
>
>>
>> 4.
>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
>> + OffsetNumber maxoff, ScanDirection dir)
>> +{
>> + HashScanOpaque so = (HashScanOpaque) scan->opaque;
>> + IndexTuple itup;
>> + int itemIndex;
>> +
>> + if (ScanDirectionIsForward(dir))
>> + {
>> + /* load items[] in ascending order */
>> + itemIndex = 0;
>> +
>> + /* new page, relocate starting position by binary search */
>> + offnum = _hash_binsearch(page, so->hashso_sk_hash);
>>
>> What is the need to find offset number when this function already has
>> that as an input parameter?
>
> It's not required. I have removed it.
>
>>
>> 5.
>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
>> + OffsetNumber maxoff, ScanDirection dir)
>>
>> I think maxoff is not required to be passed a parameter to this
>> function as you need it only for forward scan. You can compute it
>> locally.
>
> It is required for both forward and backward scan. However, I am not
> passing it to _hash_load_qualified_items() now.
>
>>
>> 6.
>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
>> + OffsetNumber maxoff, ScanDirection dir)
>> {
>> ..
>> + if (ScanDirectionIsForward(dir))
>> + {
>> ..
>> + while (offnum <= maxoff)
>> {
>> ..
>> + if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
>> + _hash_checkqual(scan, itup))
>> + {
>> + /* tuple is qualified, so remember it */
>> + _hash_saveitem(so, itemIndex, offnum, itup);
>> + itemIndex++;
>> + }
>> +
>> + offnum = OffsetNumberNext(offnum);
>> ..
>>
>> Why are you traversing the whole page even when there is no match?
>> There is a similar problem in backward scan. I think this is blunder.
>
> Fixed. Please check the attached
> '0001-Rewrite-hash-index-scans-to-work-a-page-at-a-timev7.patch'
>
>>
>> 7.
>> + if (so->currPos.buf == so->hashso_bucket_buf ||
>> + so->currPos.buf == so->hashso_split_bucket_buf)
>> + {
>> + so->currPos.prevPage = InvalidBlockNumber;
>> + LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
>> + }
>> + else
>> + {
>> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
>> + _hash_relbuf(rel, so->currPos.buf);
>> + }
>> +
>> + so->currPos.nextPage = (opaque)->hasho_nextblkno;
>>
>> What makes you think it is safe read opaque after releasing the lock?
>
> Nothing makes me think to read opaque after releasing lock. It's a
> mistake. I have corrected it. Please check attached v7 patch.
>
> --
> With Regards,
> Ashutosh Sharma
> EnterpriseDB:http://www.enterprisedb.com

Attachment Content-Type Size
0001-Rewrite-hash-index-scan-to-work-page-at-a-timev8.patch text/x-patch 33.0 KB
0002-Remove-redundant-function-_hash_step-and-some-of-the.patch text/x-patch 8.4 KB
0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index.patch text/x-patch 3.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rahila Syed 2017-05-10 10:25:21 Re: Adding support for Default partition in partitioning
Previous Message Petr Jelinek 2017-05-10 08:43:07 Re: snapbuild woes