Re: Page Scan Mode in Hash Index

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Page Scan Mode in Hash Index
Date: 2017-07-30 08:37:00
Message-ID: CAE9k0PmY36gvh-H2qpXb91+XGibw+azyzOyKHgjZYpb-Uy2prg@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Wed, May 10, 2017 at 2:28 PM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
> While doing code coverage testing of the v7 patch shared in [1], I
> found that there are a few lines of code in _hash_next() that are
> redundant and need to be removed. I noticed this while testing the
> scenario where a hash index scan starts while a split is in progress.
> I have removed those lines, and the newer version of the patch is
> attached.
>

Please find the new version of the patches for page-at-a-time scan in
hash indexes, rebased on top of the latest commit in the master branch.
Also, I have run the pgindent script with pg_bsd_indent version 2.0 on
all the modified files. Thanks.

> [1] - https://www.postgresql.org/message-id/CAE9k0Pmn92Le0iOTU5b6087eRXzUnkq1PKcihxtokHJ9boXCBg%40mail.gmail.com
>
> On Mon, May 8, 2017 at 6:55 PM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>> Hi,
>>
>> On Tue, Apr 4, 2017 at 3:56 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> On Sun, Apr 2, 2017 at 4:14 AM, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote:
>>>>
>>>> Please note that these patches need to be applied on top of [1].
>>>>
>>>
>>> A few more review comments:
>>>
>>> 1.
>>> - page = BufferGetPage(so->hashso_curbuf);
>>> + blkno = so->currPos.currPage;
>>> + if (so->hashso_bucket_buf == so->currPos.buf)
>>> + {
>>> +     buf = so->currPos.buf;
>>> +     LockBuffer(buf, BUFFER_LOCK_SHARE);
>>> +     page = BufferGetPage(buf);
>>> + }
>>>
>>> Here, you are assuming that only the bucket page can be pinned, but I
>>> think that assumption is not right. When _hash_kill_items() is called
>>> before moving to the next page, there could be a pin on an overflow
>>> page. You need some logic that checks whether the buffer is already
>>> pinned and, if so, just locks it. I think once you do that, it might
>>> not be convenient to release the pin at the end of this function.
>>
>> Yes, there are a few cases where we might have a pin on an overflow
>> page too, and we need to handle those cases in _hash_kill_items. I
>> have taken care of this in the attached v7 patch. Thanks.
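
To make this concrete, the pin handling in _hash_kill_items ends up
looking roughly like this (a simplified sketch, not the exact patch
hunk; HashScanPosIsPinned and havePin are illustrative names):

    /* Lock the page if we still hold a pin on it, else re-read it. */
    if (HashScanPosIsPinned(so->currPos))    /* illustrative macro name */
    {
        havePin = true;
        buf = so->currPos.buf;
        LockBuffer(buf, BUFFER_LOCK_SHARE);
    }
    else
        buf = _hash_getbuf(rel, blkno, HASH_READ, LH_OVERFLOW_PAGE);

    /* ... mark the killed items LP_DEAD ... */

    /* Release only the lock when we did not acquire the pin here. */
    if (havePin)
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    else
        _hash_relbuf(rel, buf);
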
>>
>>>
>>> Add some comments on top of _hash_kill_items to explain the new
>>> changes, or say something like "See _bt_killitems for details".
>>
>> Added some more comments on the new changes.
>>
>>>
>>> 2.
>>> + /*
>>> + * We save the LSN of the page as we read it, so that we know whether it
>>> + * is safe to apply LP_DEAD hints to the page later. This allows us to
>>> + * drop the pin for MVCC scans, which allows vacuum to avoid blocking.
>>> + */
>>> + so->currPos.lsn = PageGetLSN(page);
>>> +
>>>
>>> The second part of the above comment doesn't make sense, because
>>> vacuum will still be blocked while we hold the pin on the primary
>>> bucket page.
>>
>> That's right. It doesn't make sense because we won't allow vacuum to
>> start. I have removed it.
>>
>>>
>>> 3.
>>> + {
>>> +     /*
>>> +      * No more matching tuples were found. Return FALSE to indicate
>>> +      * that. Also, remember the previous and next block numbers so
>>> +      * that, when fetching tuples with a cursor, we know the page
>>> +      * from which to resume the scan.
>>> +      */
>>> +     so->currPos.prevPage = (opaque)->hasho_prevblkno;
>>> +     so->currPos.nextPage = (opaque)->hasho_nextblkno;
>>>
>>> You can't read opaque without holding the lock, and by this time it
>>> has already been released.
>>
>> I have corrected it. Please refer to the attached v7 patch.
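
For reference, the fix simply captures the neighbouring block numbers
while the lock is still held, roughly:

    /* Read the special space before giving up the lock. */
    so->currPos.prevPage = opaque->hasho_prevblkno;
    so->currPos.nextPage = opaque->hasho_nextblkno;
    LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
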
>>
>>> Also, I think if you want to save the position for cursor movement,
>>> then you need to save the position of the last bucket when the scan
>>> completes the overflow chain; however, as you have written it, it
>>> will always be an invalid block number. I think there is a similar
>>> problem with the prev block number.
>>
>> Did you mean the last bucket or the last page? If it is the last
>> page, then I am already storing it.
>>
>>>
>>> 4.
>>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
>>> +                           OffsetNumber maxoff, ScanDirection dir)
>>> +{
>>> +     HashScanOpaque so = (HashScanOpaque) scan->opaque;
>>> +     IndexTuple itup;
>>> +     int itemIndex;
>>> +
>>> +     if (ScanDirectionIsForward(dir))
>>> +     {
>>> +         /* load items[] in ascending order */
>>> +         itemIndex = 0;
>>> +
>>> +         /* new page, relocate starting position by binary search */
>>> +         offnum = _hash_binsearch(page, so->hashso_sk_hash);
>>>
>>> What is the need to find the offset number when this function already
>>> receives it as an input parameter?
>>
>> It's not required. I have removed it.
>>
>>>
>>> 5.
>>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
>>> +                           OffsetNumber maxoff, ScanDirection dir)
>>>
>>> I think maxoff is not required to be passed as a parameter to this
>>> function, as you need it only for the forward scan. You can compute
>>> it locally.
>>
>> It is required for both the forward and the backward scan. However, I
>> am not passing it to _hash_load_qualified_items() now.
>>
>>>
>>> 6.
>>> +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum,
>>> +                           OffsetNumber maxoff, ScanDirection dir)
>>> {
>>> ..
>>> + if (ScanDirectionIsForward(dir))
>>> + {
>>> ..
>>> +     while (offnum <= maxoff)
>>> +     {
>>> ..
>>> +         if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
>>> +             _hash_checkqual(scan, itup))
>>> +         {
>>> +             /* tuple is qualified, so remember it */
>>> +             _hash_saveitem(so, itemIndex, offnum, itup);
>>> +             itemIndex++;
>>> +         }
>>> +
>>> +         offnum = OffsetNumberNext(offnum);
>>> ..
>>>
>>> Why are you traversing the whole page even when there is no match?
>>> There is a similar problem in the backward scan. I think this is a
>>> blunder.
>>
>> Fixed. Please check the attached
>> '0001-Rewrite-hash-index-scans-to-work-a-page-at-a-timev7.patch'
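
Since _hash_binsearch() positions the scan at the first tuple with the
search hash key, and equal hash keys are stored adjacently on the page,
the loop can stop at the first tuple whose hash key differs. A sketch
of the forward case (simplified; not the exact patch hunk):

    while (offnum <= maxoff)
    {
        itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));

        /* All tuples with our hash key are adjacent; stop past the last. */
        if (so->hashso_sk_hash != _hash_get_indextuple_hashkey(itup))
            break;

        if (_hash_checkqual(scan, itup))
        {
            /* tuple is qualified, so remember it */
            _hash_saveitem(so, itemIndex, offnum, itup);
            itemIndex++;
        }

        offnum = OffsetNumberNext(offnum);
    }
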
>>
>>>
>>> 7.
>>> + if (so->currPos.buf == so->hashso_bucket_buf ||
>>> +     so->currPos.buf == so->hashso_split_bucket_buf)
>>> + {
>>> +     so->currPos.prevPage = InvalidBlockNumber;
>>> +     LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
>>> + }
>>> + else
>>> + {
>>> +     so->currPos.prevPage = (opaque)->hasho_prevblkno;
>>> +     _hash_relbuf(rel, so->currPos.buf);
>>> + }
>>> +
>>> + so->currPos.nextPage = (opaque)->hasho_nextblkno;
>>>
>>> What makes you think it is safe to read opaque after releasing the lock?
>>
>> Nothing; reading opaque after releasing the lock was a mistake. I
>> have corrected it. Please check the attached v7 patch.
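
For completeness, the corrected block hoists the reads of the special
space above the point where the lock (and possibly the pin) is given
up; a sketch along these lines:

    /* Capture both neighbours while the page is still locked. */
    so->currPos.nextPage = opaque->hasho_nextblkno;

    if (so->currPos.buf == so->hashso_bucket_buf ||
        so->currPos.buf == so->hashso_split_bucket_buf)
    {
        /* Primary bucket page: keep the pin, release only the lock. */
        so->currPos.prevPage = InvalidBlockNumber;
        LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
    }
    else
    {
        /* Overflow page: prevblkno is read while the lock is still held. */
        so->currPos.prevPage = opaque->hasho_prevblkno;
        _hash_relbuf(rel, so->currPos.buf);
    }
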
>>
>> --
>> With Regards,
>> Ashutosh Sharma
>> EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-Rewrite-hash-index-scan-to-work-page-at-a-timev9.patch text/x-patch 33.5 KB
0002-Remove-redundant-function-_hash_step-and-some-of-the.patch text/x-patch 8.4 KB
0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index.patch text/x-patch 3.9 KB
