Re: Page Scan Mode in Hash Index

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Page Scan Mode in Hash Index
Date: 2017-08-11 13:21:56
Message-ID: CAE9k0P=+z0+L6Ug7G1SG8b5Cb9ZOWiCV-Tua3yv3Tcz=RoJqOQ@mail.gmail.com
Lists: pgsql-hackers

>
> Comments on the latest patch:

Thank you once again for reviewing my patches.

>
> 1.
> /*
> + * We remember prev and next block number along with current block
> + * number so that if fetching the tuples using cursor we know the
> + * page from where to begin. This is for the case where we have
> + * reached the end of bucket chain without finding any matching
> + * tuples.
> + */
> + if (!BlockNumberIsValid(opaque->hasho_nextblkno))
> + prev_blkno = opaque->hasho_prevblkno;
>
> This doesn't seem to be the right place for this comment as you are
> not saving the next or current block number here. I am not sure whether
> you really need this comment at all. Would it be better to move
> this to the else part of the loop and reword it as:
>
> "Remember next and previous block numbers for scrollable cursors to
> know the start position."

Shifted the comment to the else part of the loop.

>
> 2. The code in _hash_readpage looks quite bloated. I think we can
> make it better if we do something like below.
> a.
> +_hash_readpage(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
> {
> ..
> + if (so->currPos.buf == so->hashso_bucket_buf ||
> + so->currPos.buf == so->hashso_split_bucket_buf)
> + {
> + so->currPos.prevPage = InvalidBlockNumber;
> + so->currPos.nextPage = opaque->hasho_nextblkno;
> + LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
> + }
> + else
> + {
> + so->currPos.prevPage = opaque->hasho_prevblkno;
> + so->currPos.nextPage = opaque->hasho_nextblkno;
> + _hash_relbuf(rel, so->currPos.buf);
> + so->currPos.buf = InvalidBuffer;
> + }
> ..
> }
>
> This code chunk is the same for both forward and backward scans. I think
> you can have a single copy of this code by moving it out of the if-else
> block.

Done.

>
> b.
> +_hash_readpage(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
> {
> ..
> + /* new page, locate starting position by binary search */
> + offnum = _hash_binsearch(page, so->hashso_sk_hash);
> +
> + itemIndex = _hash_load_qualified_items(scan, page, offnum, dir);
> +
> + while (itemIndex == 0)
> + {
> + /*
> + * Could not find any matching tuples in the current page, move to
> + * the next page. Before leaving the current page, also deal with
> + * any killed items.
> + */
> + if (so->numKilled > 0)
> + _hash_kill_items(scan);
> +
> + /*
> + * We remember prev and next block number along with current block
> + * number so that if fetching the tuples using cursor we know the
> + * page from where to begin. This is for the case where we have
> + * reached the end of bucket chain without finding any matching
> + * tuples.
> + */
> + if (!BlockNumberIsValid(opaque->hasho_nextblkno))
> + prev_blkno = opaque->hasho_prevblkno;
> +
> + _hash_readnext(scan, &buf, &page, &opaque);
> + if (BufferIsValid(buf))
> + {
> + so->currPos.buf = buf;
> + so->currPos.currPage = BufferGetBlockNumber(buf);
> + so->currPos.lsn = PageGetLSN(page);
> + offnum = _hash_binsearch(page, so->hashso_sk_hash);
> + itemIndex = _hash_load_qualified_items(scan, page,
> + offnum, dir);
> ..
> }
>
> Have just one copy of the code that searches the offset and loads
> qualified items. Change the above while loop to a do..while loop and
> add a check in between to break out of the loop when the item index is
> not zero.

Done that way.

>
> 3.
> - * Find the first item in the index that
> - * satisfies the qualification associated with the scan descriptor. On
> - * success, the page containing the current index tuple is read locked
> - * and pinned, and the scan's opaque data entry is updated to
> - * include the buffer.
> + * We find the first item (or, if backward scan, the last item) in
> + * the index that satisfies the qualification associated with the
> + * scan descriptor. On success, the page containing the current
> + * index tuple is read locked and pinned, and data about the
> + * matching tuple(s) on the page has been loaded into so->currPos,
> + * scan->xs_ctup.t_self is set to the heap TID of the current tuple.
> + *
> + * If there are no matching items in the index, we return FALSE,
> + * with no pins or locks held.
> */
> bool
> _hash_first(IndexScanDesc scan, ScanDirection dir)
> @@ -229,15 +297,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
>
> I don't think the lock or pin is held anymore on success; after this
> patch only a pin is retained, and that too only on the bucket page.
> Also, there is no need to modify the part of the comment that is not
> related to the change in this patch.

Corrected.

>
> I don't see scan->xs_ctup.t_self getting set anywhere in this
> function. I think you are setting it in hashgettuple. It is better
> to move that assignment from hashgettuple to _hash_first so as to be
> consistent with _hash_next.

Done that way.

>
> 4.
> + * On successful exit, scan->xs_ctup.t_self is set to the TID
> + * of the next heap tuple, and if requested, scan->xs_itup
> + * points to a copy of the index tuple. so->currPos is updated
> + * as needed.
> + *
> + * On failure exit (no more tuples), we return FALSE with no
> + * pins or locks held.
> */
> bool
> _hash_next(IndexScanDesc scan, ScanDirection dir)
>
> I don't see the usage of scan->xs_itup in this function. It seems to
> me that you have copied it from btree code and forgot to remove part
> of the comment which is not relevant to a hash index.

Corrected.

>
> 5.
> @@ -514,19 +422,14 @@ hashendscan(IndexScanDesc scan)
> {
> ..
> + if (HashScanPosIsValid(so->currPos))
> {
> - LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
> - _hash_kill_items(scan);
> - LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
> + /* Before leaving current page, deal with any killed items */
> + if (so->numKilled > 0)
> + _hash_kill_items(scan);
> + _hash_dropscanbuf(rel, so);
> }
>
> - _hash_dropscanbuf(rel, so);
> -
> ..
> }
>
> I don't think it is a good idea to move _hash_dropscanbuf, as that
> check just ensures that the current buffer is valid. It doesn't say
> anything about the other buffers saved in HashScanOpaque. A similar
> change is required in hashrescan.

Done.

>
> 6.
> _hash_kill_items(IndexScanDesc scan)
> {
> ..
> + if (so->hashso_bucket_buf == so->currPos.buf ||
> + HashScanPosIsPinned(so->currPos))
> ..
> }
>
> Isn't the second check enough? It should indirectly cover the first test.

Yes, one check should be fine. Corrected it.

>
> 7.
> _hash_kill_items(IndexScanDesc scan)
> {
> ..
> + /*
> + * If page LSN differs it means that the page was modified since the
> + * last read. killedItems could be not valid, so applying LP_DEAD
> + * hints is not safe.
> + */
> + page = BufferGetPage(buf);
> + if (PageGetLSN(page) != so->currPos.lsn)
> + {
> + _hash_relbuf(rel, buf);
> + return;
> + }
> ..
> }
>
> How does this check cover the case of unlogged tables?

Thanks for pointing that out. It doesn't cover the case of unlogged
tables. As you suggested in one of your emails on this mailing list, I am
now not allowing vacuum to release the lock on the current page before
acquiring the lock on the next page for unlogged tables. This ensures that
a scan always stays behind vacuum if both are running on the same bucket
simultaneously. Therefore, there is no danger in marking tuples as dead
for unlogged pages even though they do not have a valid LSN.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-Rewrite-hash-index-scan-to-work-page-at-a-time_v11.patch text/x-patch 31.6 KB
0002-Remove-redundant-hash-function-_hash_step-and-do-som.patch text/x-patch 8.5 KB
0003-Improve-locking-startegy-during-VACUUM-in-Hash-Index_v5.patch text/x-patch 4.6 KB
