Re: Hash Indexes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Indexes
Date: 2016-11-04 19:41:13
Message-ID: CAA4eK1+k-ZA_7zrSvxLDk+pUVKnm7CVhU+4OWao62BA16-w3RA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 4, 2016 at 9:37 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Nov 1, 2016 at 9:09 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Oct 24, 2016 at 10:30 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> [ new patches ]
>>
>> I looked over parts of this today, mostly the hashinsert.c changes.
>
> Some more review...
>
> @@ -656,6 +678,10 @@ _hash_squeezebucket(Relation rel,
> IndexTuple itup;
> Size itemsz;
>
> + /* skip dead tuples */
> + if (ItemIdIsDead(PageGetItemId(rpage, roffnum)))
> + continue;
>
> Is this an optimization independent of the rest of the patch, or is
> there something in this patch that necessitates it?
>

This specific case is independent of the rest of the patch, but the same
optimization is used in _hash_splitbucket_guts(), where it is mandatory:
otherwise we would make a copy of the tuple without copying its dead
flag.
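
The hazard can be illustrated with a minimal, self-contained sketch (plain C; MockItemId and copy_live_items are hypothetical stand-ins for PostgreSQL's ItemId and the bucket-copy loop, not the actual data structures):

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock line pointer: in PostgreSQL, ItemIdIsDead() tests lp_flags. */
typedef struct MockItemId
{
    bool        lp_dead;        /* stand-in for LP_DEAD */
    int         value;          /* stand-in for the index tuple */
} MockItemId;

/*
 * Copy live items from src to dst, skipping dead ones -- the same
 * pattern as the "skip dead tuples" check quoted above.  Returns the
 * number of items copied.
 */
static size_t
copy_live_items(const MockItemId *src, size_t n, MockItemId *dst)
{
    size_t      ncopied = 0;

    for (size_t i = 0; i < n; i++)
    {
        /* skip dead tuples, as ItemIdIsDead() would */
        if (src[i].lp_dead)
            continue;
        dst[ncopied++] = src[i];
    }
    return ncopied;
}
```

Copying without the skip would transfer the tuple body but not its dead marking, effectively resurrecting a dead tuple in the target bucket; skipping avoids both the wasted space and that correctness hazard.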

> i.e. Could this
> small change be committed independently?

Both _hash_squeezebucket() and _hash_splitbucket() can use this
optimization irrespective of the rest of the patch. I will prepare a
separate patch for these and send it along with the main patch after
some testing.

> If not, then I think it
> needs a better comment explaining why it is now mandatory.
>
> - * Caller must hold exclusive lock on the target bucket. This allows
> + * Caller must hold cleanup lock on the target bucket. This allows
> * us to safely lock multiple pages in the bucket.
>
> The notion of a lock on a bucket no longer really exists; with this
> patch, we'll now properly speak of a lock on a primary bucket page.
> Also, I think the bit about safely locking multiple pages is bizarre
> -- that's not the issue at all: the problem is that reorganizing a
> bucket might confuse concurrent scans into returning wrong answers.
>
> I've included a broader updating of that comment, and some other
> comment changes, in the attached incremental patch, which also
> refactors your changes to _hash_freeovflpage() a bit to avoid some
> code duplication. Please consider this for inclusion in your next
> version.
>

Your modifications look good to me, so I will include them in the next version.

> In hashutil.c, I think that _hash_msb() is just a reimplementation of
> fls(), which you can rely on being present because we have our own
> implementation in src/port. It's quite similar to yours but slightly
> shorter. :-) Also, some systems have a builtin fls() function which
> actually optimizes down to a single machine instruction, and which is
> therefore much faster than either version.
>

Agreed; I will change it as per your suggestion.
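
For reference, a portable fls() along the lines of what src/port carries can be sketched as follows (my_fls is a hypothetical local name chosen here to avoid clashing with any system-provided fls()):

```c
/*
 * Portable fls(): find the last (most significant) set bit, 1-based.
 * my_fls(0) == 0, my_fls(1) == 1, my_fls(8) == 4 -- the usual fls()
 * contract.  Compilers with a builtin (or platforms with a native
 * fls()) can reduce this to a single machine instruction.
 */
static int
my_fls(unsigned int mask)
{
    int         bit;

    if (mask == 0)
        return 0;
    for (bit = 1; mask != 1; bit++)
        mask >>= 1;
    return bit;
}
```

Since _hash_msb() and fls() differ only in indexing convention, replacing the former with the latter removes duplicated code and picks up the fast path for free where the platform has one.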

> I don't like the fact that _hash_get_newblk() and _hash_get_oldblk()
> are working out the bucket number by using the HashOpaque structure
> within the bucket page they're examining. First, it seems weird to
> pass the whole structure when you only need the bucket number out of
> it. More importantly, the caller really ought to know what bucket
> they care about without having to discover it.
>
> For example, in _hash_doinsert(), we figure out the bucket into which
> we need to insert, and we store that in a variable called "bucket".
> Then from there we work out the primary bucket page's block number,
> which we store in "blkno". We read the page into "buf" and put a
> pointer to that buffer's contents into "page" from which we discover
> the HashOpaque, a pointer to which we store into "pageopaque". Then
> we pass that to _hash_get_newblk() which will go look into that
> structure to find the bucket number ... but couldn't we have just
> passed "bucket" instead?
>

Yes, that can be done. However, note that pageopaque is not retrieved
only for passing to _hash_get_newblk(); it is also used in the code
below that call, so we can't remove it entirely.

> Similarly, _hash_expandtable() has the value
> available in the variable "old_bucket".
>
> The only caller of _hash_get_oldblk() is _hash_first(), which has the
> bucket number available in a variable called "bucket".
>
> So it seems to me that these functions could be simplified to take the
> bucket number as an argument directly instead of the HashOpaque.
>

Okay, I agree that it is better to pass the bucket number to both
APIs, so I will change them accordingly.

> Generally, this pattern recurs throughout the patch. Every time you
> use the data in the page to figure something out which the caller
> already knew, you're introducing a risk of bugs: what if the answers
> don't match? I think you should try to root out as much of that from
> this code as you can.
>

Okay, I will review the patch once more with this angle in mind and see if I can improve it.

> As you may be able to tell, I'm working my way into this patch
> gradually, starting with peripheral parts and working toward the core
> of it. Generally, I think it's in pretty good shape, but I still have
> quite a bit left to study.
>

Thanks.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
