Re: Hash Indexes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Indexes
Date: 2016-11-03 10:25:54
Message-ID: CAA4eK1+_YxN9ZygOJZ=mx0XdULS0kg_teX4WZd=ewewnmZCWPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 2, 2016 at 6:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Oct 24, 2016 at 10:30 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> [ new patches ]
>
> I looked over parts of this today, mostly the hashinsert.c changes.
>
> + /*
> + * Copy bucket mapping info now; The comment in _hash_expandtable where
> + * we copy this information and calls _hash_splitbucket explains why this
> + * is OK.
> + */
>
> So, I went and tried to find the comments to which this comment is
> referring and didn't have much luck.
>

I guess you have just tried to find it in the patch. However, the
comment I am referring above is an existing comment in
_hash_expandtable(). Refer below comment:
/*
* Copy bucket mapping info now; this saves re-accessing the meta page
* inside _hash_splitbucket's inner loop. ...

> At the point this code is
> running, we have a pin but no lock on the metapage, so this is only
> safe if changing any of these fields requires a cleanup lock on the
> metapage. If that's true,
>

No that's not true, we need just Exclusive content lock to update
those fields and these fields should be copied when we have Share
content lock on metapage. In version-8 of patch, it was correct, but
in last version, it seems during code re-arrangement, I have moved it.
I will change it such that these values are copied under matapage
share content lock. I think moving it just before the preceding for
loop should be okay, let me know if you think otherwise.

> + nblkno = _hash_get_newblk(rel, pageopaque);
>
> I think this is not a great name for this function. It's not clear
> what "new blocks" refers to, exactly. I suggest
> FIND_SPLIT_BUCKET(metap, bucket) or OLD_BUCKET_TO_NEW_BUCKET(metap,
> bucket) returning a new bucket number. I think that macro can be
> defined as something like this: bucket + (1 <<
> (fls(metap->hashm_maxbucket) - 1)).
>

I think such a macro would not work for the usage of incomplete
splits. The reason is that by the time we try to complete the split
of the current old bucket, the table half (lowmask, highmask,
maxbucket) would have changed and it could give you the bucket in new
table half.

> Then do nblkno =
> BUCKET_TO_BLKNO(metap, newbucket) to get the block number. That'd all
> be considerably simpler than what you have for hash_get_newblk().
>

I think to use BUCKET_TO_BLKNO we need either a share or exclusive
lock on metapage and as we need a lock on metapage to find new block
from old block, I thought it is better to do inside
_hash_get_newblk().

>
> Moving on ...
>
> /*
> * ovfl page exists; go get it. if it doesn't have room, we'll
> - * find out next pass through the loop test above.
> + * find out next pass through the loop test above. Retain the
> + * pin, if it is a primary bucket page.
> */
> - _hash_relbuf(rel, buf);
> + if (pageopaque->hasho_flag & LH_BUCKET_PAGE)
> + _hash_chgbufaccess(rel, buf, HASH_READ, HASH_NOLOCK);
> + else
> + _hash_relbuf(rel, buf);
>
> It seems like it would be cheaper, safer, and clearer to test whether
> buf != bucket_buf here, rather than examining the page opaque data.
> That's what you do down at the bottom of the function when you ensure
> that the pin on the primary bucket page gets released, and it seems
> like it should work up here, too.
>
> + bool retain_pin = false;
> +
> + /* page flags must be accessed before releasing lock on a page. */
> + retain_pin = pageopaque->hasho_flag & LH_BUCKET_PAGE;
>
> Similarly.
>

Agreed, will change the usage as per your suggestion.

> I have also attached a patch with some suggested comment changes.
>

I will include it in next version of patch.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2016-11-03 10:39:53 Re: delta relations in AFTER triggers
Previous Message Michael Paquier 2016-11-03 09:54:11 Re: WAL consistency check facility