Re: Cache Hash Index meta page.

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Cache Hash Index meta page.
Date: 2017-01-02 05:59:00
Message-ID: CAA4eK1+TDGpzP_UrgUbWqi+1JnJFxAcWgfuwamWcq4s6J3KoTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 27, 2016 at 1:36 PM, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com> wrote:
> On Thu, Dec 22, 2016 at 12:17 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Wed, Dec 21, 2016 at 9:26 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com> wrote:
>>>> -- I think if it is okay, I can document same for each member of HashMetaPageData whether to read from cached from meta page or directly from current meta page. Below briefly I have commented for each member. If you suggest I can go with that approach, I will produce a neat patch for same.
>>>
>>> Plain text emails are preferred on this list.
>
> Sorry, I have set the mail to plain text mode now.
>
>> I think this will make metpage cache access somewhat
>> similar to what we have in btree where we use cache to access
>> rootpage. Will something like that address your concern?
>
> Thanks, just like _bt_getroot I am introducing a new function
> _hash_getbucketbuf_from_hashkey() which will give the target bucket
> buffer for the given hashkey. This will use the cached metapage
> contents instead of reading meta page buffer if cached data is valid.
> There are 2 places which can use this service 1. _hash_first and 2.
> _hash_doinsert.
>

This version of the patch looks much better than the previous version.
I have few review comments:

1.
+ * pin so we can use the metabuf while writing into to it below.
+ */
+ metabuf =
_hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);

usage "into to .." in above comment seems to be wrong.

2.
- pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+ /*
+ *
Setting hasho_prevblkno of bucket page with latest maxbucket number
+ * to indicate
bucket has been initialized and need to reconstruct
+ * HashMetaCache if it is older.
+
*/
+ pageopaque->hasho_prevblkno = metap->hashm_maxbucket;

In the above comment, a reference to HashMetaCache is confusing, are
your referring to any structure here? If you change this, consider to
change the similar usage at other places in the patch as well.

Also, it is not clear to what do you mean by ".. to indicate bucket
has been initialized .."? assigning maxbucket as a special value to
prevblkno is not related to initializing a bucket page.

3.
typedef struct HashPageOpaqueData
{
+ /*
+ * If this is an ovfl page this stores previous ovfl
(or bucket) blkno.
+ * Else if this is a bucket page we use this for a special purpose. We
+ *
store hashm_maxbucket value, whenever this page is initialized or
+ * split. So this helps us
to know whether the bucket has been split after
+ * caching the some of the meta page data.
See _hash_doinsert(),
+ * _hash_first() to know how to use same.
+ */

In above comment, where you are saying ".. caching the some of the
meta page data .." is slightly confusing, because it appears to me
that you are caching whole of the metapage not some part of it.

4.
+_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel,

Generally, for _hash_* API's, we use rel as the first parameter, so I
think it is better to maintain the same with this API as well.

5.
_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
- maxbucket, highmask, lowmask);
+ usedmetap->hashm_maxbucket,
+ usedmetap->hashm_highmask,
+ usedmetap->hashm_lowmask);

I think we should add an Assert for the validity of usedmetap before using it.

6. Getting few compilation errors in assert-enabled build.

1>src/backend/access/hash/hashinsert.c(85): error C2065: 'bucket' :
undeclared identifier
1>src/backend/access/hash/hashinsert.c(155): error C2065: 'bucket' :
undeclared identifier

1>src/backend/access/hash/hashsearch.c(223): warning C4101: 'blkno' :
unreferenced local variable
1> hashutil.c
1>\src\backend\access\hash\hashsearch.c(284): warning C4700:
uninitialized local variable 'bucket' used

7.
+ /*
+ * Check if this bucket is split after we have cached the hash meta
+ * data. To do this we need to check whether cached maxbucket number
+ * is less than or equal to maxbucket number stored in bucket page,
+ * which was set with that times maxbucket number during bucket page
+ * splits. In case of upgrade hashno_prevblkno of old bucket page will
+ * be set with InvalidBlockNumber. And as of now maximum value the
+ * hashm_maxbucket can take is 1 less than InvalidBlockNumber (see
+ * _hash_expandtable). So an explicit check for InvalidBlockNumber in
+ * hasho_prevblkno will tell whether current bucket has been split
+ * after caching hash meta data.
+ */

I can understand what you want to say in above comment, but I think
you can write it in somewhat shorter form. There is no need to
explain the exact check.

8.
@@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
_hash_relbuf(rel, *bufp);

*bufp = InvalidBuffer;
+
+ /* If it is a bucket page there will not be a prevblkno. */
+ if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
+ return;
+

I don't think above check is right. Even if it is a bucket page, we
might need to scan bucket being populated, refer check else if
(so->hashso_buc_populated && so->hashso_buc_split). Apart from that,
you can't access bucket page after releasing the lock on it. Why have
you added such a check?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-01-02 06:36:20 Re: rewrite HeapSatisfiesHOTAndKey
Previous Message Jim Nasby 2017-01-02 05:44:58 Odd behavior with PG_TRY