Re: [POC] A better way to expand hash indexes.

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] A better way to expand hash indexes.
Date: 2017-03-27 05:51:20
Message-ID: CAA4eK1L=gE+YW1OcZiUbmnboapVvZJu0jJp7Su7oqZE6pjVKvA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 26, 2017 at 11:26 AM, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com> wrote:
> Thanks, Amit, for the review.
> On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> I think the one-dimensional patch has fewer places to touch, so that looks
>> better to me. However, I think there are still hard-coded values and
>> assumptions in the code which we should try to improve.
>
> Great! I will continue with the one-dimensional spares improvement.
>

@@ -563,18 +563,20 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,
{
..
else
- num_buckets = ((uint32) 1) << _hash_log2((uint32) dnumbuckets);
+ num_buckets = _hash_get_totalbuckets(_hash_spareindex(dnumbuckets));
..
..
- metap->hashm_maxbucket = metap->hashm_lowmask = num_buckets - 1;
- metap->hashm_highmask = (num_buckets << 1) - 1;
+ metap->hashm_maxbucket = num_buckets - 1;
+
+ /* set highmask, which should be sufficient to cover num_buckets. */
+ metap->hashm_highmask = (1 << (_hash_log2(num_buckets))) - 1;
+ metap->hashm_lowmask = (metap->hashm_highmask >> 1);
}
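
To make the effect of the new mask lines concrete, here is a minimal standalone sketch of that arithmetic. It assumes _hash_log2(n) returns the smallest i such that (1 << i) >= n; ceil_log2, BucketMasks and masks_for are hypothetical names used only for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for _hash_log2(): smallest i with (1 << i) >= num.
 * This behaviour is an assumption of the sketch.
 */
uint32_t
ceil_log2(uint32_t num)
{
    uint32_t i = 0;
    uint32_t limit = 1;

    /* keep the loop below from overflowing a 32-bit value */
    assert(num > 0 && num <= (1u << 31));

    while (limit < num)
    {
        limit <<= 1;
        i++;
    }
    return i;
}

/* Illustrative container for the three metapage fields set in the hunk. */
typedef struct
{
    uint32_t maxbucket;
    uint32_t highmask;
    uint32_t lowmask;
} BucketMasks;

/* Mirrors the "+" lines of the hunk for an arbitrary bucket count. */
BucketMasks
masks_for(uint32_t num_buckets)
{
    BucketMasks m;

    m.maxbucket = num_buckets - 1;
    /* smallest all-ones mask that covers bucket numbers 0 .. num_buckets - 1 */
    m.highmask = (1u << ceil_log2(num_buckets)) - 1;
    m.lowmask = m.highmask >> 1;
    return m;
}
```

For example, masks_for(6) gives maxbucket = 5, highmask = 7 and lowmask = 3, so maxbucket and lowmask are no longer equal once the bucket count is not a power of two.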

I think we can't change the number of buckets to be created, or the
lowmask and highmask calculation, here without also modifying
_h_spoolinit(), because it sorts the data to be inserted based on the
hash key, and that ordering in turn depends on the number of buckets
we are going to create during the create index operation. We either
need to let the create index operation keep creating buckets in
power-of-two fashion, or we need to update _h_spoolinit() to match the
new computation. One minor drawback of keeping the power-of-two scheme
for bucket creation during create index is that it can waste space and
will be inconsistent with what the patch does during a split operation.
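
A rough sketch of why the two have to agree, under stated assumptions: the two-mask function below follows the usual key-to-bucket rule of the hash AM (mask with highmask, fall back to lowmask when the result exceeds maxbucket), and the single-mask function models a sort that assumes a power-of-two bucket count. Both function names are hypothetical, and the single-mask form is my reading of the current sort logic rather than a quote of it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Two-mask mapping: works for any bucket count, because keys that land
 * beyond maxbucket are folded back into the lower half via lowmask.
 */
uint32_t
bucket_two_masks(uint32_t hashkey, uint32_t maxbucket,
                 uint32_t highmask, uint32_t lowmask)
{
    uint32_t bucket;

    assert(lowmask <= highmask);

    bucket = hashkey & highmask;
    if (bucket > maxbucket)
        bucket &= lowmask;
    return bucket;
}

/*
 * Single-mask mapping: only matches the function above when every
 * value of (hashkey & hash_mask) is an existing bucket, i.e. when the
 * bucket count is a power of two.
 */
uint32_t
bucket_single_mask(uint32_t hashkey, uint32_t hash_mask)
{
    return hashkey & hash_mask;
}
```

With 8 buckets (maxbucket = 7, highmask = 15, lowmask = 7) both functions send hash key 13 to bucket 5. With 6 buckets (maxbucket = 5, highmask = 7, lowmask = 3) hash key 6 belongs in bucket 2, but a sort using the single mask 7 would order it as if it belonged to a nonexistent bucket 6, so the sorted input would no longer arrive in bucket order.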

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
