Re: Hash Indexes

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Indexes
Date: 2016-09-07 18:19:56
Message-ID: CAMkU=1wYxVhEfk9Sg632Yk_oj4zyydeCVqB=d3YbQ-nH5xXzNg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 1, 2016 at 8:55 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

>
> I have fixed all other issues you have raised. Updated patch is
> attached with this mail.
>

I am finding the comments (particularly README) quite hard to follow.
There are many references to an "overflow bucket", or similar phrases. I
think these should be "overflow pages". A bucket is a conceptual thing
consisting of a primary page for that bucket and zero or more overflow
pages for the same bucket. There are no overflow buckets, unless you are
referring to the new bucket to which things are being moved.
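
To make the terminology concrete, here is a minimal sketch of the distinction drawn above. The class and field names are hypothetical and chosen only for illustration; they are not taken from the patch or the PostgreSQL source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    # Index tuples stored on this page (opaque bytes in this toy model).
    tuples: List[bytes] = field(default_factory=list)


@dataclass
class Bucket:
    # A bucket is conceptual: exactly one primary page ...
    primary: Page = field(default_factory=Page)
    # ... plus zero or more overflow pages belonging to the same bucket.
    overflow: List[Page] = field(default_factory=list)

    def pages(self) -> List[Page]:
        # All pages that make up this bucket, primary page first.
        return [self.primary] + self.overflow
```

In this model there is no such thing as an "overflow bucket"; there are only overflow pages hanging off a bucket.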

Was maintaining on-disk compatibility a major concern for this patch?
Would you do things differently if that were not a concern? If we would
benefit from a break in format, I think it would be better to do that now
while hash indexes are still discouraged, rather than in a future release.

In particular, I am thinking about the need for every insert to
exclusive-content-lock the meta page to increment the index-wide tuple
count. I think that this is going to be a huge bottleneck on
update-intensive workloads (which I don't believe have been performance
tested yet). I was wondering if we might not want to change that so that each
bucket keeps a local count, and sweeps that up to the meta page only when
it exceeds a threshold. But this would require the bucket page to have an
area to hold such a count. Another idea would be to keep not a count of
tuples, but of buckets with at least one overflow page, and split when
there are too many of those. I bring it up now because it would be a shame
to ignore it until 10.0 is out the door, and then need to break things in
11.0.
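
The per-bucket counting idea can be sketched as a toy model. This is only an illustration under stated assumptions, not PostgreSQL code: `MetaPage`, `BucketCounter`, `run_inserts`, and the threshold value are all hypothetical, and a `threading.Lock` merely stands in for the exclusive content lock on the meta page.

```python
import threading

FLUSH_THRESHOLD = 64  # assumed value, chosen only for illustration


class MetaPage:
    def __init__(self):
        self.lock = threading.Lock()  # stands in for the exclusive content lock
        self.ntuples = 0              # index-wide tuple count
        self.lock_acquisitions = 0    # instrumentation for this sketch only

    def add(self, delta):
        # Every call here represents one exclusive lock on the meta page.
        with self.lock:
            self.lock_acquisitions += 1
            self.ntuples += delta


class BucketCounter:
    def __init__(self, meta):
        self.meta = meta
        self.local_count = 0  # would need an area on the bucket page

    def insert(self):
        # The caller is assumed to already hold the bucket page lock.
        self.local_count += 1
        if self.local_count >= FLUSH_THRESHOLD:
            # Sweep the local count up to the meta page, then reset it.
            self.meta.add(self.local_count)
            self.local_count = 0


def run_inserts(n_buckets, n_inserts):
    # Spread inserts round-robin over the buckets.
    meta = MetaPage()
    buckets = [BucketCounter(meta) for _ in range(n_buckets)]
    for i in range(n_inserts):
        buckets[i % n_buckets].insert()
    return meta, buckets
```

In this model, 1000 inserts spread over 4 buckets take the meta page lock 12 times rather than 1000. The cost is that the index-wide count lags the true count by up to `FLUSH_THRESHOLD - 1` tuples per bucket, so any split decision based on it would have to tolerate that slack.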

Cheers,

Jeff
