Re: Next Steps with Hash Indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Next Steps with Hash Indexes
Date: 2021-10-26 21:02:18
Message-ID: CA+Tgmoanvcdz4RfpqCb-1pNcf8mGd=3Uw6T68YcSYKnEJWCrfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 5, 2021 at 6:50 AM Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> wrote:
> With unique data, starting at 1 and monotonically ascending, hash
> indexes will grow very nicely from 0 to 10E7 rows without causing >1
> overflow block to be allocated for any bucket. This keeps the search
> time for such data to just 2 blocks (bucket plus, if present, 1
> overflow block). The small number of overflow blocks is because of the
> regular and smooth way that splits occur, which works very nicely
> without significant extra latency.

It is my impression that with non-unique data things degrade rather
badly. There's no way to split the buckets that are overflowing
without also splitting the buckets that are completely empty or, in
any event, not full enough to need any overflow pages. I think that's
rather awful.

> The probability of bucket collision while we hold the lock is fairly
> low. This is because even with adjacent data values the hash values
> would be spread across multiple buckets, so we would expect the
> contention to be less than we would get on a monotonically increasing
> btree.
>
> So I don't now see any problem from holding the buffer lwlock on the
> bucket while we do multi-buffer operations.

I don't think that contention is the primary concern here. I think one
major concern is interruptibility: a process must be careful not to
hold lwlocks across long stretches of code, because it cannot be
cancelled while it does. Even if the code is bug-free and the database
has no corruption, long pauses before cancels take effect can be
inconvenient. But as soon as you add in those considerations things
get much worse. Imagine a hash index that is corrupted so that there
is a loop in the list of overflow pages. No matter what, we're going
to go into an infinite loop scanning that bucket, but if we're holding
a buffer lock while we do it, there's no way to escape short of
bouncing the entire server. That's pretty bad.

Undetected deadlock is something to think about, too.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2021-10-26 21:03:24 Re: Predefined role pg_maintenance for VACUUM, ANALYZE, CHECKPOINT.
Previous Message Arjan van de Ven 2021-10-26 20:58:17 [PATCH v2] src/port/snprintf.c: Optimize the common base=10 case in fmtint