Re: Next Steps with Hash Indexes

From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Sadhuprasad Patro <b(dot)sadhu(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Next Steps with Hash Indexes
Date: 2021-10-05 10:38:02
Message-ID: CANbhV-GhTaDedzG-+zBDhMgX1eLSJeyP+XrySoPuHKMp1p9Fow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 27 Sept 2021 at 06:52, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Sep 23, 2021 at 11:11 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Thu, Sep 23, 2021 at 10:04 AM Sadhuprasad Patro <b(dot)sadhu(at)gmail(dot)com> wrote:
> > >
> > > And to get the multi-column hash index selected, we may set
> > > enable_hashjoin =off, to avoid any condition become join condition,
> > > saw similar behaviors in other DBs as well...
> >
> > This may be related to Tom's point that, if some of the quals are
> > removed due to optimization or converted to join quals, then now, even
> > if the user has given qual on all the key columns the index scan will
> > not be selected because we will be forcing that the hash index can
> > only be selected if it has quals on all the key attributes?
> >
> > I don't think suggesting enable_hashjoin =off is a solution,
> >
>
> Yeah, this doesn't sound like a good idea. How about instead try to
> explore the idea where the hash (bucket assignment and search) will be
> based on the first index key and the other columns will be stored as
> payload? I think this might pose some difficulty in the consecutive
> patch to enable a unique index because it will increase the chance of
> traversing more buckets for uniqueness checks. If we see such
> problems, then I have another idea to minimize the number of buckets
> that we need to lock during uniqueness check which is by lock chaining
> as is used during hash bucket clean up where at a time we don't need
> to lock more than two buckets at a time.

I have presented a simple, almost trivial, patch to allow multi-col
hash indexes. It hashes the first column only, which can be a downside
in *some* cases. If that is clearly documented, it would not cause
many issues, IMHO. However, it does not have any optimization issues
or complexities, which is surely a very good thing.

Trying to involve *all* columns in the hash index is a secondary
optimization. It requires subtle changes in optimizer code, as Tom
points out. It also needs fine tuning to make the all-column approach
beneficial for the additional cases without losing against what the
"first column" approach gives.

I did consider both approaches and after this discussion I am still in
favour of committing the very simple "first column" approach to
multi-col hash indexes now.

--
Simon Riggs http://www.EnterpriseDB.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2021-10-05 10:41:58 Re: [Patch] ALTER SYSTEM READ ONLY
Previous Message Bharath Rupireddy 2021-10-05 10:35:04 Re: Fix pg_log_backend_memory_contexts() 's delay