Re: Multicolumn hash indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomasz Ostrowski <tometzky+pg(at)ato(dot)waw(dot)pl>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Subject: Re: Multicolumn hash indexes
Date: 2017-09-29 14:45:49
Message-ID: CA+TgmoY9O9CesFCivK8Pq77aon5Sy=6LTTb6y_hcUYAQ1NxpoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 27, 2017 at 5:52 PM, Tomasz Ostrowski
<tometzky+pg(at)ato(dot)waw(dot)pl> wrote:
> I feel that this would eliminate a large amount of potential gains from such
> an index. This would be usable only when a sufficiently variable column
> exists, in which case a simple hash index on the column wouldn't be much
> worse.

If the index isn't very selective and any query will match many rows
anyway, I think it will often be superior to use a BRIN index (but I
might be wrong).

Maybe you're worrying about something like a billion-row table where
there are 3 columns that form a composite key: (1,1,1), (1,1,2), ...,
(1,1000),(1,2,1),...,(1,1000,1000),(2,1,1),...,(1000,1000,1000). In
that case, treating the leading column as most important will indeed
be terrible, since we'll put all billion rows into 1000 buckets no
matter how many bucket splits we do.

That seems a little unusual, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2017-09-29 14:50:48 Re: GSoC 2017: weekly progress reports (week 4) and patch for hash index
Previous Message Bossart, Nathan 2017-09-29 14:33:18 Re: [Proposal] Allow users to specify multiple tables in VACUUM commands