Re: Hash index with larger hashes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash index with larger hashes
Date: 2016-08-05 17:36:22
Message-ID: CA+TgmoYhe2dKuDUxn80sfZLtJ1NwfKAwRnP0ZX1hQmShrSoVmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 5, 2016 at 10:39 AM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
> I have been following the recent discussions on increasing the
> size of the hash function used in Postgres and the work to
> provide WAL and performance improvements for hash indexes.
> I know it was mentioned when we moved to the new hashing
> functions, but the existing functions do provide an additional
> 32-bits of hash. We currently do not use them, but they are
> already calculated.
>
> It had occurred to me that one way to decrease the space used
> to store the hash value would be to include information about
> the page number to determine the actual value. For example,
> a hash index of 65k pages (540mb) would get two additional
> bytes of hash with no associated storage cost. Also, if you
> subdivided the hash page into say 128 sub-pages you would
> get the extra 2 bytes of hash in a 4mb index. In this way,
> the bigger the hash index is, the more bits you get for free.
>
> Just wanted to throw it out there.

I'm not sure I understand exactly what you are proposing here.
Suppose we have 64 bits of hashcode and (1 << N) buckets. Currently,
we store hashcode bits 0..31 on each item. Maybe what you're saying
is that we could instead store bits B..(31+B) on each item - that is,
cram in as many extra bits on each individual item as log2(nbuckets).
The problem with that is that it would make it very hard to split
buckets.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2016-08-05 17:38:05 Re: multivariate statistics (v19)
Previous Message Robert Haas 2016-08-05 17:32:18 Re: truncate trigger for foreign data wrappers