Hash index initial size is too large given NULLs or partial indexes

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Hash index initial size is too large given NULLs or partial indexes
Date: 2019-03-08 18:14:11
Message-ID: CAMkU=1x0k+dRQHDUgp4BjFeSgxyLBBXyKNY5Pt1Yu6YHB0mhKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Referring to this thread:

https://dba.stackexchange.com/questions/231647/why-are-partial-postgresql-hash-indices-not-smaller-than-full-indices

When a hash index is created on a populated table, it estimates the number
of buckets to start out with based on the number of tuples returned
by estimate_rel_size. But this number ignores both the fact that NULLs are
not stored in hash indexes, and that partial indexes exist. This can lead
to much too large hash indexes. Doing a re-index just repeats the logic,
so doesn't fix anything. Fill factor also can't fix it, as you are not
allowed to increase that beyond 100.

This goes back to when the pre-sizing was implemented in 2008
(c9a1cc694abef737548a2a). It seems to be an oversight, rather than
something that was considered.

Is this a bug that should be fixed? Or if getting a more accurate estimate
is not possible or not worthwhile, add a code comment about that?

Cheers,

Jeff

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-03-08 18:27:38 Re: Hash index initial size is too large given NULLs or partial indexes
Previous Message Pavel Stehule 2019-03-08 18:13:23 Re: PostgreSQL vs SQL/XML Standards