Re: Hash index initial size is too large given NULLs or partial indexes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash index initial size is too large given NULLs or partial indexes
Date: 2019-03-08 18:27:38
Message-ID: b0c1be5e-94ce-0496-fe96-b61be7a3067c@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/8/19 7:14 PM, Jeff Janes wrote:
> Referring to this thread:
>
> https://dba.stackexchange.com/questions/231647/why-are-partial-postgresql-hash-indices-not-smaller-than-full-indices
>
> When a hash index is created on a populated table, it estimates the
> number of buckets to start out with based on the number of tuples
> returned by estimate_rel_size.  But this number ignores both the fact
> that NULLs are not stored in hash indexes, and that partial indexes
> exist.  This can lead to much too large hash indexes.  Doing a re-index
> just repeats the logic, so doesn't fix anything.  Fill factor also can't
> fix it, as you are not allowed to increase that beyond 100.
>

Hmmm :-(

> This goes back to when the pre-sizing was implemented in 2008
> (c9a1cc694abef737548a2a).  It seems to be an oversight, rather than
> something that was considered.
>
> Is this a bug that should be fixed?  Or if getting a more accurate
> estimate is not possible or not worthwhile, add a code comment about that?
>

I'd agree this smells like a bug (or perhaps two). The sizing probably
should consider both null_frac and selectivity of the index predicate.
When those two are redundant (i.e. when there's IS NOT NULL condition on
indexed column), this will result in under-estimate. That means the
index build will do a an extra split, but that's probably better than
having permanently bloated index.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-03-08 18:34:17 Re: PostgreSQL vs SQL/XML Standards
Previous Message Jeff Janes 2019-03-08 18:14:11 Hash index initial size is too large given NULLs or partial indexes