Re: More stable query plans via more predictable column statistics

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>
Cc: David Steele <david(at)pgmasters(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: More stable query plans via more predictable column statistics
Date: 2016-04-01 21:14:41
Message-ID: 31801.1459545281@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de> writes:
> Alright. I'm attaching the latest version of this patch split in two
> parts: the first one is NULLs-related bugfix and the second is the
> "improvement" part, which applies on top of the first one.

I've applied the first of these patches, broken into two parts first
because it seemed like there were two issues and second because Tomas
deserved primary credit for one part, ie realizing we were using the
Haas-Stokes formula wrong.

As for the other part, I committed it with one non-cosmetic change:
I do not think it is right to omit "too wide" values when considering
the threshold for MCVs. As submitted, the patch was inconsistent on
that point anyway since it did it differently in compute_distinct_stats
and compute_scalar_stats. But the larger picture here is that we define
the MCV population to exclude nulls, so it's reasonable to consider a
value as an MCV even if it's greatly outnumbered by nulls. There is
no such exclusion for "too wide" values; those things are just an
implementation limitation in analyze.c, not something that is part of
the pg_statistic definition. If there are a lot of "too wide" values
in the sample, we don't know whether any of them are duplicates, but
we do know that the frequencies of the normal-width values have to be
discounted appropriately.

Haven't looked at 0002 yet.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christoph Berg 2016-04-01 21:31:20 pg_upgrade 9.6->9.6: column "amtype" does not exist
Previous Message Andres Freund 2016-04-01 20:39:23 Re: Speed up Clog Access by increasing CLOG buffers