Re: [PATCH] Never convert n_distinct < 2 values to a ratio when computing stats

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dan McGee <dan(at)archlinux(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Never convert n_distinct < 2 values to a ratio when computing stats
Date: 2012-03-25 15:27:33
Message-ID: CA+TgmoYViEaSck=VjAgtRypoXiePjgQ+Mx4tROZZfYbYKfuA+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 24, 2012 at 12:17 AM, Dan McGee <dan(at)archlinux(dot)org> wrote:
> This is a bit of a corner case in all honesty, but if you have a short
> table (under 20 rows), the 10% heuristic used that decides whether
> distinct values scale with the row count will result in rather odd
> values for stadistinct in pg_statistic, such as '-0.2' or '-0.666667',
> rather than the expected '2'. Additionally, this can cause only one of
> {t, f} to appear in the most common values array.
>
> Does this actually affect query planning in any way? Probably not, but
> it is extremely odd to look at pg_stats for these columns, and the
> solution seems easy.

But the stats aren't there to be looked at, but rather to guide query
planning. If at execution time there are 100 rows in the table,
should we still assume that there are only 2 distinct values in the
table, or that it's gone up to about 50 distinct values? It's hard to
say, but there's no apparent reason to think that the number of
distinct values will scale up for a large table but not a small table.

The bit about maybe not getting both t and f as MCVs on a Boolean does
seem a little worrying, but I'm not sure whether it actually affects
query planning in a materially negative way. Can you demonstrate a
case where it matters?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-03-25 15:32:04 Re: Gsoc2012 Idea --- Social Network database schema
Previous Message Robert Haas 2012-03-25 15:22:10 Re: PostgreSQL optimisations on Linux machines with more than 24 cores