From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: proposal : cross-column stats |
Date: | 2010-12-17 18:58:02 |
Message-ID: | AANLkTin5h1hLoO_yjxjFiievipYMfL_aVFSRAkYV77_Q@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Dec 17, 2010 at 12:58 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> In the end, all they need to compute an estimate is the number of distinct
> values for each of the columns (we already have that in pg_stats) and the
> number of distinct values for the group of columns in a query. They
> really don't need any multidimensional histogram or something like that.
I haven't read the paper yet (sorry) but just off the top of my head,
one possible problem here is that our n_distinct estimates aren't
always very accurate, especially for large tables. As we've discussed
before, making them accurate requires sampling a significant
percentage of the table, whereas all of our other statistics can be
computed reasonably accurately by sampling a fixed amount of an
arbitrarily large table. So it's possible that relying more heavily
on n_distinct could turn out worse overall even if the algorithm is
better. Not sure if that's an issue here, just throwing it out
there...
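
To make the concern concrete, here is a rough sketch (not taken from the paper or from this thread) of how a sample-based n_distinct estimate behaves as the sample shrinks relative to the table. It assumes an estimator of the form n*d / (n - f1 + f1*n/N), the Haas-Stokes "Duj1" formula, which as far as I know is the shape ANALYZE uses; the function names, the synthetic column (each value stored exactly twice), and the table size are all made up for illustration.

```python
import random
from collections import Counter


def duj1_estimate(sample, total_rows):
    """Estimate the number of distinct values from a row sample.

    Uses the Duj1 form n*d / (n - f1 + f1*n/N), where n is the sample
    size, d the distinct values seen in the sample, f1 the values seen
    exactly once, and N the table size. The result is clamped to [d, N].
    """
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    denom = n - f1 + f1 * n / total_rows
    estimate = n * d / denom
    return min(max(estimate, d), total_rows)


def sample_column(total_rows, copies_per_value, sample_size, rng):
    """Sample rows from a hypothetical column without materializing it.

    Row i holds the value i // copies_per_value, so every value appears
    copies_per_value times in the full table.
    """
    rows = rng.sample(range(total_rows), sample_size)
    return [row // copies_per_value for row in rows]


def demo():
    total_rows = 10_000_000      # assumed table size
    copies = 2                   # true n_distinct = total_rows / copies
    rng = random.Random(0)
    print("true n_distinct:", total_rows // copies)
    for sample_size in (3_000, 30_000, 300_000):
        estimates = [
            round(duj1_estimate(
                sample_column(total_rows, copies, sample_size, rng),
                total_rows))
            for _ in range(3)
        ]
        print(f"sample of {sample_size:>7} rows -> estimates {estimates}")


if __name__ == "__main__":
    demo()
```

With a small fixed sample almost every sampled value is seen only once, so the estimate hinges on whether a handful of duplicates happen to land in the sample; that is the sense in which a fixed-size sample that is fine for histograms may not be enough for n_distinct. Skewed columns would need a different synthetic generator than the one sketched here.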
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company