Re: proposal : cross-column stats

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal : cross-column stats
Date: 2010-12-17 21:41:19
Message-ID: 4D0BD8FF.2070300@enterprisedb.com
Lists: pgsql-hackers

On 17.12.2010 23:13, Tomas Vondra wrote:
> On 17.12.2010 19:58, Robert Haas wrote:
>> I haven't read the paper yet (sorry) but just off the top of my head,
>> one possible problem here is that our n_distinct estimates aren't
>> always very accurate, especially for large tables. As we've discussed
>> before, making them accurate requires sampling a significant
>> percentage of the table, whereas all of our other statistics can be
>> computed reasonably accurately by sampling a fixed amount of an
>> arbitrarily large table. So it's possible that relying more heavily
>> on n_distinct could turn out worse overall even if the algorithm is
>> better. Not sure if that's an issue here, just throwing it out
>> there...
>
> Yes, you're right - the paper really is based on (estimates of) the
> number of distinct values for each of the columns as well as for the
> group of columns.
>
> AFAIK it will work with reasonably precise estimates, but the point is
> you need an estimate of the number of distinct values of the whole
> group of columns.
> So when you want to get an estimate for queries on columns (a,b), you
> need the number of distinct value combinations of these two columns.
>
> And I think we're not collecting this right now, so this solution
> requires scanning the table (or some part of it).
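For reference, PostgreSQL's ANALYZE estimates per-column n_distinct from its row sample with the Haas-Stokes "Duj1" estimator, n*d / ((n - f1) + f1*n/N), where n is the sample size, d the number of distinct values seen in the sample, f1 the number of values seen exactly once, and N the table's row count. A minimal sketch, assuming the same estimator is simply applied to (a, b) pairs treated as one combined column. That is an assumption for illustration only, not what the paper or PostgreSQL is known to do, and the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def duj1_ndistinct(sample: Sequence[Hashable], total_rows: int) -> float:
    """Haas-Stokes Duj1 estimate of the number of distinct values.

    n  = sample size
    d  = distinct values seen in the sample
    f1 = values seen exactly once in the sample
    N  = total rows in the table
    """
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    if n >= total_rows:
        return float(d)  # the sample is the whole table: the count is exact
    est = (n * d) / ((n - f1) + f1 * n / total_rows)
    # never below what the sample showed, never above the table's row count
    return min(max(est, float(d)), float(total_rows))


def duj1_ndistinct_pairs(
    sample_rows: Sequence[Tuple[Hashable, Hashable]], total_rows: int
) -> float:
    # hypothetical: treat each (a, b) tuple as one value of a combined column
    return duj1_ndistinct(list(sample_rows), total_rows)


if __name__ == "__main__":
    # hypothetical sample of (a, b) pairs drawn from a 1,000-row table
    sample = [(1, "x"), (1, "x"), (2, "y"), (3, "z")]
    print(duj1_ndistinct_pairs(sample, total_rows=1_000))
```

Note how heavily the result depends on f1: pairs seen once in a small sample push the estimate toward N, which is exactly where the sample-size concern raised above bites hardest for combinations of columns.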

Any idea how sensitive it is to the accuracy of that estimate on
distinct value combinations? If we get that off by a factor of ten or a
hundred, what kind of an effect does it have on the final cost estimates?
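One way to get a feel for the question, under the simplest possible model rather than the paper's: if every (a, b) combination is assumed to be equally frequent, an equality condition on both columns matches about N / nd(a,b) rows, so an error factor k in nd(a,b) turns into the same factor k in the row estimate. The sketch below (hypothetical names, illustrative numbers) only does that arithmetic; how far the final plan cost moves depends on how the cost model uses the row count.

```python
def estimated_rows(total_rows: float, ndistinct_ab: float) -> float:
    """Rows matching "a = x AND b = y", assuming every (a, b) combination
    is equally frequent (uniformity assumption)."""
    return total_rows / ndistinct_ab


if __name__ == "__main__":
    total_rows = 1_000_000  # illustrative table size
    true_nd = 10_000        # illustrative true number of (a, b) combinations
    truth = estimated_rows(total_rows, true_nd)
    for factor in (1, 10, 100):
        # nd underestimated by `factor` -> row count overestimated by `factor`
        too_many = estimated_rows(total_rows, true_nd / factor)
        # nd overestimated by `factor` -> row count underestimated by `factor`
        too_few = estimated_rows(total_rows, true_nd * factor)
        print(f"nd off by {factor}x: {too_few:,.0f} to {too_many:,.0f} rows "
              f"(true estimate {truth:,.0f})")
```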

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
