Re: [pgsql-hackers] Group-count estimation statistics

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [pgsql-hackers] Group-count estimation statistics
Date: 2005-01-28 20:43:13
Message-ID: 200501281243.13127.josh@agliodbs.com
Lists: pgsql-hackers

Tom,

> The only real solution, of course, is to acquire cross-column
> statistics, but I don't see that happening in the near future.

Y'know, that's been on the todo list for a while. Surely someone is inspired
for 8.1/8.2? At least for columns which are indexed together?

> As a short-term hack, I am thinking that the "clamp to size of table"
> part of the rule is overly pessimistic, and that we should consider
> something like "clamp to size of table / 10" instead.  The justification
> for this is the thought that you aren't going to bother grouping unless
> it actually reduces the data volume.  We have used similar rules in the
> past --- for example, before the logic for trying to estimate actual
> group counts was put in, the estimate for the number of output rows
> from an Agg or Group node was just the number of input rows over 10.

Why 10? I'd think we could come up with a slightly less arbitrary number,
based on "At what point does the median possible cost of estimating too low
equal the median possible cost of estimating too high?" This seems
calculable based on the other information available ...

... although perhaps not without a math PhD. Surely there's one in the house?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
