Greg Stark <gsstark(at)mit(dot)edu> writes:
> So why is it any more reasonable for Postgres to assume 0 correlation than any
> other value? Perhaps Postgres should calculate these cases assuming some
> arbitrary level of correlation.
[ shrug... ] Sure, if you want to do the legwork to develop something
credible. But I think I'd still throw in the number-of-rows-over-10
clamp, or something much like it.
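The clamp being described can be sketched roughly as follows. This is an illustrative guess at the idea, not the actual PostgreSQL code; the function name and the exact divisor rule are assumptions for the example:

```python
def clamp_group_estimate(raw_estimate, n_rows):
    """Hypothetical sketch of a 'number-of-rows-over-10' clamp:
    however large the raw cross-column estimate is, never predict
    more distinct groups than a tenth of the table's rows.
    (Assumption: the real rule in Postgres may differ in detail.)"""
    return min(raw_estimate, n_rows / 10.0)
```

With a 500-row table, a raw estimate of 1000 groups would be clamped to 50, while a raw estimate of 10 would pass through unchanged.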
> As the total number of records
> goes up, the expected number of distinct values should approach the total
> number of records, even if the number of distinct values of each column
> doesn't change.
Well, that's what I thought when I wrote the existing code, but it's
wrong: you don't GROUP BY unique combinations of columns over huge
tables --- or at least, you shouldn't expect great performance if you do.
It'd probably be more reasonable to use a heuristic that expects a
*smaller* fraction of distinct combinations, instead of a larger one,
as the table size goes up.
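To make the contrast concrete, here is a sketch of the naive independence-based estimate under discussion, with the row count acting as an upper bound. This is an illustration of the estimation idea, not the planner's actual code; the function name and structure are assumptions:

```python
def estimate_distinct_combos(ndistinct_per_col, n_rows):
    """Sketch of a cross-column distinct-groups estimate assuming
    zero correlation between columns: multiply the per-column
    distinct counts, capped at the table's row count (you cannot
    have more groups than rows)."""
    product = 1.0
    for nd in ndistinct_per_col:
        product *= nd
    return min(product, n_rows)
```

For example, two columns with 10 distinct values each over 1000 rows give an estimate of 100 groups, while two columns with 100 distinct values each hit the row-count cap of 1000. The heuristic proposed above would go further and shrink that cap to a fraction that decreases as the table grows.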
> There's another possible solution, if Postgres kept statistics on the actual
> results of the query it could later use that feedback to come up with better
> guesses even if it doesn't know *why* they're better.
That's been proposed before but I think it's a blind alley. In most
cases (certainly with anything as complex as a multiply-grouped query)
you're not going to be able to derive any trustworthy corrections to
your original statistical estimates. There are too many variables, and
their relationships to the end costs are not simple.
regards, tom lane
pgsql-hackers by date