Re: Improving N-Distinct estimation by ANALYZE

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving N-Distinct estimation by ANALYZE
Date: 2006-01-16 20:05:46
Message-ID: 1137441946.3180.187.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2006-01-16 at 12:26 -0600, Jim C. Nasby wrote:
> On Fri, Jan 13, 2006 at 11:37:38PM -0500, Tom Lane wrote:
> > Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > >> It's also worth mentioning that for datatypes that only have an "="
> > >> operator the performance of compute_minimal_stats is O(N^2) when values
> > >> are unique, so increasing sample size is a very bad idea in that case.
> >
> > > Hmmm ... does ANALYZE check for UNIQUE constraints?
> >
> > Our only implementation of UNIQUE constraints is btree indexes, which
> > require more than an "=" operator, so this seems irrelevant.
>
> IIRC, the point was that if we know a field has to be unique, there's no
> sense in doing that part of the analysis on it; you'd only care about
> correlation.

An interesting point: if we know the cardinality by definition there is
no need to ANALYZE for that column. An argument in favour of the
definitional rather than the discovery approach. As a designer I very
often already know the cardinality by definition, so I would appreciate
the opportunity to specify that to the database.

Tom has not spoken against checking for UNIQUE constraints: he is just
pointing out that there never could be a constraint in the case I was
identifying. For other datatypes we could check for them rather than
analyzing the sample, but the effect would be the same either way.

My original point was that behaviour is O(N^2) when unique; it is still
O(N^2) when nearly unique, albeit O(N^2) - O(N). Reducing stats_target
does not help there - the effect varies according to number of rows in
the sample, not num slots in MCV list.

Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Manfred Koizar 2006-01-16 20:24:38 Re: Improving N-Distinct estimation by ANALYZE
Previous Message Bruce Momjian 2006-01-16 19:54:32 Re: [HACKERS] message for constraint