Re: estimating # of distinct values

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Tomas Vondra" <tv(at)fuzzy(dot)cz>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: estimating # of distinct values
Date: 2010-12-27 23:04:20
Message-ID: 4D18C7140200002500038BF5@gw.wicourts.gov
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Well, first, those scans occur only once every few hundred million
> transactions, which is not likely a suitable timescale for
> maintaining statistics.

I was assuming that the pass of the entire table was priming for the
incremental updates described at the start of this thread. I'm not
clear on how often the base needs to be updated for the incremental
updates to keep the numbers "close enough".

> And second, we keep on having discussions about rejiggering
> the whole tuple-freezing strategy. Even if piggybacking on those
> scans looked useful, it'd be unwise to assume it'll continue to
> work the same way it does now.

Sure, it might need to trigger its own scan in the face of heavy
deletes anyway, since the original post points out that the
algorithm handles inserts better than deletes. But as long as we
currently have some sequential pass of the data, it seemed sane to
piggyback on it when possible. And maybe we should be considering
things like this when we weigh the pros and cons of rejiggering.
This issue of correlated values comes up pretty often....
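[Editor's note: the insert/delete asymmetry mentioned above is a general property of hash-based distinct-count sketches. The thread does not specify the algorithm, so as a hedged illustration only, here is a minimal K-Minimum-Values (KMV) sketch in Python: inserts are cheap incremental updates, but a delete cannot be applied without rescanning the data, because the removed value's hash may be one of the k retained minima. All names here are hypothetical, not from the patch under discussion.]

```python
import bisect
import hashlib

class KMVSketch:
    """K-Minimum-Values distinct-count sketch.

    Keeps the k smallest hash values seen (mapped into [0, 1)) and
    estimates ndistinct as (k - 1) / kth_smallest_hash. Inserts are
    O(log k); deletes are unsupported, since evicting a hash from the
    k minima would require rescanning the table to find its successor.
    """

    def __init__(self, k=64):
        self.k = k
        self.mins = []  # sorted list of the k smallest hashes in [0, 1)

    @staticmethod
    def _hash(value):
        # Deterministic 64-bit hash mapped into the unit interval.
        h = hashlib.sha256(str(value).encode()).digest()
        return int.from_bytes(h[:8], "big") / 2**64

    def insert(self, value):
        x = self._hash(value)
        i = bisect.bisect_left(self.mins, x)
        if i < len(self.mins) and self.mins[i] == x:
            return  # duplicate value: sketch is unchanged
        self.mins.insert(i, x)
        if len(self.mins) > self.k:
            self.mins.pop()  # keep only the k smallest

    def estimate(self):
        if len(self.mins) < self.k:
            # Fewer than k distinct hashes seen: the count is exact.
            return len(self.mins)
        return int((self.k - 1) / self.mins[-1])
```

With k = 64 the relative error is roughly 1/sqrt(k - 2), about 13%, which is why a periodic full-table pass (such as one piggybacked on an existing sequential scan) is still needed to re-prime the estimate after heavy deletes.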

-Kevin
