Re: estimating # of distinct values

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: estimating # of distinct values
Date: 2010-12-31 13:34:04
Message-ID: 1293802260-sup-5579@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Tom Lane's message of Thu Dec 30 23:02:04 -0300 2010:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > I was thinking that we could have two different ANALYZE modes, one
> > "full" and one "incremental"; autovacuum could be modified to use one or
> > the other depending on how many changes there are (of course, the user
> > could request one or the other, too; not sure what should be the default
> > behavior).
>
> How is an incremental ANALYZE going to work at all? It has no way to
> find out the recent changes in the table, for *either* inserts or
> deletes. Unless you want to seqscan the whole table looking for tuples
> with xmin later than something-or-other ... which more or less defeats
> the purpose.

Yeah, I was thinking that this incremental ANALYZE would be the stream
in the "stream-based estimator" but evidently it doesn't work that way.
The stream that needs to be passed to the estimator consists of new
tuples as they are being inserted into the table, so this would need to
be done by the inserter process ... or it'd need to transmit the CTIDs
for someone else to stream them ... not an easy thing, in itself.
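
To make the data flow concrete, here is a minimal sketch of an estimator that is fed by the insert path rather than by ANALYZE. It is an illustration only: the K-minimum-values (KMV) sketch is swapped in as one well-known streaming distinct-count technique, not necessarily the estimator discussed in this thread, and `KMVEstimator`, `insert_tuple`, and the list-of-dicts "table" are hypothetical names, not PostgreSQL code.

```python
import hashlib
import heapq


class KMVEstimator:
    """K-minimum-values sketch: remembers the k smallest hash values seen."""

    def __init__(self, k=256):
        self.k = k
        self._heap = []        # negated hashes, so heap[0] is the largest kept hash
        self._members = set()  # hashes currently kept, used to skip duplicates

    @staticmethod
    def _hash(value):
        # Map a value to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2.0**64

    def add(self, value):
        h = self._hash(value)
        if h in self._members:
            return  # value already represented in the sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen: the count is exact.
            return float(len(self._heap))
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


def insert_tuple(table, estimator, row, column):
    """Hypothetical insert path: the inserter feeds the estimator directly."""
    table.append(row)
    estimator.add(row[column])


table = []
est = KMVEstimator(k=256)
for i in range(1000):
    insert_tuple(table, est, {"id": i, "city": i % 100}, "city")
print(est.estimate())
```

Note that this sketch only ever sees inserts. It has no way to forget a value when a tuple is deleted, which is the other half of the problem raised in the quoted message.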

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
