Skip site navigation (1) Skip section navigation (2)

Re: estimating # of distinct values

From: Jim Nasby <jim(at)nasby(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: estimating # of distinct values
Date: 2011-01-01 03:24:26
Message-ID: AD87588E-8BCE-403C-B8F7-871C9182157F@nasby.net (view raw or flat)
Thread:
Lists: pgsql-hackers
On Dec 31, 2010, at 7:34 AM, Alvaro Herrera wrote:
> Excerpts from Tom Lane's message of jue dic 30 23:02:04 -0300 2010:
>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>>> I was thinking that we could have two different ANALYZE modes, one
>>> "full" and one "incremental"; autovacuum could be modified to use one or
>>> the other depending on how many changes there are (of course, the user
>>> could request one or the other, too; not sure what should be the default
>>> behavior).
>> 
>> How is an incremental ANALYZE going to work at all?  It has no way to
>> find out the recent changes in the table, for *either* inserts or
>> deletes.  Unless you want to seqscan the whole table looking for tuples
>> with xmin later than something-or-other ... which more or less defeats
>> the purpose.
> 
> Yeah, I was thinking that this incremental ANALYZE would be the stream
> in the "stream-based estimator" but evidently it doesn't work that way.
> The stream that needs to be passed to the estimator consists of new
> tuples as they are being inserted into the table, so this would need to
> be done by the inserter process ... or it'd need to transmit the CTIDs
> for someone else to stream them ... not an easy thing, in itself.

Perhaps listen/notify could be used for this, now that it allows passing a payload.

BTW, if we reduce the frequency at which full scans of large tables are needed then presumably the cost of the scans could be largely ignored. If we don't need to scan frequently then we shouldn't care very much about how long a scan takes, which means we could throttle it heavily. Presumably even a heavily used system can spare 500kB/s of IO to perform background scanning...
--
Jim C. Nasby, Database Architect                   jim(at)nasby(dot)net
512.569.9461 (cell)                         http://jim.nasby.net



In response to

pgsql-hackers by date

Next:From: Robert HaasDate: 2011-01-01 05:05:30
Subject: Re: and it's not a bunny rabbit, either
Previous:From: Josh BerkusDate: 2011-01-01 00:20:54
Subject: Re: Sync Rep Design

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group