Re: [HACKERS] Slow count(*) again...

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: david(at)lang(dot)hm
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:35:12
Message-ID: 4D4B2DA0.3040601@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

david(at)lang(dot)hm wrote:
> I am making the assumption that an Analyze run only has to go over the
> data once (a seqential scan of the table if it's >> ram for example)
> and gathers stats as it goes.

And that's the part there's some confusion about here. ANALYZE grabs a
random set of samples from the table, the number of which is guided by
the setting for default_statistics_target. The amount of time it takes
is not proportional to the table size; it's only proportional to the
sampling size. Adding a process whose overhead is proportional to the
table size, such as the continuous analyze idea you're proposing, is
quite likely to be a big step backwards relative to just running a
single ANALYZE after the loading is finished.

What people should be doing if concerned about multiple passes happening
is something like this:

CREATE TABLE t (s serial, i integer) WITH (autovacuum_enabled=off);
[populate table]
ANALYZE t;
ALTER TABLE t SET (autovacuum_enabled=on);

I'm not optimistic the database will ever get smart enough to recognize
bulk loading and do this sort of thing automatically, but as the
workaround is so simple it's hard to get motivated to work on trying.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mladen Gogala 2011-02-03 22:39:06 Re: [HACKERS] Slow count(*) again...
Previous Message Michael Glaesemann 2011-02-03 22:19:04 Re: [HACKERS] Slow count(*) again...

Browse pgsql-performance by date

  From Date Subject
Next Message Mladen Gogala 2011-02-03 22:39:06 Re: [HACKERS] Slow count(*) again...
Previous Message Michael Glaesemann 2011-02-03 22:19:04 Re: [HACKERS] Slow count(*) again...