Re: [HACKERS] Slow count(*) again...

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: david(at)lang(dot)hm
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 15:35:43
Message-ID: AANLkTi=Thet0VZ-X6b7tek+iSLVkE27JjiKrB5D8U0Jd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 5:11 AM, <david(at)lang(dot)hm> wrote:
> If I am understanding things correctly, a full Analyze is going over all the
> data in the table to figure out patterns.

No. It's going over a small, fixed-size sample which depends on
default_statistics_target but NOT on the table size. It's really
important to come up with a solution that's not susceptible to running
ANALYZE over and over again, in many cases unnecessarily.

> If this is the case, wouldn't it make sense in the situation where you are
> loading an entire table from scratch to run the Analyze as you are
> processing the data? If you don't want to slow down the main thread that's
> inserting the data, you could copy the data to a second thread and do the
> analysis while it's still in RAM rather than having to read it off of disk
> afterwords.

Well that's basically what autoanalyze is going to do anyway, if the
table is small enough to fit in shared_buffers. And it's actually
usually BAD if it starts running while you're doing a large bulk load,
because it competes for I/O bandwidth and the buffer cache and slows
things down. Especially when you're bulk loading for a long time and
it tries to run over and over. I'd really like to suppress all those
asynchronous ANALYZE operations and instead do ONE synchronous one at
the end, when we try to use the data.

Of course, the devil is in the nontrivial details.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-02-03 15:38:31 Re: out of memory during COPY .. FROM
Previous Message Robert Haas 2011-02-03 15:31:27 Re: [HACKERS] Slow count(*) again...

Browse pgsql-performance by date

  From Date Subject
Next Message Vitalii Tymchyshyn 2011-02-03 15:43:47 Re: [HACKERS] Slow count(*) again...
Previous Message Robert Haas 2011-02-03 15:31:27 Re: [HACKERS] Slow count(*) again...