Re: [HACKERS] Slow count(*) again...

From: david(at)lang(dot)hm
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 00:39:12
Message-ID: alpine.DEB.2.00.1102031637480.30983@asgard.lang.hm
Lists: pgsql-hackers pgsql-performance

On Thu, 3 Feb 2011, Robert Haas wrote:

> On Thu, Feb 3, 2011 at 3:54 PM, <david(at)lang(dot)hm> wrote:
>> with the current code, this is a completely separate process that knows
>> nothing about the load, so if you kick it off when you start the load, it
>> makes a pass over the table (competing for I/O), finishes, you continue to
>> update the table, so it makes another pass, etc. As you say, this is a bad
>> thing to do. I am saying to have an option that ties the two together,
>> essentially making the data feed into the Analyze run be a fork of the data
>> coming out of the insert run going to disk. So the Analyze run doesn't do
>> any I/O and isn't going to complete until the insert is complete. At which
>> time it will have seen one copy of the entire table.
>
> Yeah, but you'll be passing the entire table through this separate
> process that may only need to see 1% of it or less on a large table.
> If you want to write the code and prove it's better than what we have
> now, or some other approach that someone else may implement in the
> meantime, hey, this is an open source project, and I like improvements
> as much as the next guy. But my prediction for what it's worth is
> that the results will suck. :-)

I will point out that even 1% of a very large table can still be a lot of
disk I/O to avoid, especially if it's random I/O.
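
To make the idea concrete, here is a rough sketch in Python (not PostgreSQL
code) of a load loop that forks each row into a fixed-size reservoir sample
as it is written, so the sample is complete when the load finishes and no
second pass over the table is needed. The names load_with_sample, write_row
and sample_size are made up for illustration, and it assumes a uniform
random sample of rows is good enough input for the statistics step.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

Row = TypeVar("Row")


def load_with_sample(
    rows: Iterable[Row],
    write_row: Callable[[Row], None],
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Row], int]:
    """Write every row and keep a uniform random sample of them (Algorithm R).

    Hypothetical helper: write_row stands in for the insert path going to
    disk; the returned sample stands in for the input to an analyze step.
    """
    rng = rng or random.Random()
    reservoir: List[Row] = []
    seen = 0
    for row in rows:
        write_row(row)  # the normal load path, unchanged
        seen += 1
        if len(reservoir) < sample_size:
            # Fill the reservoir with the first sample_size rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability sample_size / seen.
            j = rng.randrange(seen)
            if j < sample_size:
                reservoir[j] = row
    return reservoir, seen


if __name__ == "__main__":
    written: List[int] = []
    sample, total = load_with_sample(range(100000), written.append, 1000)
    print(f"loaded {total} rows, sampled {len(sample)} without rereading any")
```

The sampling side only holds sample_size rows in memory, but as Robert says,
every row still passes through it, so whether this beats a separate pass is
something that would have to be measured.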

David Lang
