Re: [HACKERS] Slow count(*) again...

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: david(at)lang(dot)hm
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 13:41:42
Message-ID: 20110203134142.GO24931@aart.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 03, 2011 at 02:11:58AM -0800, david(at)lang(dot)hm wrote:
> On Thu, 3 Feb 2011, Vitalii Tymchyshyn wrote:
>
>> 02.02.11 20:32, Robert Haas ???????(??):
>>> Yeah. Any kind of bulk load into an empty table can be a problem,
>>> even if it's not temporary. When you load a bunch of data and then
>>> immediately plan a query against it, autoanalyze hasn't had a chance
>>> to do its thing yet, so sometimes you get a lousy plan.
>>
>> May be introducing something like 'AutoAnalyze' threshold will help? I
>> mean that any insert/update/delete statement that changes more then x% of
>> table (and no less then y records) must do analyze right after it was
>> finished.
>> Defaults like x=50 y=10000 should be quite good as for me.
>
> If I am understanding things correctly, a full Analyze is going over all
> the data in the table to figure out patterns.
>
> If this is the case, wouldn't it make sense in the situation where you are
> loading an entire table from scratch to run the Analyze as you are
> processing the data? If you don't want to slow down the main thread that's
> inserting the data, you could copy the data to a second thread and do the
> analysis while it's still in RAM rather than having to read it off of disk
> afterwords.
>
> this doesn't make sense for updates to existing databases, but the use case
> of loading a bunch of data and then querying it right away isn't _that_
> uncommon.
>
> David Lang
>

+1 for in-flight ANALYZE. This would be great for bulk loads of
real tables as well as temp tables.

Cheers,
Ken

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kenneth Marshall 2011-02-03 13:55:58 Re: LIKE, CHAR(), and trailing spaces
Previous Message Thom Brown 2011-02-03 13:32:51 Re: Issues with generate_series using integer boundaries

Browse pgsql-performance by date

  From Date Subject
Next Message Jon Nelson 2011-02-03 14:20:01 Re: [HACKERS] Slow count(*) again...
Previous Message Laszlo Nagy 2011-02-03 11:40:12 Get master-detail relationship metadata