Re: [HACKERS] Slow count(*) again...

From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: david(at)lang(dot)hm, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 14:20:01
Message-ID: AANLkTinUDcJJpHfbRaboKVVgoKPhmTgPebbRZmNAJvZh@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 7:41 AM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
> On Thu, Feb 03, 2011 at 02:11:58AM -0800, david(at)lang(dot)hm wrote:
>> On Thu, 3 Feb 2011, Vitalii Tymchyshyn wrote:
>>
>>> 02.02.11 20:32, Robert Haas wrote:
>>>> Yeah.  Any kind of bulk load into an empty table can be a problem,
>>>> even if it's not temporary.  When you load a bunch of data and then
>>>> immediately plan a query against it, autoanalyze hasn't had a chance
>>>> to do its thing yet, so sometimes you get a lousy plan.
>>>
>>> Maybe introducing something like an 'AutoAnalyze' threshold would help? I
>>> mean that any insert/update/delete statement that changes more than x% of
>>> a table (and no fewer than y records) must run analyze right after it
>>> finishes.
>>> Defaults like x=50, y=10000 seem quite good to me.
>>
>> If I am understanding things correctly, a full Analyze is going over all
>> the data in the table to figure out patterns.
>>
>> If this is the case, wouldn't it make sense in the situation where you are
>> loading an entire table from scratch to run the Analyze as you are
>> processing the data? If you don't want to slow down the main thread that's
>> inserting the data, you could copy the data to a second thread and do the
>> analysis while it's still in RAM rather than having to read it off of disk
>> afterwards.
>>
>> This doesn't make sense for updates to existing databases, but the use case
>> of loading a bunch of data and then querying it right away isn't _that_
>> uncommon.
>>
>> David Lang
>>
>
> +1 for in-flight ANALYZE. This would be great for bulk loads of
> real tables as well as temp tables.

Yes, please, that would be really nice.
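
As a rough illustration of the idea only (this is not PostgreSQL code, and
not how ANALYZE is implemented), here is a sketch of gathering a fixed-size
random sample of rows while they stream past during a load, so that
statistics could be computed later without re-reading the table. The names
ReservoirSampler and load_with_sampling are hypothetical, and the "insert"
callback stands in for whatever actually writes the row.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple  # hypothetical row representation: one tuple per record


class ReservoirSampler:
    """Keep a uniform random sample of the rows seen so far (Algorithm R)."""

    def __init__(self, sample_size: int, seed: Optional[int] = None) -> None:
        self.sample_size = sample_size
        self.sample: List[Row] = []
        self.seen = 0
        self._rng = random.Random(seed)

    def observe(self, row: Row) -> None:
        self.seen += 1
        if len(self.sample) < self.sample_size:
            # Reservoir not full yet: keep every row.
            self.sample.append(row)
        else:
            # Replace a random slot with probability sample_size / seen.
            j = self._rng.randrange(self.seen)
            if j < self.sample_size:
                self.sample[j] = row


def load_with_sampling(
    rows: Iterable[Row],
    insert: Callable[[Row], None],
    sampler: ReservoirSampler,
) -> int:
    """Insert each row and feed it to the sampler in the same pass."""
    for row in rows:
        insert(row)           # assumption: stands in for the real write path
        sampler.observe(row)  # the row is still in memory here
    return sampler.seen


if __name__ == "__main__":
    table: List[Row] = []
    sampler = ReservoirSampler(sample_size=100, seed=1)
    total = load_with_sampling(((i, i % 7) for i in range(10_000)), table.append, sampler)
    print(total, len(sampler.sample))
```

The sample costs memory proportional to sample_size, not to the size of the
load, which is what would make it cheap to carry along with a bulk insert.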

--
Jon
