On Oct 12, 2012, at 22:13, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>> [ shrug... ] You're attacking a straw man, or more precisely putting
>> words into my mouth about what the percentage-based thresholds might be.
>> Notice the examples I gave involved update percentages quite far north
>> of 100%. It's possible and maybe likely that we need a sliding scale.
> I was just discussing such a sliding scale approach w/ Josh on IRC, my
> thinking was that we could use a logarithmic approach based on table
>> Also, I don't necessarily accept the conclusion you seem to be drawing,
>> that it's okay to have complete turnover of a small table and not redo
>> its stats. If you don't like the current behavior when there's no
>> stats, why would you like the behavior when there are some stats but
>> they no longer have the remotest relationship to reality?
> Josh's concern is about autovacuum causing lots of stats churn, which is
> understandable, we don't want it constantly rescanning a table, but
> perhaps we could use some kind of threshold for preventing autovac from
> rescanning a table it just scanned? Note that I did *not* say 'GUC',
> but I don't know what the 'right' answer is for how frequently is
> good-but-not-too-frequent. I'd also like to try and avoid adding GUCs.
Instead of a global setting, could you attach an interface function to the table and have the auto-analyzer call that function to ask the table whether it needs to be analyzed? We would still need to deal with defaults and provide a decent supply of built-in algorithms, but at least the system could be made tunable. The default algorithm could simply hand off to a table-size-specific handler. CREATE TABLE and ALTER TABLE could be used to change the assigned algorithm if desired, and new ones could be supplied via extensions.
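A rough Python sketch of that dispatch idea: a per-table registry of "needs analyze?" policies with a size-dispatching default. Every name and threshold below is illustrative; nothing here is an existing PostgreSQL API:

```python
from typing import Callable, Dict

# A policy takes a stats dict (e.g. {"reltuples": ..., "changes": ...})
# and answers whether the table should be analyzed now.
AnalyzePolicy = Callable[[dict], bool]

def small_table_policy(stats: dict) -> bool:
    # Small tables: require at least 50 changed rows, or full turnover.
    return stats["changes"] >= max(50, stats["reltuples"])

def large_table_policy(stats: dict) -> bool:
    # Large tables: 10% churn is enough (arbitrary illustrative figure).
    return stats["changes"] >= 0.1 * stats["reltuples"]

def default_policy(stats: dict) -> bool:
    # The default hands off to a table-size-specific handler.
    handler = small_table_policy if stats["reltuples"] < 10_000 else large_table_policy
    return handler(stats)

_policies: Dict[str, AnalyzePolicy] = {}

def set_table_policy(table: str, policy: AnalyzePolicy) -> None:
    # Stand-in for what CREATE/ALTER TABLE would record in the catalog.
    _policies[table] = policy

def needs_analyze(table: str, stats: dict) -> bool:
    return _policies.get(table, default_policy)(stats)
```

An extension supplying a new algorithm would, in this sketch, just register another policy function for the table.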
The 1000 row default seems unusual at first glance and contributes to the problem described.
It is likely that the first INSERT following the CREATE TABLE will be a bulk load if the table is going to have many rows. In the case where rows are inserted individually, the expected row count is likely to be closer to 1 than to 1000.
One useful algorithm to provide the user is analyze-on-insert and, though maybe less so, analyze-on-update, so that any insert/update causes the table to be re-analyzed. Not a good default, but combined with "delayed analyze" logic to enforce a minimum interval between runs, it is a possible option for some use cases.
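The analyze-on-change idea with a "delayed analyze" floor could be sketched like this (again a Python illustration with an assumed minimum interval, not PostgreSQL code):

```python
import time

class AnalyzeOnChange:
    """Analyze after every insert/update, but no more often than
    min_interval seconds (the "delayed analyze" minimum frequency)."""

    def __init__(self, min_interval: float = 60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock                 # injectable for testing
        self.last = float("-inf")          # time of the last analyze
        self.pending = False               # changes seen since then?

    def on_change(self) -> bool:
        # Called on each insert/update; returns True when the caller
        # should run ANALYZE now.
        self.pending = True
        return self.maybe_analyze()

    def maybe_analyze(self) -> bool:
        # Fire only if there are pending changes and the minimum
        # interval since the last analyze has elapsed.
        if self.pending and self.clock() - self.last >= self.min_interval:
            self.last = self.clock()
            self.pending = False
            return True
        return False
```

A change arriving inside the interval just marks the table dirty; a later `maybe_analyze()` call (e.g. from autovacuum's next pass) picks it up.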
Temporary table creation should be given special attention if changes are going to be made here.
Another idea is to have system-level after-[command] trigger(s) that can be used to call analyze without waiting for the auto-vacuum process. Provide some way for CREATE/ALTER TABLE, and maybe auto-vacuum itself, to enable and disable the trigger.