Re: Does auto-analyze work on dirty writes?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Conor Walsh <ctw(at)adverb(dot)ly>, jd(at)commandprompt(dot)com, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Does auto-analyze work on dirty writes?
Date: 2011-02-10 17:51:31
Message-ID: AANLkTimeTk_reda4x3WwsgOfXN6NO6Be63YyjJ5UnZLz@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Fri, Feb 4, 2011 at 8:50 PM, Mark Mielke <mark(at)mark(dot)mielke(dot)cc> wrote:
> On 02/04/2011 10:41 AM, Tom Lane wrote:
>>
>> 1. Autovacuum fires when the stats collector's insert/update/delete
>> counts have reached appropriate thresholds.  Those counts are
>> accumulated from messages sent by backends at transaction commit or
>> rollback, so they take no account of what's been done by transactions
>> still in progress.
>>
>> 2. Only live rows are included in the stats computed by ANALYZE.
>> (IIRC it uses SnapshotNow to decide whether rows are live.)
>>
>> Although the stats collector does track an estimate of the number of
>> dead rows for the benefit of autovacuum, this isn't used by planning.
>> Table bloat is accounted for only in terms of growth of the physical
>> size of the table in blocks.
>
> Thanks, Tom.
>
> Does this un-analyzed "bloat" not impact queries? I guess the worst case
> here is if autovacuum is disabled for some reason and 99% of the table is
> dead rows. If I understand the above correctly, I think analyze might
> generate a bad plan under this scenario, thinking that a value is unique,
> using the index - but every tuple in the index has the same value and each
> has to be looked up in the table to see if it is visible?

It sounds like you're describing something like a one-row table with a
unique index on one of its columns, getting updates that can't be made
HOT, and not getting vacuumed. That scenario does suck - I had a test
case a while back that generated something similar - but I'm not sure
how much it's worth worrying about the plan, because either an index
scan or a sequential scan is going to be awful.
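If you want to reproduce that scenario yourself, something like the
following rough sketch should do it (the table and column names here
are made up for illustration, and autovacuum is disabled on purpose):

```sql
-- One-row table with a unique index; disable autovacuum so dead
-- row versions pile up (hypothetical names throughout).
CREATE TABLE counter (id integer PRIMARY KEY, n integer);
ALTER TABLE counter SET (autovacuum_enabled = off);
INSERT INTO counter VALUES (1, 0);

-- Updating the indexed column changes its value, so the update
-- can't be HOT: each one leaves a dead heap tuple and a new
-- index entry behind.  Repeat this many times, e.g. in a loop.
UPDATE counter SET id = -id;

-- Watch the stats collector's dead-row estimate and the physical
-- size (relpages is what the planner actually sees grow):
SELECT n_live_tup, n_dead_tup
  FROM pg_stat_user_tables WHERE relname = 'counter';
SELECT relpages FROM pg_class WHERE relname = 'counter';
```

After enough updates you have one live row, a pile of dead ones, and
an index where every lookup has to visit the heap to find the single
visible tuple.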

To put that another way, I've found that the optimizer copes pretty
well with adjusting plans as tables get bloated - mostly by using
index scans rather than sequential scans. It's possible there is some
improvement still to be had there, but I would be a lot more
interested in fixing the bloat, at least based on my own experiences.
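For the record, fixing the bloat itself looks roughly like this (the
table name `t` is hypothetical; which command you want depends on
whether you can afford an exclusive lock):

```sql
-- Plain VACUUM marks dead tuples reusable but doesn't shrink the
-- file; VACUUM FULL rewrites the table compactly but takes an
-- ACCESS EXCLUSIVE lock while it runs.
VACUUM ANALYZE t;
VACUUM FULL t;
REINDEX TABLE t;  -- the index is usually bloated too
```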

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
