Re: lazy_update_relstats considered harmful (was Re: [PERFORM]

From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Otto Blomqvist <o(dot)blomqvist(at)secomintl(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: lazy_update_relstats considered harmful (was Re: [PERFORM]
Date: 2005-03-25 21:12:27
Message-ID: 42447EBB.20905@zeut.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Tom Lane wrote:

>I wrote:
>
>
>>One thing that is possibly relevant here is that in 8.0 a plain VACUUM
>>doesn't set reltuples to the exactly correct number, but to an
>>interpolated value that reflects our estimate of the "steady state"
>>average between vacuums. I wonder if that code is wrong, or if it's
>>operating as designed but is confusing autovac.
>>
>>
>
>Now that I think it over, I'm thinking that I must have been suffering
>severe brain fade the day I wrote lazy_update_relstats() (see
>vacuumlazy.c). The numbers that that routine is averaging are the pre-
>and post-vacuum physical tuple counts. But the difference between them
>consists of known-dead tuples, and we shouldn't be factoring dead tuples
>into reltuples. The planner has always considered reltuples to count
>only live tuples, and I think this is correct on two grounds:
>
>1. The numbers of tuples estimated to be returned by scans certainly
>shouldn't count dead ones.
>
>2. Dead tuples don't have that much influence on scan costs either, at
>least not once they are marked as known-dead. Certainly they shouldn't
>be charged at full freight.
>
>It's possible that there'd be some value in adding a column to pg_class
>to record dead tuple count, but given what we have now, the calculation
>in lazy_update_relstats is totally wrong.
>
>The idea I was trying to capture is that the tuple density is at a
>minimum right after VACUUM, and will increase as free space is filled
>in until the next VACUUM, so that recording the exact tuple count
>underestimates the number of tuples that will be seen on-the-average.
>But I'm not sure that idea really holds water. The only way that a
>table can be at "steady state" over a long period is if the number of
>live tuples remains roughly constant (ie, inserts balance deletes).
>What actually increases and decreases over a VACUUM cycle is the density
>of *dead* tuples ... but per the above arguments this isn't something
>we should adjust reltuples for.
>
>So I'm thinking lazy_update_relstats should be ripped out and we should
>go back to recording just the actual stats.
>
>Sound reasonable? Or was I right the first time and suffering brain
>fade today?
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Lyubomir Petrov 2005-03-25 21:22:36 Re: Bug 1500
Previous Message Palle Girgensohn 2005-03-25 21:06:52 Re: Patch for collation using ICU

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-03-25 21:25:09 Re: Delete query takes exorbitant amount of time
Previous Message Stephan Szabo 2005-03-25 21:10:51 Re: Delete query takes exorbitant amount of time