Skip site navigation (1) Skip section navigation (2)

Re: lazy_update_relstats considered harmful (was Re: [PERFORM]

From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org,Otto Blomqvist <o(dot)blomqvist(at)secomintl(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: lazy_update_relstats considered harmful (was Re: [PERFORM]
Date: 2005-03-25 21:12:27
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackerspgsql-performance
Tom Lane wrote:

>I wrote:
>>One thing that is possibly relevant here is that in 8.0 a plain VACUUM
>>doesn't set reltuples to the exactly correct number, but to an
>>interpolated value that reflects our estimate of the "steady state"
>>average between vacuums.  I wonder if that code is wrong, or if it's
>>operating as designed but is confusing autovac.
>Now that I think it over, I'm thinking that I must have been suffering
>severe brain fade the day I wrote lazy_update_relstats() (see
>vacuumlazy.c).  The numbers that that routine is averaging are the pre-
>and post-vacuum physical tuple counts.  But the difference between them
>consists of known-dead tuples, and we shouldn't be factoring dead tuples
>into reltuples.  The planner has always considered reltuples to count
>only live tuples, and I think this is correct on two grounds:
>1. The numbers of tuples estimated to be returned by scans certainly
>shouldn't count dead ones.
>2. Dead tuples don't have that much influence on scan costs either, at
>least not once they are marked as known-dead.  Certainly they shouldn't
>be charged at full freight.
>It's possible that there'd be some value in adding a column to pg_class
>to record dead tuple count, but given what we have now, the calculation
>in lazy_update_relstats is totally wrong.
>The idea I was trying to capture is that the tuple density is at a
>minimum right after VACUUM, and will increase as free space is filled
>in until the next VACUUM, so that recording the exact tuple count
>underestimates the number of tuples that will be seen on-the-average.
>But I'm not sure that idea really holds water.  The only way that a
>table can be at "steady state" over a long period is if the number of
>live tuples remains roughly constant (ie, inserts balance deletes).
>What actually increases and decreases over a VACUUM cycle is the density
>of *dead* tuples ... but per the above arguments this isn't something
>we should adjust reltuples for.
>So I'm thinking lazy_update_relstats should be ripped out and we should
>go back to recording just the actual stats.
>Sound reasonable?  Or was I right the first time and suffering brain
>fade today?

In response to

pgsql-performance by date

Next:From: Tom LaneDate: 2005-03-25 21:25:09
Subject: Re: Delete query takes exorbitant amount of time
Previous:From: Stephan SzaboDate: 2005-03-25 21:10:51
Subject: Re: Delete query takes exorbitant amount of time

pgsql-hackers by date

Next:From: Lyubomir PetrovDate: 2005-03-25 21:22:36
Subject: Re: Bug 1500
Previous:From: Palle GirgensohnDate: 2005-03-25 21:06:52
Subject: Re: Patch for collation using ICU

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group