Re: VACUUM and ANALYZE disagreeing on what reltuples means

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM and ANALYZE disagreeing on what reltuples means
Date: 2017-09-25 04:50:38
Message-ID: CAJrrPGfTQ3w=5BdprWgQghuM7ujtBy2Sthr6zRwY0QeRBQd2RQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 25, 2017 at 4:39 AM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

>
>
> On 09/06/2017 09:45 AM, Haribabu Kommi wrote:
> >
> >
> > On Tue, Jul 25, 2017 at 9:33 PM, Tomas Vondra
> > <tomas(dot)vondra(at)2ndquadrant(dot)com <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>>
> wrote:
> >
> > On 7/25/17 12:55 AM, Tom Lane wrote:
> >
> > Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com
> > <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>> writes:
> >
> > It seems to me that VACUUM and ANALYZE somewhat disagree on
> what
> > exactly reltuples means. VACUUM seems to be thinking that
> > reltuples
> > = live + dead while ANALYZE apparently believes that
> reltuples =
> > live
> >
> >
> > The question is - which of the reltuples definitions is the
> > right
> > one? I've always assumed that "reltuples = live + dead" but
> > perhaps
> > not?
> >
> >
> > I think the planner basically assumes that reltuples is the live
> > tuple count, so maybe we'd better change VACUUM to get in step.
> >
> >
> > Attached is a patch that (I think) does just that. The disagreement
> > was caused by VACUUM treating recently dead tuples as live, while
> > ANALYZE treats both of those as dead.
> >
> > At first I was worried that this will negatively affect plans in the
> > long-running transaction, as it will get underestimates (due to
> > reltuples not including rows it can see). But that's a problem we
> > already have anyway, you just need to run ANALYZE in the other
> session.
> >
> >
> > Thanks for the patch.
> > From the mail, I understand that this patch tries to improve the
> > reltuples value update in the catalog table by the vacuum command
> > to consider the proper visible tuples similar like analyze command.
> >
> > -num_tuples);
> > +num_tuples - nkeep);
> >
> > With the above correction, there is a problem in reporting the number
> > of live tuples to the stats.
> >
> > postgres=# select reltuples, n_live_tup, n_dead_tup
> > from pg_stat_user_tables join pg_class using (relname)
> > where relname = 't';
> > reltuples | n_live_tup | n_dead_tup
> > -----------+------------+------------
> > 899818 | 799636 | 100182
> > (1 row)
> >
> >
> > The live tuples data value is again decremented with dead tuples
> > value before sending them to stats in function lazy_vacuum_rel(),
> >
> > /* report results to the stats collector, too */
> > new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;
> >
> > The fix needs a correction here also. Or change the correction in
> > lazy_vacuum_rel() function itself before updating catalog table similar
> > like stats.
> >
>
> Ah, haven't noticed that for some reason - you're right, we estimate the
> reltuples based on (num_tuples - nkeep), so it doesn't make much sense
> to subtract nkeep again. Attached is v2 of the fix.
>
> I've removed the subtraction from lazy_vacuum_rel(), leaving just
>
> new_live_tuples = new_rel_tuples;
>
> and now it behaves as expected (no second subtraction). That means we
> can get rid of new_live_tuples altogether (and the protection against
> negative values), and use new_rel_tuples directly.
>
> Which pretty much means that in this case
>
> (pg_class.reltuples == pg_stat_user_tables.n_live_tup)
>
> but I guess that's fine, based on the initial discussion in this thread.

The changes are fine and now it reports proper live tuples in both
pg_class and stats. The other issue of continuous vacuum operation
leading to decrease of number of live tuples is not related to this
patch and that can be handled separately.

I changed the patch status as ready for committer.

Regards,
Hari Babu
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-09-25 05:12:49 Re: Rethinking autovacuum.c memory handling
Previous Message Michael Paquier 2017-09-25 04:43:46 Re: Setting pd_lower in GIN metapage