Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
Date: 2017-11-18 21:56:33
Message-ID: 27720.1511042193@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> On 11/02/2017 08:15 PM, Tom Lane wrote:
>> However, I'm not sure we're there yet, because there remains a fairly
>> nasty discrepancy even once we've gotten everyone onto the same page
>> about reltuples counting just live tuples: VACUUM and ANALYZE have
>> different definitions of what's "live". In particular they do not treat
>> INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples the same. Should we
>> try to do something about that? If so, what? It looks like ANALYZE's
>> behavior is pretty tightly constrained, judging by the comments in
>> acquire_sample_rows.

> ISTM we need to unify those definitions, probably so that VACUUM adopts
> what acquire_sample_rows does. I mean, if ANALYZE assumes that the stats
> will be updated at the end of transaction, why shouldn't VACUUM do the
> same thing?

That was the way I was leaning. I haven't thought very hard about the
implications, but as long as the change in VACUUM's behavior extends
only to the live-tuple count it reports, it seems like adjusting it
couldn't break anything too badly.

>> Another problem is that it looks like CREATE INDEX will set reltuples
>> to the total number of heap entries it chose to index, because that
>> is what IndexBuildHeapScan counts. Maybe we should adjust that?

> You mean by only counting live tuples in IndexBuildHeapRangeScan,
> following whatever definition we end up using in VACUUM/ANALYZE?

Right. One issue is that, as I mentioned, the index AMs probably want to
think about total-tuples-indexed not live-tuples; so for their purposes,
what IndexBuildHeapScan currently counts is the right thing. We need to
look and see if any AMs are actually using that value rather than just
silently passing it back. If they are, we might need to go to the trouble
of computing/returning two values.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2017-11-18 22:05:47 Re: percentile value check can be slow
Previous Message Tomas Vondra 2017-11-18 21:40:58 Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means