Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Gould <daveg(at)sonic(dot)net>
Cc: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Alina Alexeeva <alexeeva(at)adobe(dot)com>, Ullas Lakkur Raghavendra <lakkurra(at)adobe(dot)com>
Subject: Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date: 2018-03-06 17:36:43
Message-ID: 28773.1520357803@sss.pgh.pa.us
Lists: pgsql-hackers

David Gould <daveg(at)sonic(dot)net> writes:
> On Thu, 01 Mar 2018 18:49:20 -0500
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The sticking point in my mind right now is, if we do that, what to do with
>> VACUUM's estimates.

> For what it's worth, I think the current estimate formula for VACUUM is
> pretty reasonable. Consider a table T with N rows and P pages clustered
> on serial key k. Assume reltuples is initially correct.

If the starting condition involves uniform tuple density throughout the
table, with matching reltuples/relpages ratio, then any set of changes
followed by one VACUUM will produce the right reltuples (to within
roundoff error) at the end. This can be seen by recognizing that VACUUM
will visit every changed page, and the existing calculation is equivalent
to "assume the old tuple density is correct for the unvisited pages, and
then add on the measured tuple count within the visited pages". I'm a bit
inclined to reformulate and redocument the calculation that way, in hopes
that people would find it more convincing.
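Concretely, the reformulated calculation amounts to something like this
(just a sketch to illustrate the arithmetic, not the exact code in
vacuum.c; the variable names are mine):

#include <math.h>

/*
 * Sketch of the reltuples estimate as reformulated above: assume the old
 * tuple density still holds on the pages VACUUM did not visit, then add
 * the live tuples actually counted on the pages it did visit.
 */
static double
estimate_reltuples(double old_rel_pages, double old_rel_tuples,
                   double total_pages, double scanned_pages,
                   double scanned_live_tuples)
{
    double      old_density;
    double      unscanned_pages;

    if (old_rel_pages <= 0)
        return scanned_live_tuples; /* no history to extrapolate from */

    old_density = old_rel_tuples / old_rel_pages;
    unscanned_pages = total_pages - scanned_pages;

    return floor(old_density * unscanned_pages + scanned_live_tuples + 0.5);
}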

However, the estimate degrades if the initial state is nonuniform and
we do a set of updates that line up with the nonuniformity. For
example, start with a uniformly full table, and update 50% of the
rows lying within the first 20% of the pages. The new tuple versions
go into freshly added pages at the end, so those first 20% of pages
are now only half full of live tuples and the table has grown by 10%,
with all the added pages full. Do a VACUUM. It will process the first
20% and the new 10% of pages, and arrive at a correct reltuples count
per the above argument. But now, reltuples/relpages reflects an average
tuple density that's only about 90% of maximum. Next, delete the
surviving tuples in the first 20% of pages, and again VACUUM. VACUUM
will examine only the first 20% of pages, and find that they're devoid
of live tuples. It will then update reltuples using the 90% density
figure as the estimate of what's in the remaining pages, and that's
too small, so that reltuples will drop to about 90% of the correct
value.
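
To put numbers on that (a back-of-the-envelope check using the sketch
above, with 100 pages of 100 tuples each to start):

#include <stdio.h>

int
main(void)
{
    double      density = 100.0;    /* tuples per page, initially uniform */
    double      P = 100.0;          /* original number of pages */
    double      N = density * P;    /* original live tuples */

    /*
     * Update 50% of the rows in the first 20% of pages: the new versions
     * fill 10 fresh pages, so the table is now 110 pages.  The first
     * VACUUM scans the 20 dirtied pages (now half full) plus the 10 new
     * pages (full), extrapolating the old density over the other 80.
     */
    double      reltuples1 = (N / P) * 80.0
        + 20.0 * density / 2.0 + 10.0 * density;
    double      density1 = reltuples1 / 110.0;

    /*
     * Delete the survivors in the first 20 pages; the second VACUUM scans
     * only those 20 pages and finds no live tuples there.
     */
    double      reltuples2 = density1 * (110.0 - 20.0) + 0.0;

    printf("after first vacuum:  reltuples = %.0f (true %.0f)\n",
           reltuples1, N);
    printf("after second vacuum: reltuples = %.0f (true %.0f)\n",
           reltuples2, N - 1000.0);
    return 0;
}

That prints 10000 (correct) after the first VACUUM, but about 8182
against a true count of 9000 after the second, i.e. roughly 90% of the
right answer.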

Lacking an oracle (small "o"), I do not think there's much we can do
about this, without resorting to very expensive measures such as
scanning the whole table. (It's somewhat interesting to speculate
about whether scanning the table's FSM could yield useful data, but
I'm unsure that I'd trust the results much.) The best we can do is
hope that correlated update patterns like this are uncommon.

Maybe this type of situation is an argument for trusting an ANALYZE-based
estimate more than the VACUUM-based estimate. I remain uncomfortable with
that in cases where VACUUM looked at much more of the table than ANALYZE
did, though. Maybe we need some heuristic based on the number of pages
actually visited by each pass?
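
For the sake of discussion, one possible shape for such a heuristic
(purely hypothetical, nothing I'd propose to commit as-is) would be to
weight each new sample against the existing estimate by the fraction of
the table the pass actually looked at:

/*
 * Hypothetical sketch only: blend a new tuple-count sample (from either
 * VACUUM or ANALYZE) into the existing reltuples estimate, trusting the
 * sample in proportion to how much of the table it actually visited.
 */
static double
blend_reltuples(double old_reltuples, double sampled_reltuples,
                double scanned_pages, double total_pages)
{
    double      weight;

    if (total_pages <= 0)
        return sampled_reltuples;

    weight = scanned_pages / total_pages;   /* fraction of table visited */
    if (weight > 1.0)
        weight = 1.0;

    return (1.0 - weight) * old_reltuples + weight * sampled_reltuples;
}

A full-table pass would then replace the old value outright, while a
small ANALYZE sample would only nudge it; I'm not claiming that's the
right weighting, just that it's the sort of knob I have in mind.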

regards, tom lane
