Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

From: Tom Lane <tgl@sss.pgh.pa.us>
To: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Cc: David Gould <daveg@sonic.net>, pgsql-hackers@lists.postgresql.org, Alina Alexeeva <alexeeva@adobe.com>, Ullas Lakkur Raghavendra <lakkurra@adobe.com>
Subject: Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date: 2018-03-01 23:49:20
Message-ID: 3240.1519948160@sss.pgh.pa.us
Lists: pgsql-hackers

Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
> On 01.03.2018 18:09, Tom Lane wrote:
>> Ideally, at least, the estimate would remain on-target.

> The test shows that under this particular scenario the estimated number
> of tuples grows after each ANALYZE. I tried to explain how this happens
> in the attached pdf.

I looked at this and don't think it really answers the question. What
happens is that, precisely because we only slowly adapt our estimate of
density towards the new measurement, we will have an overestimate of
density if the true density is decreasing (even if the new measurement is
spot-on), and that corresponds exactly to an overestimate of reltuples.
No surprise there. The question is why it fails to converge to reality
over time.
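
To make that concrete, the update being discussed has roughly this shape
(a sketch with illustrative names, not the code as committed):

    #include <math.h>

    /* Blend the old density toward the newly measured density, in
     * proportion to the fraction of the table that was scanned. */
    static double
    blended_reltuples(double old_rel_tuples, double old_rel_pages,
                      double scanned_tuples, double scanned_pages,
                      double total_pages)
    {
        double old_density = old_rel_tuples / old_rel_pages;
        double new_density = scanned_tuples / scanned_pages;
        double multiplier = scanned_pages / total_pages;
        double updated_density =
            old_density + (new_density - old_density) * multiplier;

        return floor(updated_density * total_pages + 0.5);
    }

Since the multiplier is well below 1 for a large table, the result always
lands between the old and the new density; so while the true density keeps
falling, it stays above the latest (accurate) measurement.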

I think part of David's point is that because we only allow ANALYZE to
scan a limited number of pages even in a very large table, that creates
an artificial limit on the slew rate of the density estimate; perhaps
what's happening in his tables is that the true density is dropping
faster than that limit allows us to adapt. Still, if there's that
much going on in his tables, you'd think VACUUM would be touching
enough of the table that it would keep the estimate pretty sane.
So I don't think we yet have a convincing explanation of why the
estimates drift worse over time.
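
To put a rough number on that slew-rate limit (figures illustrative, not
measured): with the default statistics target of 100, ANALYZE samples on
the order of 30000 pages, so on a 3-million-page table the scanned
fraction, and hence the blending multiplier, is about 0.01. A single
ANALYZE can then close only about 1% of the gap between the stored
density and what it actually measured, however wrong the stored value is.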

Anyway, I find myself semi-persuaded by his argument that we are
already assuming that ANALYZE has taken a random sample of the table,
so why should we not believe its estimate of density too? Aside from
simplicity, that would have the advantage of providing a way out of the
situation when the existing reltuples estimate has gotten drastically off.
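
(In code terms, I take that proposal to mean simply setting, after
ANALYZE, reltuples = floor(scanned_tuples / scanned_pages * total_pages
+ 0.5), i.e. scaling the sampled density up to the whole table instead
of blending it with the previous estimate.)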

The sticking point in my mind right now is, if we do that, what to do with
VACUUM's estimates. If you believe the argument in the PDF that we'll
necessarily overshoot reltuples in the face of declining true density,
then it seems like that argument applies to VACUUM as well. However,
VACUUM has the issue that we should *not* believe that it looked at a
random sample of pages. Maybe the fact that it looks exactly at the
changed pages causes it to see something less than the overall density,
cancelling out the problem, but that seems kinda optimistic.

Anyway, as I mentioned in the 2011 thread, the existing computation is
isomorphic to the rule "use the old density estimate for the pages we did
not look at, and the new density estimate --- ie, exactly scanned_tuples
--- for the pages we did look at". That still has a lot of intuitive
appeal, especially for VACUUM where there's reason to believe those page
populations aren't alike. We could recast the code to look like it's
doing that rather than doing a moving-average, although the outcome
should be the same up to roundoff error.
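
For reference, the isomorphism is easy to check (again with illustrative
names, not the actual code):

    /* With new_density = scanned_tuples / scanned_pages, these two
     * return identical values up to roundoff. */
    static double
    moving_average_form(double old_density, double new_density,
                        double scanned_pages, double total_pages)
    {
        return (old_density +
                (new_density - old_density) * scanned_pages / total_pages)
               * total_pages;
    }

    static double
    piecewise_form(double old_density, double scanned_tuples,
                   double scanned_pages, double total_pages)
    {
        /* old density for the pages we did not look at, plus exactly
         * what we counted on the pages we did look at */
        return old_density * (total_pages - scanned_pages) + scanned_tuples;
    }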

regards, tom lane
