Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

From: David Gould <daveg(at)sonic(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Alina Alexeeva <alexeeva(at)adobe(dot)com>, Ullas Lakkur Raghavendra <lakkurra(at)adobe(dot)com>
Subject: Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date: 2018-03-03 00:18:40
Message-ID: 20180302161840.3530921f@engels
Lists: pgsql-hackers

On Thu, 01 Mar 2018 18:49:20 -0500
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> The sticking point in my mind right now is, if we do that, what to do with
> VACUUM's estimates. If you believe the argument in the PDF that we'll
> necessarily overshoot reltuples in the face of declining true density,
> then it seems like that argument applies to VACUUM as well. However,
> VACUUM has the issue that we should *not* believe that it looked at a
> random sample of pages. Maybe the fact that it looks exactly at the
> changed pages causes it to see something less than the overall density,
> cancelling out the problem, but that seems kinda optimistic.

For what it's worth, I think the current estimate formula for VACUUM is
pretty reasonable. Consider a table T with N rows and P pages, clustered on a
serial key k. Assume reltuples is initially correct.

Then after:

delete from T where k < 0.2 * (select max(k) from T);
vacuum T;

Thanks to the visibility map, VACUUM will touch only the first 20% of the pages;
those scanned pages will contain 0 live rows, and scanned_pages will be 0.2 * P.

Then according to the current code:

old_density = old_rel_tuples / old_rel_pages;
new_density = scanned_tuples / scanned_pages;
multiplier = (double) scanned_pages / (double) total_pages;
updated_density = old_density + (new_density - old_density) * multiplier;
return floor(updated_density * total_pages + 0.5);

the new density will be:

N/P + (0 / (0.2 * P) - N/P) * 0.2
= N/P - N/P * 0.2
= 0.8 * N/P

The new reltuples estimate will be 0.8 * old_reltuples, which is what we wanted.
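
For concreteness, here is the same arithmetic as a tiny standalone C sketch. This
is my own wrapper around the formula quoted above, not the real
vac_estimate_reltuples(), and the concrete numbers (N = 1,000,000 rows on
P = 10,000 pages) are made up purely for illustration:

#include <math.h>
#include <stdio.h>

/* Sketch of the reltuples update formula quoted above. */
static double
estimate_reltuples(double old_rel_pages, double old_rel_tuples,
                   double total_pages, double scanned_pages,
                   double scanned_tuples)
{
    double old_density = old_rel_tuples / old_rel_pages;
    double new_density = scanned_tuples / scanned_pages;
    double multiplier = scanned_pages / total_pages;
    double updated_density = old_density + (new_density - old_density) * multiplier;

    return floor(updated_density * total_pages + 0.5);
}

int
main(void)
{
    double N = 1000000.0;   /* hypothetical row count */
    double P = 10000.0;     /* hypothetical page count */

    /* Case 1: first 20% of pages scanned, all of their rows deleted. */
    printf("%.0f\n", estimate_reltuples(P, N, P, 0.2 * P, 0.0));    /* prints 800000 */

    return 0;
}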

If we evenly distribute the deletes across the table:

delete from T where random() < 0.2;

Then VACUUM will scan all of the pages; the scanned pages will contain 0.8 * N live
rows, and scanned_pages will be 1.0 * P. The new density will be:

N/P + ((0.8 * N) / (1.0 * P) - N/P) * 1.0
= N/P + (0.8 * N/P - N/P)
= N/P - 0.2 * N/P
= 0.8 * N/P

This again gives the new reltuples as 0.8 * old_reltuples, which is again correct.
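
Adding the second case to the main() above agrees with the hand calculation:

    /* Case 2: all P pages scanned, 20% of the rows deleted evenly. */
    printf("%.0f\n", estimate_reltuples(P, N, P, 1.0 * P, 0.8 * N)); /* prints 800000 */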

I believe that, given a good initial estimate of reltuples and relpages, and
assuming that the pages VACUUM does not scan do not change in density, the
VACUUM calculation does the right thing.

However, for ANALYZE the case is different.

-dg

--
David Gould daveg(at)sonic(dot)net
If simplicity worked, the world would be overrun with insects.
