Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: David Gould <daveg(at)sonic(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alina Alexeeva <alexeeva(at)adobe(dot)com>, Ullas Lakkur Raghavendra <lakkurra(at)adobe(dot)com>
Subject: Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date: 2018-03-04 15:49:46
Message-ID: CAMkU=1yQz4OEmoWSnY_wE03KdpoQ4zpzmjpyPDUQwz+tbsU21g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 17, 2018 at 4:49 PM, David Gould <daveg(at)sonic(dot)net> wrote:

>
> Please add the attached patch and this discussion to the open commit fest.
> The
> original bugs thread is here: 20180111111254(dot)1408(dot)8342(at)wrigl
> eys.postgresql.org.
>
> Bug reference: 15005
> Logged by: David Gould
> Email address: daveg(at)sonic(dot)net
> PostgreSQL version: 10.1 and earlier
> Operating system: Linux
> Description:
>
> ANALYZE can make pg_class.reltuples wildly inaccurate compared to the
> actual
> row counts for tables that are larger than the default_statistics_target.
>
> Example from one of a clients production instances:
>
> # analyze verbose pg_attribute;
> INFO: analyzing "pg_catalog.pg_attribute"
> INFO: "pg_attribute": scanned 30000 of 24519424 pages, containing 6475
> live rows and 83 dead rows; 6475 rows in sample, 800983035 estimated total
> rows.
>
> This is a large complex database - pg_attribute actually has about five
> million rows and needs about one hundred thouand pages. However it has
> become extremely bloated and is taking 25 million pages (192GB!), about 250
> times too much. This happened despite aggressive autovacuum settings and a
> periodic bloat monitoring script. Since pg_class.reltuples was 800 million,
> the bloat monitoring script did not detect that this table was bloated and
> autovacuum did not think it needed vacuuming.
>

I can see how this issue would prevent ANALYZE from fixing the problem, but
I don't see how it could have caused the problem in the first place. In
your demonstration case, you had to turn off autovac in order to get it to
happen, and then when autovac is turned back on, it is all primed for an
autovac to launch, go through, touch almost all of the pages, and fix it
for you. How did your original table get into a state where this wouldn't
happen?

Maybe a well-timed crash caused n_dead_tup to get reset to zero and that is
why autovac is not kicking in? What are the pg_stat_user_table number and
the state of the visibility map for your massively bloated table, if you
still have them?

In any event, I agree with your analysis that ANALYZE should set the number
of tuples from scratch. After all, it sets the other estimates, such as
MCV, from scratch, and those are much more fragile to sampling than just
the raw number of tuples are. But if the default target is set to 1, that
would scan only 300 pages. I think that that is a little low of a sample
size to base an estimate on, but it isn't clear to that using 300 pages
plus whacking them around with an exponential averaging is really going to
be much better. And if you set your default target to 1, that is
more-or-less what you signed up for.

It is little weird to have VACUUM incrementally update and then ANALYZE
compute from scratch and discard the previous value, but no weirder than
what we currently do of having ANALYZE incrementally update despite that it
is specifically designed to representatively sample the entire table. So I
don't think we need to decide what to do about VACUUM before we can do
something about ANALYZE.

So I support your patch. There is probably more investigation and work
that could be done in this area, but those could be different patches, not
blocking this one.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-03-04 15:53:53 Re: Fwd: automatic disable unicode line style when terminal is not unicode
Previous Message Pavel Stehule 2018-03-04 15:34:54 Fwd: automatic disable unicode line style when terminal is not unicode