Re: Berserk Autovacuum (let's save next Mandrill)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Banck <mbanck(at)gmx(dot)net>
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2020-04-02 03:13:12
Message-ID: 22297.1585797192@sss.pgh.pa.us
Lists: pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> Yeah, that makes sense. I still can't see what might be causing those
> failures. The tests that were doing an ALTER COLUMN and then expecting
> to see the results of a non-analysed table ought to be fixed by
> 0936d1b6f, but that doesn't match the buildfarm failures. Possibly
> 0936d1b6f will help with those anyway, but if so, it'll be annoying
> not understanding why.

Quite :-(. While it's too early to declare victory, we've seen no
more failures of this ilk since 0936d1b6f, so it's sure looking like
autovacuum did have something to do with it.

Just to save people repeating the search I did, these are the buildfarm
failures of interest so far:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2020%3A03%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-03-28%2022%3A00%3A19
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2020-03-29%2006%3A45%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2020-03-30%2006%3A00%3A06
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2006%3A10%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2020-03-31%2005%3A00%3A35

The first of those is unlike the rest, and I'm not trying to account for
it here. In the rest, what we see is that sometimes the estimates are off
by a little bit from what's expected, up or down just a percent or two.
And those deltas kick in at inconsistent spots partway through a series of
similar tests, so it's hard to deny that *something* asynchronous to the
test script is causing it.

After contemplating the failures for a while, I have a theory that
at least partially matches the data. What I think is happening is
that autovacuum (NOT auto-analyze) launches on the table, and since
it is running concurrently with the foreground test script, it fails
to immediately acquire buffer lock on one or more of the table pages.
Since this isn't an aggressive vacuum scan, it just politely backs
off and doesn't scan those pages. And that translates to not getting
a perfectly accurate reltuples estimate at the end of the vacuum.
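
For reference, the relevant behavior is in lazy_scan_heap(); roughly
(my paraphrase from memory, not a verbatim excerpt), it does

    buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                             RBM_NORMAL, vac_strategy);
    if (!ConditionalLockBufferForCleanup(buf))
    {
        /* non-aggressive vacuum: don't wait for the cleanup lock */
        if (!aggressive)
        {
            ReleaseBuffer(buf);
            vacrelstats->pinskipped_pages++;
            continue;    /* page contributes nothing to scanned_tuples */
        }
        LockBufferForCleanup(buf);    /* aggressive: wait for it */
    }

so any page whose cleanup lock can't be had immediately just drops out
of the scan.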
On my x86_64 machine, which matches the buildfarm critters having
trouble, the actual contents of both of the troublesome tables will
be 5000 tuples in 31 pages --- which comes out to be 30 pages with
162 tuples each and then 140 tuples in the last page. Working through
the math in vac_estimate_reltuples (and assuming that the "old" values
were accurate numbers from the test script's own ANALYZE), what I find
is that autovacuum will conclude there are 4999 tuples if it misses
scanning one of the first 30 pages, or 5021 tuples if it misses scanning
the last page, because its interpolation from the old tuple density
figure will underestimate or overestimate the number of missed tuples
accordingly. Once that slightly-off number gets pushed into pg_class,
we start to get slightly-off rowcount estimates in the test cases.
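
To make that arithmetic concrete, here's a standalone approximation of
the interpolation (my simplification of vac_estimate_reltuples(), not
the backend code itself, using the table geometry described above):

    #include <math.h>
    #include <stdio.h>

    /*
     * Simplified form of the interpolation in vac_estimate_reltuples():
     * tuples on unscanned pages are extrapolated from the old density.
     */
    static double
    estimate_reltuples(double total_pages, double scanned_pages,
                       double scanned_tuples,
                       double old_pages, double old_tuples)
    {
        double old_density = old_tuples / old_pages;
        double unscanned = total_pages - scanned_pages;

        return floor(old_density * unscanned + scanned_tuples + 0.5);
    }

    int
    main(void)
    {
        /* miss one of the 30 full pages: 162 tuples unseen */
        printf("%.0f\n", estimate_reltuples(31, 30, 5000 - 162, 31, 5000));
        /* prints 4999 */

        /* miss the last, partly-full page: 140 tuples unseen */
        printf("%.0f\n", estimate_reltuples(31, 30, 5000 - 140, 31, 5000));
        /* prints 5021 */

        return 0;
    }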

So what I'm hypothesizing is that the pg_statistic data is perfectly
fine but pg_class.reltuples goes off a little bit after autovacuum.
The percentage changes in reltuples that I predict this way don't
quite square with the percentage changes we see in the overall
rowcount estimates, which is a problem for this theory. But the test
cases are exercising some fairly complex estimation logic, and it
wouldn't surprise me much if the estimates aren't linearly affected by
reltuples. (Tomas, do you want to comment further on that point?)

regards, tom lane
