Re: Berserk Autovacuum (let's save next Mandrill)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Banck <mbanck(at)gmx(dot)net>
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2020-04-02 12:57:19
Message-ID: 20200402125719.h2js6xlofbx6ovpd@development
Lists: pgsql-hackers

On Wed, Apr 01, 2020 at 11:13:12PM -0400, Tom Lane wrote:
>Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
>> Yeah, that makes sense. I still can't see what might be causing those
>> failures. The tests that were doing an ALTER COLUMN and then expecting
>> to see the results of a non-analysed table ought to be fixed by
>> 0936d1b6f, but that doesn't match the buildfarm failures. Possibly
>> 0936d1b6f will help with those anyway, but if so, it'll be annoying
>> not understanding why.
>
>Quite :-(. While it's too early to declare victory, we've seen no
>more failures of this ilk since 0936d1b6f, so it's sure looking like
>autovacuum did have something to do with it.
>
>Just to save people repeating the search I did, these are the buildfarm
>failures of interest so far:
>
>The first of those is unlike the rest, and I'm not trying to account for
>it here. In the rest, what we see is that sometimes the estimates are off
>by a little bit from what's expected, up or down just a percent or two.
>And those deltas kick at inconsistent spots partway through a series of
>similar tests, so it's hard to deny that *something* asynchronous to the
>test script is causing it.
>
>After contemplating the failures for awhile, I have a theory that
>at least partially matches the data. What I think is happening is
>that autovacuum (NOT auto-analyze) launches on the table, and since
>it is running concurrently with the foreground test script, it fails
>to immediately acquire buffer lock on one or more of the table pages.
>Since this isn't an aggressive vacuum scan, it just politely backs
>off and doesn't scan those pages. And that translates to not getting
>a perfectly accurate reltuples estimate at the end of the vacuum.
>
>On my x86_64 machine, which matches the buildfarm critters having
>trouble, the actual contents of both of the troublesome tables will
>be 5000 tuples in 31 pages --- which comes out to be 30 pages with
>162 tuples each and then 140 tuples in the last page. Working through
>the math in vac_estimate_reltuples (and assuming that the "old" values
>were accurate numbers from the test script's own ANALYZE), what I find
>is that autovacuum will conclude there are 4999 tuples if it misses
>scanning one of the first 30 pages, or 5021 tuples if it misses scanning
>the last page, because its interpolation from the old tuple density
>figure will underestimate or overestimate the number of missed tuples
>accordingly. Once that slightly-off number gets pushed into pg_class,
>we start to get slightly-off rowcount estimates in the test cases.
>
>So what I'm hypothesizing is that the pg_statistic data is perfectly
>fine but pg_class.reltuples goes off a little bit after autovacuum.
>The percentage changes in reltuples that I predict this way don't
>quite square with the percentage changes we see in the overall
>rowcount estimates, which is a problem for this theory. But the test
>cases are exercising some fairly complex estimation logic, and it
>wouldn't surprise me much if the estimates aren't linearly affected by
>reltuples. (Tomas, do you want to comment further on that point?)

I think this theory makes perfect sense. I think it's much less likely
to see the last page skipped, so we're likely to end up with reltuples
lower than 5000 (as opposed to seeing the 5021). That kinda matches the
reports, where we generally see estimates reduced by 1 or 2. The -1
change could be explained by rounding errors, I guess - with reltuples =
5000 we might have produced 139.51, rounded up to 140, while a slight
drop in reltuples may get us 139. Not sure about the -2 changes, but I
suppose it's possible we might actually skip multiple pages, reducing
the reltuples estimate even more?


Tomas Vondra
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
