Re: What specific circumstances trigger Autovacuum wraparound

From: Daniel Lagerman <spjheruur(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: What specific circumstances trigger Autovacuum wraparound
Date: 2018-06-05 23:57:43
Message-ID: CAAw8gvhNvsfRN15XeN1+OSyZOWatjwtc3m=EAF+euSKCqozdKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello David!

Thanks for taking the time to respond. Upgrading is not in scope right now
for this server even though that is certainly something I will do down the
line.
I talked to a colleague of mine and he dug up something very interesting
and probably answers the behavior I'm seeing, from PG archives. So I
thought I would post it here so that if this question comes up again its
more easy to find it in the future for PG 9.4 specifically.

There is another setting that controls Autovacuum called
autovacuum_multixact_freeze_max_age, in PG 9.5 it is easy to get the
correct value for the table to see if you hit the default limit of 400
million, however in PG 9.4 and earlier the function mxid_age() does not
exist, so you can't query this value directly, hence this reference from
this PG Archive post is crucial:
http://www.postgresql-archive.org/could-not-access-status-of-transaction-pg-multixact-issue-td5822248.html

The thread above reference a bug thread BUG #11264
https://www.postgresql.org/message-id/flat/20140827175105(dot)GI7046%40eldon(dot)alvh(dot)no-ip(dot)org#20140827175105(dot)GI7046(at)eldon(dot)alvh(dot)no-ip(dot)org
And in that thread we can see Alvaro Herrera write the following paragraph:

"A better way not involving mxid_age() would be to use pg_controldata to
extract the current value of the mxid counter, then subtract the current
relminmxid from that value."

Given that statement we took a look at this server and the table in
question. We could see that NextMultiXactId: 2640838929 and the relminmxid
for that specific table was 1791982183. Now if you take 2640838929 -
1791982183 = 848856746 you can see that this value is the double of the
default value of 400 million and would explain why vacuum is running:

"As a safety device, a whole-table vacuum scan will occur for any table
whose multixact-age is greater than autovacuum_multixact_freeze_max_age.
Whole-table vacuum scans will also occur progressively for all tables,
starting with those that have the oldest multixact-age, if the amount of
used member storage space exceeds the amount 50% of the addressible storage
space. Both of these kinds of whole-table scans will occur even if
autovacuum is nominally disabled."

This calculation is also confirmed by looking at the git commit of
mxid_age() this is exactly what this function returns and even though the 5
other tables are not over this value we suspect that due to that we are
using more than 50% of the addressable storage space Autovacuum takes to
"opportunity" to run for these tables also.

Now you could of course increase this value also, not sure what the max is,
but I rather want to complete vacuum. So the plan for me now is that during
the weekend, where there is no activity at all except for a backup, is to
increase the work memory from relatively low 1024 MB to use almost all
memory I have on this server, set the server to use one worker and then see
if Autovacuum by itself can solve the cleanup by itself or at least close
the gap so that are confident that we will catch up.

Best regards

Daniel

On Mon, Jun 4, 2018 at 11:47 PM, David G. Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

> On Monday, June 4, 2018, Daniel Lagerman <spjheruur(at)gmail(dot)com> wrote:
>
>> I have a pg 9.4.3 server that has one table with calculate age of about
>> 320 million, this is the oldest table in this database and therefore the
>> age of the DB is the same as the table.
>>
>
> It would be advisable to simply upgrade to 9.4.18 and see if any of the
> various vacuum related bug fixes solves your problem. That's about the
> extent of my useful advice for this...
>
> David J.
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jan Claeys 2018-06-06 00:07:06 Re: Code of Conduct plan
Previous Message Adrian Klaver 2018-06-05 23:51:19 Re: Code of Conduct plan