PG12 autovac issues

From: Justin King <kingpin867(at)gmail(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: PG12 autovac issues
Date: 2020-03-17 22:18:57
Message-ID: CAE39h22zPLrkH17GrkDgAYL3kbjvySYD1io+rtnAUFnaJJVS4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-general

We have a database that isn't overly large (~20G), but gets incredibly
frequent updates. Here's an example table:

feedi=# select * from pg_stat_all_tables where schemaname =
'production' and relname = 'tita';
relid = 16786
schemaname = production
relname = tita
seq_scan = 23365
seq_tup_read = 403862091
idx_scan = 26612759248
idx_tup_fetch = 19415752701
n_tup_ins = 24608806
n_tup_upd = 4207076934
n_tup_del = 24566916
n_tup_hot_upd = 4073821079
n_live_tup = 79942
n_dead_tup = 71969
n_mod_since_analyze = 12020
last_vacuum = 2020-03-17 15:35:19.588859+00
last_autovacuum = 2020-03-17 21:31:08.248598+00
last_analyze = 2020-03-17 15:35:20.372875+00
last_autoanalyze = 2020-03-17 22:04:41.76743+00
vacuum_count = 9
autovacuum_count = 135693
analyze_count = 9
autoanalyze_count = 495877

As you can see in this table, there are only ~80K rows, but billions
of updates. What we have observed is that the frozenxid reaches the
200M mark fairly quickly because of the amount of activity. What is
interesting is that this happens with the 'postgres' and 'template1'
databases as well and there is absolutely no activity in those
databases.

When the 'postgres' and/or 'template1' databases hit the
freeze_max_age, there are cases where it kicks off an aggressive
autovac of those tables which seems to prevent autovacs from running
elsewhere. Oddly, this is not consistent, but that condition seems to
be required. We have observed this across multiple PG12 servers (dev,
test, staging, production) all with similar workloads.

$ grep -i vacuum /var/log/postgresql/postgres.log | cut -b 1-9 | uniq -c
17 Mar 17 06
34 Mar 17 07
31 Mar 17 08
31 Mar 17 09
30 Mar 17 10
34 Mar 17 11
33 Mar 17 12
19 Mar 17 13
40 Mar 17 15
31 Mar 17 16
36 Mar 17 17
34 Mar 17 18
35 Mar 17 19
35 Mar 17 20
33 Mar 17 21

As you can see above, we end up having around ~33 autovac/hr, and
about 13:30 today, they stopped until we ran a "vacuum freeze verbose
analyze;" against the 'postgres' database (around 15:30) which then
caused the autovacs to resume running against the "feedi" database.

I'm completely perplexed as to what is happening and why it suddenly
started when we moved from PG10 > PG12. The configs and workload are
essentially the same between versions. We realize we could simply
increase the autovacuum_freeze_max_age, but that doesn't seem to
actually resolve anything -- it just pushes the problem out. Has
anyone seen anything similar to this?

Thanks very much for the consideration.

Justin King
http://flightaware.com/

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Justin King 2020-03-17 22:22:24 Fwd: PG12 autovac issues
Previous Message John Scalia 2020-03-17 00:38:45 Re: Explain plan on a Select query

Browse pgsql-general by date

  From Date Subject
Next Message Justin King 2020-03-17 22:22:24 Fwd: PG12 autovac issues
Previous Message Jerry Sievers 2020-03-17 22:05:17 Re: Temporary tablespaces on a RAM disk