From: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Justin King <kingpin867(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org, michael(at)paquier(dot)xyz, kgrittn(at)gmail(dot)com |
Subject: | Re: PG12 autovac issues |
Date: | 2020-03-23 15:22:47 |
Message-ID: | 20200323152247.GB52612@nol |
Lists: | pgsql-admin pgsql-general |
On Fri, Mar 20, 2020 at 12:03:17PM -0700, Andres Freund wrote:
> Hi,
>
> On 2020-03-20 12:42:31 -0500, Justin King wrote:
> > When we get into this state again, is there some other information
> > (other than what is in pg_stat_statement or pg_stat_activity) that
> > would be useful for folks here to help understand what is going on?
>
> If it's actually stuck on a single table, and that table is not large,
> it would be useful to get a backtrace with gdb.
FTR, we're facing a very similar issue at work (adding Michael and Kevin in Cc)
during performance tests since a recent upgrade to pg12.
What seems to be happening is that after reaching 200M transactions a first pass
of autovacuum freeze is being run, bumping pg_database.datfrozenxid by ~ 800k
(age(datfrozenxid) still being more than autovacuum_freeze_max_age afterwards).
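
To illustrate the arithmetic in the paragraph above, here is a minimal sketch. It assumes the default autovacuum_freeze_max_age of 200 million; the function name and the sample ages are hypothetical and are not taken from the affected system.

```python
# Default value of autovacuum_freeze_max_age in PostgreSQL.
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000


def still_over_freeze_limit(age_before: int, frozenxid_advance: int,
                            max_age: int = AUTOVACUUM_FREEZE_MAX_AGE) -> bool:
    """Return True if age(datfrozenxid) is still above max_age after
    datfrozenxid has advanced by frozenxid_advance transactions.

    Advancing datfrozenxid by N lowers age(datfrozenxid) by N, ignoring
    transactions consumed in the meantime (which only raise the age again).
    """
    age_after = age_before - frozenxid_advance
    return age_after > max_age


# Hypothetical numbers: the age was 200.9M, and the first freeze pass
# advanced datfrozenxid by roughly 800k.
print(still_over_freeze_limit(200_900_000, 800_000))
```

With these sample numbers the database remains above the limit, so further anti-wraparound autovacuum passes would still be expected.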
After that point, all available information seems to indicate that no
autovacuum worker is scheduled anymore:
- log_autovacuum_min_duration is set to 0 and no activity is logged (while
  having thousands of those per hour before that)
- 15 min interval snapshots of pg_database and pg_class show that the age of
  datfrozenxid/relfrozenxid keeps increasing at a consistent rate and never
  goes down
- 15 min interval snapshots of pg_stat_activity don't show any autovacuum
  worker
- the autovacuum launcher is up and running and doesn't show any sign of
  problem
- n_mod_since_analyze keeps growing at a consistent rate, never going down
- 15 min deltas of tup_updated and tup_deleted show that the global write
  activity doesn't change before and after the autovacuum problem
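
The checks listed above could be automated by comparing consecutive snapshots. The following is only a sketch under stated assumptions: the `Snapshot` class, its field names, and `autovacuum_looks_stalled` are hypothetical, and the values are expected to be collected separately (for example from pg_database, pg_stat_user_tables and pg_stat_activity).

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One periodic sample; all field names are hypothetical."""
    frozenxid_age: int         # age(datfrozenxid) for the database
    n_mod_since_analyze: int   # summed over the tables of interest
    autovacuum_workers: int    # autovacuum workers seen in pg_stat_activity


def autovacuum_looks_stalled(prev: Snapshot, curr: Snapshot,
                             freeze_max_age: int = 200_000_000) -> bool:
    """Heuristic: the database is past the freeze limit, its frozenxid age
    and pending modifications keep growing, and no worker is visible."""
    return (
        curr.frozenxid_age > freeze_max_age
        and curr.frozenxid_age > prev.frozenxid_age
        and curr.n_mod_since_analyze > prev.n_mod_since_analyze
        and curr.autovacuum_workers == 0
    )


# Two made-up samples taken 15 minutes apart.
earlier = Snapshot(frozenxid_age=200_100_000, n_mod_since_analyze=50_000,
                   autovacuum_workers=0)
later = Snapshot(frozenxid_age=203_000_000, n_mod_since_analyze=90_000,
                 autovacuum_workers=0)
print(autovacuum_looks_stalled(earlier, later))
```

A single pair of samples can miss short-lived workers, so this is a heuristic to be combined with the log_autovacuum_min_duration output rather than a definitive test.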
The situation continues for ~2h, at which point the bloat is so heavy that the
main filesystem becomes full, and postgres panics after a failed write in
the pg_logical directory or similar.
The same bench was run against pg11 many times and never triggered this issue.
So far our best guess is a side effect of 2aa6e331ead7.
Michael and I have been trying to reproduce this issue locally (drastically
reducing the various freeze_age parameters) for hours, but no luck for now.
This is using a vanilla pg 12.1, with some OLTP workload. The only possibly
relevant configuration changes are quite aggressive autovacuum settings on some
tables (no delay, analyze/vacuum threshold to 1k and analyze/vacuum scale
factor to 0, for both heap and toast).