Quick Links

Re: autovacuum hung on simple tables

From:	Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To:	senor <frio_cervesa(at)hotmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject:	Re: autovacuum hung on simple tables
Date:	2022-11-04 08:50:34
Message-ID:	307bde85482f177cfd0d2bcefc00b61b601a3edc.camel@cybertec.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Fri, 2022-11-04 at 02:50 +0000, senor wrote:
> I'm still trying to get a better understanding of the autovacuum process.
> This is a different postgres installation as my previous posts and confusing me in new ways.
> Still 11.4 running on CentOS 7 and 8 nvme in software raid

Not good. That should be 11.17. You are missing over two years of important bug fixes,
which could be part of the problem.

> This issue started with postgres "...not accepting commands to avoid wraparound...".

That does not happen normally. Something must have prevented autovacuum from succeeding:

- data corruption that makes VACUUM fail
- a prepared transaction or long running transaction that prevents PostgreSQL from
cleaning up
- a long running session with a temporary table
- autovacuum is too slow to cope with the rate at which dead tuples are generated

> On this server I was able to stop all access to DB and dedicate resources to only postgres.
> I thought I could allow autovacuum to do its thing with a ton of workers.

If autovacuum is too slow, this will take a long time. Note that in v11, the default setting
for "autovacuum_vacuum_cost_delay" is still 20ms, which makes autovacuum pretty slow.
Later, you show that you have "autovacuum_vacuum_cost_delay" set to 0, which is good.
Did you change that recently, so that some workers started before that change?

You should also crank up "maintenance_work_mem" for autovacuum to be fast.

> I think everything boils down to 2 questions:
> 1. Can autovacuum or manual vacuum be coerced into dealing with oldest first?

Manual VACUUM is the easiest. You can start it on the table with the oldest
"relfrozenxid" first.

> 1a. Where might I find advice on configuring postgres resources for maximum cpu &
> memory maintenance use. In other words quickest path out of "not accepting commands"
> land. Besides increasing autovacuum_freeze_max_age.

The documentation has a lot about that.
The quickest way is to run VACUUM on the table with the oldest "relfrozenxid" manually.
I would run VACUUM (VERBOSE), so that you get information if it can clean up or
freeze anything.

> 2. What can cause autovacuum to stall? Could associated toast or index bne the cause.

Potentially data corruption could send a backend into an endless loop.

> It appeared that autovacuum was not choosing the tables with the oldest xmin so I
> produced an ordered list of oldest tables with:
> SELECT oid::regclass, age(relfrozenxid)
> FROM pg_class
> WHERE relkind IN ('r', 't', 'm')
> AND age(relfrozenxid) > 2000000000
> ORDER BY 2 DESC
>
> The list contained over 6000 tables from pg_toast. They all belonged to daily
> reports tables. The reports are created daily and not touched again.
>
> Most of the autovacuums that did start seem to be hung. Never completing even on the simplest tables.
> The newest 2 autovacuums in the list are completing about one every couple seconds.
> CPU and disk IO are nearly idle.

To see if they are hung, look at "wait_event", "wait_event_type" and "state" in the
"pg_stat_activity" rows for the autovacuum workers.

High locks on a table have the potential to block an anti-wraparound autovacuum.
Again, check for log running and prepared transactions.

> I scripted a vacuum loop using the oldest table list. It's extremely slow but it was
> making better progress than autovacuum was.
>
> Using ps I see that there were as many worker processes as defined with autovacuum_max_workers
> but pg_stat_activity consistantly showed 19. I killed the script thinking there might be a conflict.
> I saw no difference after 30 minutes so restarted script.

I am not sure what exactly you are actually doing here, but you should know that there
can only be one VACUUM process per table. If there is already an anti-wraparound autovacuum
running on the table, a manual VACUUM will simple be blocked until the autovacuum worker
is done.

> Never saw anything in pg_stat_progress_vacuum.

Now that would be weird, except if VACUUM cannot get the required lock on the table.

> vacuum settings:
>                 name                 | setting
> -------------------------------------+-----------
> autovacuum                          | on
> autovacuum_freeze_max_age           | 200000000
> autovacuum_max_workers              | 40
> autovacuum_naptime                  | 4
> autovacuum_vacuum_cost_delay        | 0
> autovacuum_vacuum_cost_limit        | 5000
> autovacuum_work_mem                 | -1
> vacuum_freeze_min_age               | 50000000
> vacuum_freeze_table_age             | 150000000
>
> I'm now thinking that autovacuum getting hung up is what caused the issue to begin with. I see nothing
> but the successful vacuums from the script and my own fat-fingering commands in the postgres
> logs (set at info).

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

In response to

autovacuum hung on simple tables at 2022-11-04 02:50:26 from senor

Responses

Re: autovacuum hung on simple tables at 2022-11-28 04:05:08 from senor

Browse pgsql-general by date

	From	Date	Subject
Next Message	Laurenz Albe	2022-11-04 08:55:17	Re: Some questions about Postgres
Previous Message	Guillaume Lelarge	2022-11-04 07:18:59	Re: Unable to use pg_verify_checksums