From: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
---|---|
To: | senor <frio_cervesa(at)hotmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | Re: autovacuum hung on simple tables |
Date: | 2022-11-04 08:50:34 |
Message-ID: | 307bde85482f177cfd0d2bcefc00b61b601a3edc.camel@cybertec.at |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Fri, 2022-11-04 at 02:50 +0000, senor wrote:
> I'm still trying to get a better understanding of the autovacuum process.
> This is a different postgres installation as my previous posts and confusing me in new ways.
> Still 11.4 running on CentOS 7 and 8 nvme in software raid
Not good. That should be 11.17. You are missing over two years of important bug fixes,
which could be part of the problem.
> This issue started with postgres "...not accepting commands to avoid wraparound...".
That does not happen normally. Something must have prevented autovacuum from succeeding:
- data corruption that makes VACUUM fail
- a prepared transaction or long running transaction that prevents PostgreSQL from
cleaning up
- a long running session with a temporary table
- autovacuum is too slow to cope with the rate at which dead tuples are generated
> On this server I was able to stop all access to DB and dedicate resources to only postgres.
> I thought I could allow autovacuum to do its thing with a ton of workers.
If autovacuum is too slow, this will take a long time. Note that in v11, the default setting
for "autovacuum_vacuum_cost_delay" is still 20ms, which makes autovacuum pretty slow.
Later, you show that you have "autovacuum_vacuum_cost_delay" set to 0, which is good.
Did you change that recently, so that some workers started before that change?
You should also crank up "maintenance_work_mem" for autovacuum to be fast.
> I think everything boils down to 2 questions:
> 1. Can autovacuum or manual vacuum be coerced into dealing with oldest first?
Manual VACUUM is the easiest. You can start it on the table with the oldest
"relfrozenxid" first.
> 1a. Where might I find advice on configuring postgres resources for maximum cpu &
> memory maintenance use. In other words quickest path out of "not accepting commands"
> land. Besides increasing autovacuum_freeze_max_age.
The documentation has a lot about that.
The quickest way is to run VACUUM on the table with the oldest "relfrozenxid" manually.
I would run VACUUM (VERBOSE), so that you get information if it can clean up or
freeze anything.
> 2. What can cause autovacuum to stall? Could associated toast or index bne the cause.
Potentially data corruption could send a backend into an endless loop.
> It appeared that autovacuum was not choosing the tables with the oldest xmin so I
> produced an ordered list of oldest tables with:
> SELECT oid::regclass, age(relfrozenxid)
> FROM pg_class
> WHERE relkind IN ('r', 't', 'm')
> AND age(relfrozenxid) > 2000000000
> ORDER BY 2 DESC
>
> The list contained over 6000 tables from pg_toast. They all belonged to daily
> reports tables. The reports are created daily and not touched again.
>
> Most of the autovacuums that did start seem to be hung. Never completing even on the simplest tables.
> The newest 2 autovacuums in the list are completing about one every couple seconds.
> CPU and disk IO are nearly idle.
To see if they are hung, look at "wait_event", "wait_event_type" and "state" in the
"pg_stat_activity" rows for the autovacuum workers.
High locks on a table have the potential to block an anti-wraparound autovacuum.
Again, check for log running and prepared transactions.
> I scripted a vacuum loop using the oldest table list. It's extremely slow but it was
> making better progress than autovacuum was.
>
> Using ps I see that there were as many worker processes as defined with autovacuum_max_workers
> but pg_stat_activity consistantly showed 19. I killed the script thinking there might be a conflict.
> I saw no difference after 30 minutes so restarted script.
I am not sure what exactly you are actually doing here, but you should know that there
can only be one VACUUM process per table. If there is already an anti-wraparound autovacuum
running on the table, a manual VACUUM will simple be blocked until the autovacuum worker
is done.
> Never saw anything in pg_stat_progress_vacuum.
Now that would be weird, except if VACUUM cannot get the required lock on the table.
> vacuum settings:
> name | setting
> -------------------------------------+-----------
> autovacuum | on
> autovacuum_freeze_max_age | 200000000
> autovacuum_max_workers | 40
> autovacuum_naptime | 4
> autovacuum_vacuum_cost_delay | 0
> autovacuum_vacuum_cost_limit | 5000
> autovacuum_work_mem | -1
> vacuum_freeze_min_age | 50000000
> vacuum_freeze_table_age | 150000000
>
> I'm now thinking that autovacuum getting hung up is what caused the issue to begin with. I see nothing
> but the successful vacuums from the script and my own fat-fingering commands in the postgres
> logs (set at info).
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
From | Date | Subject | |
---|---|---|---|
Next Message | Laurenz Albe | 2022-11-04 08:55:17 | Re: Some questions about Postgres |
Previous Message | Guillaume Lelarge | 2022-11-04 07:18:59 | Re: Unable to use pg_verify_checksums |