Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums""

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: jiye <jiye_sw(at)126(dot)com>
Cc: "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: confusion about this commit "Revert "Skip redundant anti-wraparound vacuums""
Date: 2023-06-06 19:30:02
Message-ID: CA+TgmoYPjEqZ7bi-oOQq=qRCw_Z+nxvwZSR6+Z=Xe60-NLTF=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 5, 2023 at 1:50 AM jiye <jiye_sw(at)126(dot)com> wrote:

> we can not get determinate test case as this issue reproduce only once,
> and currently autovaccum can works as we using vacuum freeze for each
> tables of each database.
>
> our client's application is real online bank business, and have serveral
> customer database, do a majority of update opertaion as result trigger
> some table dead_tup_ratio nealy 100%, but can not find any autovacuum
> process work for a very long time before we do vacuum freeze manally.
>

I tend to doubt that this is caused by the commit you're blaming, because
that commit purports to skip autovacuum operations only if some other
vacuum has already done the work. Here you are saying that you see no
autovacuum tasks at all.

The screenshot that you posted of XID ages exceeding 200 million is not
evidence of a problem. It's pretty normal for some table XID ages
to temporarily exceed autovacuum_freeze_max_age, especially if you have a
lot of tables with about the same XID age, as seems to be the case here.
When a table's XID age reaches autovacuum_freeze_max_age, the system will
start trying harder to reduce the XID age, but that process isn't
instantaneous.

On the other hand, your statement that you have very high numbers of dead
tuples *is* evidence of a problem. It's very likely caused by vacuum not
running aggressively enough. Remember that autovacuum is limited by the
number of workers (autovacuum_max_workers) but even more importantly by the
cost delay system. It's *extremely* common to need to raise
vacuum_cost_limit on large or busy database systems, often by large
multiples (e.g. 10x or more).

I'd strongly suggest that you carefully monitor how many autovacuum
processes are running and what they are doing. If I were a betting man, I'd
bet that you'd find that in the situation where you had this problem, the
number of running processes was always 3 -- which is the configured maximum
-- and if you looked at the wait event in pg_stat_activity I bet you would
see VacuumDelay showing up a lot. If so, raise vacuum_cost_limit
considerably and over time the problem should get better. It won't be
instantaneous.

Or maybe I'm wrong and you'd see something else, but whatever you did see
would probably give a hint as to what the problem here is.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-06-06 19:37:17 Re: Order changes in PG16 since ICU introduction
Previous Message Joe Conway 2023-06-06 19:27:19 Re: pg_collation.collversion for C.UTF-8