Re: Report oldest xmin source when autovacuum cannot remove tuples

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>
Cc: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Sami Imseih <samimseih(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report oldest xmin source when autovacuum cannot remove tuples
Date: 2026-06-29 15:22:50
Message-ID: CALj2ACVPbn=X_fGZbUMwzvrbT-ZzHnr2=qA_ykjiyH8cnp1PFg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Sat, Jun 27, 2026 at 1:08 AM Shinya Kato <shinya11(dot)kato(at)gmail(dot)com> wrote:
>
> I have already posted the version addressing the previous review
> comments. On top of that, v7 includes several further fixes I found
> while re-examining the code myself, which are summarized below.

Thanks for working on this, and sorry for being late. A big +1 to
reporting what's holding the vacuum back. Today one has to query a
bunch of different views, and it's not always obvious which ones to
look at, so reporting the blocker reason in the logs (perhaps also in
the progress report) is very useful IMO. It helps quickly diagnose the
root cause, possibly act on it, and also helps with post-hoc analysis.

My main concern is the additional proc array scan and the lock it
takes. ProcArrayLock is one of the most contended locks in production
systems with hundreds or thousands of connections. With many workers
vacuuming concurrently across hundreds of tables, I expect this extra
lock acquisition to show up in the contention path.
ComputeXidHorizons() already takes ProcArrayLock and has all the
information about who is blocking the vacuum, so I think we can
capture the blocker right there and get the reason almost for free,
without a second scan.

One thing worth mentioning is that we can only ever report a
"possible" reason. There's an inherent race - by the time we report a
blocker, it may already be gone. So this is best-effort information.

The other concern is timing. The patch reports the blocker after the
vacuum has finished. With hundreds of GB of tables and indexes, where
vacuum can run for hours, reporting up front is far more useful than
learning at the end that vacuum couldn't remove the tuples. If we
capture the blocker when ComputeXidHorizons() computes the horizon, we
can tell the user early, while there's still a chance to act on it.

I wrote a patch on the same idea about a month ago, then got busy and
only noticed this thread now. If it helps, I'm happy to post it.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2026-06-29 15:31:03 Re: Report oldest xmin source when autovacuum cannot remove tuples
Previous Message Tom Lane 2026-06-29 15:17:08 Re: PostgreSQL select-only CTE removal is too aggressive?