| From: | Sami Imseih <samimseih(at)gmail(dot)com> |
|---|---|
| To: | Shinya Kato <shinya11(dot)kato(at)gmail(dot)com> |
| Cc: | Japin Li <japinli(at)hotmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Report oldest xmin source when autovacuum cannot remove tuples |
| Date: | 2026-06-08 16:35:53 |
| Message-ID: | CAA5RZ0s8FkDPOWYsJj-QAEnw1tmfi12ZvLyUAsFzwcsYCxru_A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
>> Additionally, I forgot to update meson.build, which caused the tests
>> to fail. I have fixed that in the attached patch.
> Oops, I made a slight mistake. Fixed.
Thanks for the updated patch!
I agree there is value in logging this information during
vacuum/autovacuum, as unlike a view, it provides the ability to
investigate historically what was blocking vacuum. Having both the
vacuum logging and the view seems like a good combination to me.
In terms of v5, I have some comments:
1/ Reporting the GID of a prepared transaction.
This seems unnecessary. The logged xid is sufficient since a DBA
can look up the GID via pg_prepared_xacts using the "transaction"
column. I think we can remove GetPreparedTransactionGid() and
just report the xid.
2/ Reporting the application_name of the standby.
This could be valuable assuming the user sets application_name in the
primary_conninfo string to make the standby distinguishable. The
documentation [1] says this "should be set" for physical standbys,
and in practice most deployments do so since
synchronous_standby_names matches on it.
However, instead of GetStandbyAppname(), can we use
pgstat_get_beentry_by_proc_number() to get the application name and
set it in dst->name while holding the shared ProcArrayLock? This is
safe if we call pgstat_fetch_stat_numbackends() beforehand to prime
the local backend status cache. The subsequent lookup is just a
bsearch on local memory, so it adds negligible time under the lock.
3/ Over-allocation in GetXidHorizonBlockers().
+ /*
+ * Allocate enough space for every PGPROC plus all replication slots. This
+ * is a generous upper bound (typically only 0-2 entries are returned),
+ * but keeps the logic simple for a diagnostic function that runs
+ * infrequently.
+ */
+ max_blockers = arrayP->maxProcs + max_replication_slots;
+ result = palloc0_array(XidHorizonBlocker, max_blockers);
This looks wasteful, especially with large max_connections. In
practice only 1-3 entries are returned. Can we start with a small
allocation (say 8) and repalloc when needed?
4/ Simplify the loop body by removing candidate_* temporaries.
The values we set in dst don't need to be tracked in separate
variables. If dst gets populated for the proc, we can just do:
```
if (dst)
{
dst->pid = proc->pid;
dst->xid = horizon;
dst->proc_number = pgprocno;
}
```
at the end of each iteration.
5/ GetXidHorizonBlocker() priority scan.
+/*
+ * Get the highest-priority blocker holding back the xid horizon.
+ *
+ * Returns true and stores the blocker in *blocker if any are found.
+ */
The priority scan makes sense and addresses the concern I raised
earlier [2] about reporting the wrong blocker due to ProcArray
ordering. Without it we report whichever blocker happens to come
first in ProcArray order. If an xmin-holder sits at a lower index
than the xid-holder, we would report the less actionable one. The
xid-holder is more useful to surface because killing that single
transaction advances the horizon, whereas xmin-holders only matter
once all of them release.
One small optimization: exit the loop early once we find an xid-match
type, since nothing can beat it:
for (int i = 0; i < nblockers; i++)
{
if (best == NULL || blockers[i].type < best->type)
{
best = &blockers[i];
/* xid-match types can't be beaten, stop early */
if (best->type <= XHB_PREPARED_TRANSACTION)
break;
}
}
Finally, the infrastructure added here (GetXidHorizonBlockers,
XidHorizonBlocker struct, the enum-based priority) will also underpin
the future SQL-callable view for real-time blocker reporting, so I
think what is being done here is good for that case also. I would
also think that GetXidHorizonBlockers() and friends should be a
separate patch, and the vacuum logging and future SQL view work can
build off of it.
[1] https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-PRIMARY-CONNINFO
[2] https://www.postgresql.org/message-id/CAA5RZ0sjMgMo4Xg-niyyF-CpkQ_CK6uOfNKYT%3D9RmiBkAxQkbQ%40mail.gmail.com
--
Sami Imseih
Amazon Web Services (AWS)
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Nazir Bilal Yavuz | 2026-06-08 16:21:42 | Re: Upload only the failed tests logs to the Postgres CI (Cirrus CI) |