| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Add pg_stat_recovery system view |
| Date: | 2026-03-06 15:49:41 |
| Message-ID: | CABPTF7UyUWT0E4_vxzfxPkHy=hqjrO1wcKO_ZCLBq9+en_UpQQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
Thanks for looking into this.
On Fri, Mar 6, 2026 at 3:32 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
>
>
> > On Mar 5, 2026, at 21:01, Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > On Thu, Mar 5, 2026 at 3:37 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >>
> >> Hi Michael,
> >>
> >> Thanks for the detailed feedback!
> >>
> >> On Thu, Mar 5, 2026 at 11:58 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >>>
> >>> On Wed, Mar 04, 2026 at 09:02:29AM +0800, Xuneng Zhou wrote:
> >>>> Just rebase.
> >>>
> >>> I have applied 0001, that simply moves some code around.
> >>>
> >>
> >> Thanks for applying.
> >>
> >>> Regarding 0002, I would recommend to not bump the catalog version in
> >>> catversion.h when sending a patch to the lists. This task is left to
> >>> committers when the code gets merged into the tree. And this is
> >>> annoying for submitters because it can create a lot of conflicts.
> >>
> >> I'll leave it untouched.
> >>
> >>> Using a set-returning function is I think wrong here, betraying the
> >>> representation of the recovery status as stored in the system. We
> >>> know that there is only one state of recovery, fixed in shared memory.
> >>> Like the cousins of this new function, let's make thinks non-SRF,
> >>> returning one row all the time with PG_RETURN_NULL() if the conditions
> >>> for information display are not satisfied. When we are not in
> >>> recovery or when the role querying the function is not granted
> >>> ROLE_PG_READ_ALL_STATS, that will simplify the patch as there is no
> >>> need for the else branch with the nulls, as written in your patch.
> >>> The field values are acquired the right way, spinlock acquisitions
> >>> have to be short.
> >>
> >> My earlier thought for keeping pg_stat_get_recovery as an SRF is to
> >> make pg_stat_recovery simpler by avoiding an extra filter such as
> >> WHERE s.promote_triggered IS NOT NULL to preserve 0/1-row semantics.
> >> pg_stat_get_recovery_prefetch also uses the SRF pattern.
> >>
> >>> Like pg_stat_wal_receiver, let's add to the view definition a qual to
> >>> return a row only if the fields are not NULL.
> >>>
> >>> pg_get_wal_replay_pause_state() displays the pause_state, and it is
> >>> not hidden behind the stats read role. I am not really convinced that
> >>> this is worth bothering to treat as an exception in this patch. It
> >>> makes it a bit more complicated, for little gain. I would just hide
> >>> all the fields behind the role granted, to keep the code simpler, even
> >>> if that's slightly inconsistent with pg_get_wal_replay_pause_state().
> >>
> >> I agree that exposing a subset of columns unconditionally is not worth
> >> the added complexity.
> >>
> >>> After writing this last point, as far as I can see there is a small
> >>> optimization possible in the patch. When a role is not granted
> >>> ROLE_PG_READ_ALL_STATS, we know that we will not return any
> >>> information so we could skip the spinlock acquisition and avoid
> >>> spinlock acquisitions when one queries the function but has no access
> >>> to its data.
> >>
> >> Makes sense to me. I'll apply it as suggested.
> >>
> >>> + True if a promotion has been triggered for this standby server.
> >>>
> >>> Standby servers are implied if data is returned, this sentence can be
> >>> simpler and cut the "standby server" part.
> >>
> >> + 1
> >>
> >>> + Start LSN of the last WAL record replayed during recovery.
> >>> [...]
> >>> + End LSN of the last WAL record replayed during recovery.
> >>> [...]
> >>> + Timeline of the last replayed WAL record.
> >>> For other system views with LSN information, we don't use "LSN", but
> >>> "write-ahead log location". I'd suggest the same term here. These
> >>> three fields refer to the last record *successfully* replayed. It
> >>> seems important to mention this fact, I guess?
> >>
> >> I'll replace them.
> >>
> >>> + <structfield>replay_end_lsn</structfield> <type>pg_lsn</type>
> >>> + </para>
> >>> + <para>
> >>> + Current replay position. When replaying a record, this is the end
> >>> + position of the record being replayed; otherwise it equals
> >>> + <structfield>last_replayed_end_lsn</structfield>.
> >>>
> >>> Slightly inexact. This is the end LSN + 1.
> >>
> >> Yeh, this needs to be corrected.
> >>
> >>> + <structfield>replay_end_lsn</structfield> <type>pg_lsn</type>
> >>> [..]
> >>> + <structfield>replay_end_tli</structfield> <type>integer</type>
> >>>
> >>> Okay, I can fall behind the addition of these two, it could be helpful
> >>> in debugging something like a locking issue when replaying a specific
> >>> record, I guess. At least I'd want to know what is happening for a
> >>> record currently being replayed. It seems to me that this could be
> >>> more precise, mentioning that these refer to a record *currently*
> >>> being replayed.
> >>
> >> I will adjust the docs to say these describe the record currently
> >> being replayed, with replay_end_lsn being the end position + 1.
> >>
> >>> Is recoveryLastXTime actually that relevant to have? We use it for
> >>> logging and for recovery target times, which is something less
> >>> relevant than the other fields, especially if we think about standbys
> >>> where these have no targets to reach.
> >>
> >> I agree it is less central for standby monitoring (and partly overlaps
> >> with pg_last_xact_replay_timestamp()), so I’ll remove it from this
> >> view in the next revision.
> >>
> >>> currentChunkStartTime, on the contrary, is much more relevant to me,
> >>> due to the fact that we use it in WaitForWALToBecomeAvailable() with
> >>> active WAL receivers running.
> >>
> >> Yeah, it could be useful for apply-delay/catch-up diagnostics.
> >>
> >
> > Here is the updated patch set. Please take a look.
> >
> >
> > --
> > Best,
> > Xuneng
> > <v3-0001-Add-pg_stat_recovery-system-view.patch><v3-0002-Refactor-move-XLogSource-enum-to-xlogrecovery.h.patch><v3-0003-Add-wal_source-column-to-pg_stat_recovery.patch>
>
> 0001 has been pushed.
>
> 0002 is pure refactoring and only moves the structure XLogSource from .h to .c, thus no comment.
>
> A few comments on 0003.
>
> 1
> ```
> + /*
> + * Source from which the startup process most recently read WAL data.
> + * Updated when the startup process successfully reads WAL from a source.
> + * Note: this reflects the read source, not the original receipt source;
> + * streamed WAL read from local files will show XLOG_FROM_PG_WAL.
> + */
> + XLogSource lastReadSource;
> ```
>
> This comment says, "streamed WAL read from local files will show XLOG_FROM_PG_WAL”. But I don’t see how the conversion happens. In WaitForWALToBecomeAvailable(), there is a path XLogFileRead() is called with XLOG_FROM_STREAM, but in XLogFileRead(), source is directly assigned to XLogRecoveryCtl->lastReadSource without conversion.
>
> On the other side, if “stream” is impossible, then the doc should not mention it:
> ```
> + <para>
> + <literal>stream</literal>: WAL actively being streamed from the
> + upstream server.
> + </para>
> ```
you're right, the comment is wrong. There is no conversion. In
WaitForWALToBecomeAvailable(), when the walreceiver has flushed data
ahead of the startup process, the code deliberately opens the segment
file with XLOG_FROM_STREAM , and the existing comment explains the
intent:
* Use XLOG_FROM_STREAM so that source info is set correctly
* and XLogReceiptTime isn't changed.
XLogFileRead() then assigns that directly to lastReadSource without
conversion. So XLOG_FROM_STREAM is a valid observable value — it means
the segment was opened during active walreceiver streaming, even
though the actual I/O is against a local file in pg_wal. I think
stream documentation entry should stay.
> 2 Related to 1. WRT the new column name wal_source, it sounds like “where WAL is coming from”. Say the comment of lastReadSource is true, WAL from stream is shown as pg_wal, “wal source” sounds more like stream, and “wal read source” is pg_wal. From this perspective, I just feel the new column name should be something like “wal_read_source”.
>
agreed, wal_read_source seems to be a better name. It more closely
mirrors the underlying lastReadSource field and avoids ambiguity about
whether the column describes the delivery mechanism or the read path.
Please check the updated patches.
--
Best,
Xuneng
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0001-Refactor-move-XLogSource-enum-to-xlogrecovery.h.patch | application/x-patch | 2.4 KB |
| v4-0002-Add-wal_source-column-to-pg_stat_recovery.patch | application/x-patch | 8.7 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Xuneng Zhou | 2026-03-06 15:52:38 | Re: Add pg_stat_recovery system view |
| Previous Message | Tom Lane | 2026-03-06 15:45:05 | Re: Add pg_stat_recovery system view |