Re: Show WAL write and fsync stats in pg_stat_io

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Show WAL write and fsync stats in pg_stat_io
Date: 2023-07-21 22:30:06
Message-ID: CAAKRu_ZzY9k5XgvsSwU2sOvb==POv=jaKTom9uRVfjAgWhTvgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 28, 2023 at 6:09 AM Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com> wrote:
> This is a WIP patch to add WAL write and fsync stats to pg_stat_io
> view.

Thanks for working on this! I have some feedback on the content of the
patch as well as some items that I feel are missing.

I think it would be good to count WAL reads even though they are not
currently represented in pg_stat_wal. Here is a thread discussing this
[1].

Eventually, the docs will need an update as well. You can wait until a
later version of the patch to do this, but I would include it in a list
of the remaining TODOs in your next version.

I think we will also want to add an IOContext for WAL initialization.
Then we can track how long is spent doing WAL init (including filling
the WAL file with zeroes). XLogFileInitInternal() is likely where we
would want to add it. And op_bytes for this would likely be
wal_segment_size. I thought I heard about someone proposing adding WAL
init to pg_stat_wal, but I can't find the thread.

I think there is also an argument for counting WAL files recycled as
IOOP_REUSES. We should start thinking about how to interpret the
different IOOps within the two IOContexts and discussing what would be
useful to count. For example, should removing a logfile count as an
IOOP_EVICT? Maybe it is not directly related to "IO" enough or even an
interesting statistic, but we should think about what kinds of
IO-related WAL statistics we want to track.

Any that we decide not to count for now should be "banned" in
pgstat_tracks_io_op() for clarity. For example, if we create a separate
IOContext for WAL file init, I'm not sure what would count as an
IOOP_EXTEND in IOCONTEXT_NORMAL for IOOBJECT_WAL.

Also, I think there are some backend types which will not generate WAL
and we should determine which those are and skip those rows in
pgstat_tracks_io_object().

diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 8b0710abe6..2ee6c21398 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2207,6 +2207,10 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID
tli, bool flexible)

I think we should likely follow the pattern of using
pgstat_prepare_io_time() and pgstat_count_io_op_time() as it is done
elsewhere. You could pass the IOObject as a parameter to
pgstat_prepare_io_time() in order to determine if we should check
track_io_timing or track_wal_io_timing. And we'll want to check
track_wal_io_timing if IOObject is IOOBJECT_WAL in
pgstat_count_io_op_time().

INSTR_TIME_SET_CURRENT(duration);

INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, duration,
start);

+ pgstat_count_io_op_time(IOOBJECT_WAL,
IOCONTEXT_NORMAL, IOOP_WRITE, start, 1);
+ } else
+ {

Other users of pgstat_count_io_op_time()/io_op_n() which write multiple
pages at a time pass the number of pages in as the cnt parameter. (see
ExtendBufferedRelLocal() as an example). I think we want to do that for
WAL also. In this case, it would be the local variable "npages" and we
can do it outside of this loop.

It is true that the existing WAL stats count wal_writes here. However,
this is essentially counting write system calls, which is probably not
what we want for pg_stat_io. See [2] for a discussion about whether to
count blocks written back or writeback system calls for a previous
pg_stat_io feature. All of the other block-based IO statistics in
pg_stat_io count the number of blocks.

This being said, we probably want to just leave
PendingWalStats.wal_write++ here. We would normally move it into
pg_stat_io like we have with pgBufferUsage and the db IO stats that are
updated in pgstat_count_io_op_time(). This consolidation makes it easier
to eventually reduce the duplication. However, in this case, it seems
wal_write counts something we don't count in pg_stat_io, so it can
probably be left here. I would still move the
PendingWalStats.wal_write_time into pgstat_count_io_op_time(), since
that seems like it is the same as what will be in pg_stat_io.

Also, op_bytes for IOOBJECT_WAL/IOCONTEXT_NORMAL should be XLOG_BLCKSZ
(see comment in pg_stat_get_io() in pgstatfuncs.c). Those default to the
same value but can be made to be different.

+ pgstat_count_io_op_n(IOOBJECT_WAL,
IOCONTEXT_NORMAL, IOOP_WRITE, 1);
}

PendingWalStats.wal_write++;

@@ -8233,6 +8237,10 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)

INSTR_TIME_SET_CURRENT(duration);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, duration, start);
+ pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
IOOP_FSYNC, start, 1);

I would wrap this line and check other lines to make sure they are not
too long.

+ } else
+ {
+ pgstat_count_io_op_n(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC, 1);
}

PendingWalStats.wal_sync++;

Same feedback as above about using the prepare/count pattern used for
pg_stat_io elsewhere. In this case, you should be able to move
PendingWalStats.wal_sync into there as well.

diff --git a/src/backend/utils/activity/pgstat_io.c
b/src/backend/utils/activity/pgstat_io.c
@@ -350,6 +352,11 @@ pgstat_tracks_io_object(BackendType bktype,
IOObject io_object,
if (!pgstat_tracks_io_bktype(bktype))
return false;

+
+ if (io_context != IOCONTEXT_NORMAL &&
+ io_object == IOOBJECT_WAL)
+ return false;

We should add more restrictions. See the top of my email for details.

> There is a track_io_timing variable to control pg_stat_io
> timings and a track_wal_io_timing variable to control WAL timings. I
> couldn't decide on which logic to enable WAL timings on pg_stat_io.
> For now, both pg_stat_io and track_wal_io_timing are needed to be
> enabled to track WAL timings in pg_stat_io.

Hmm. I could see a case where someone doesn't want to incur the
overhead of track_io_timing for regular IO but does want to do so for
WAL because they are interested in a specific issue. I'm not sure
though. I could be convinced otherwise (based on relative overhead,
etc).

> Also, if you compare WAL stats in pg_stat_wal and pg_stat_io; you can
> come across differences. These differences are caused by the
> background writer's WAL stats not being flushed. Because of that,
> background writer's WAL stats are not seen in pg_stat_wal but in
> pg_stat_io. I already sent a patch [1] to fix that.

Cool! Thanks for doing that.

- Melanie

[1] https://www.postgresql.org/message-id/flat/20230216191138.jotc73lqb7xhfqbi%40awork3.anarazel.de#eb4a641427fa1eb013e9ecdd8648e640
[2] https://www.postgresql.org/message-id/20230504165738.4e2hfoddoels542c%40awork3.anarazel.de

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2023-07-21 22:32:43 Re: Fix search_path for all maintenance commands
Previous Message Peter Geoghegan 2023-07-21 19:17:50 Re: Use of additional index columns in rows filtering