Re: Track IO times in pg_stat_io

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, smilingsamay(at)gmail(dot)com, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Track IO times in pg_stat_io
Date: 2023-04-07 16:17:38
Message-ID: CAAKRu_bhz9DaO6jS3_WHnGC_ztSSq=LAq+GsFRJVS_pa8aFkgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached v9 addresses review feedback as well as resolving merge
conflicts with recent relation extension patchset.

I've changed pgstat_count_io_op_time() to take a count and call
pgstat_count_io_op_n() so it can be used with smgrzeroextend(). I do
wish that the parameter to pgstat_count_io_op_n() was called "count" and
not "cnt"...

I've also reordered the call site of pgstat_count_io_op_time() in a few
locations, but I have some questions about this.

Before, I didn't think it mattered much that we didn't finish counting
IO time until after setting BM_VALID or BM_DIRTY and unsetting
BM_IO_IN_PROGRESS. With the relation extension code doing this for many
buffers at once, though, I wondered if this will make the IO timing too
inaccurate.

As such, I've moved pgstat_count_io_op_time() to before we set those
flags in all locations. I did wonder if it is bad to prolong having the
buffer pinned and not having those flags set, though.

On Tue, Apr 4, 2023 at 8:59 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2023-03-31 15:44:58 -0400, Melanie Plageman wrote:
> > From 789d4bf1fb749a26523dbcd2c69795916b711c68 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Tue, 21 Mar 2023 16:00:55 -0400
> > Subject: [PATCH v8 1/4] Count IO time for temp relation writes
> >
> > Both pgstat_database and pgBufferUsage write times failed to count
> > timing for flushes of dirty local buffers when acquiring a new local
> > buffer for a temporary relation block.
>
> I think it'd be worth mentioning here that we do count read time? Otherwise
> it'd not be as clear that adding tracking increases consistency...

Done

> > From f4e0db5c833f33b30d4c0b4bebec1096a1745d81 Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Tue, 21 Mar 2023 18:20:44 -0400
> > Subject: [PATCH v8 2/4] FlushRelationBuffers() counts temp relation IO timing
> >
> > Add pgstat_database and pgBufferUsage IO timing counting to
> > FlushRelationBuffers() for writes of temporary relations.
> > ---
> > src/backend/storage/buffer/bufmgr.c | 18 ++++++++++++++++++
> > 1 file changed, 18 insertions(+)
> >
> > diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> > index b3adbbe7d2..05e98d5994 100644
> > --- a/src/backend/storage/buffer/bufmgr.c
> > +++ b/src/backend/storage/buffer/bufmgr.c
> > @@ -3571,6 +3571,8 @@ FlushRelationBuffers(Relation rel)
> > {
> > int i;
> > BufferDesc *bufHdr;
> > + instr_time io_start,
> > + io_time;
> >
> > if (RelationUsesLocalBuffers(rel))
> > {
> > @@ -3596,17 +3598,33 @@ FlushRelationBuffers(Relation rel)
> >
> > PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
> >
> > + if (track_io_timing)
> > + INSTR_TIME_SET_CURRENT(io_start);
> > + else
> > + INSTR_TIME_SET_ZERO(io_start);
> > +
> > smgrwrite(RelationGetSmgr(rel),
> > BufTagGetForkNum(&bufHdr->tag),
> > bufHdr->tag.blockNum,
> > localpage,
> > false);
> >
> > +
>
> Spurious newline.

Fixed.

> > From 2bdad725133395ded199ecc726096e052d6e654b Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Fri, 31 Mar 2023 15:32:36 -0400
> > Subject: [PATCH v8 3/4] Track IO times in pg_stat_io
> >
> > Add IO timing for reads, writes, extends, and fsyncs to pg_stat_io.
> >
> > Reviewed-by: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
> > Reviewed-by: Andres Freund <andres(at)anarazel(dot)de>
> > Discussion: https://www.postgresql.org/message-id/flat/CAAKRu_ay5iKmnbXZ3DsauViF3eMxu4m1oNnJXqV_HyqYeg55Ww%40mail.gmail.com
> > ---
>
> > -static PgStat_BktypeIO PendingIOStats;
> > +typedef struct PgStat_PendingIO
> > +{
> > + PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
> > + instr_time pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
> > +} PgStat_PendingIO;
>
> Probably will look less awful after adding the typedef to typedefs.list.

Done.
One day I will remember to add things to typedefs.list.

> > + /* we do track it */
> > + if (pgstat_tracks_io_op(bktype, io_object, io_context, io_op))
> > + {
> > + /* ensure that if IO times are non-zero, counts are > 0 */
> > + if (backend_io->times[io_object][io_context][io_op] != 0 &&
> > + backend_io->counts[io_object][io_context][io_op] <= 0)
> > + return false;
> > +
> > continue;
> > + }
> >
> > - /* There are stats and there shouldn't be */
> > - if (!bktype_tracked ||
> > - !pgstat_tracks_io_op(bktype, io_object, io_context, io_op))
> > + /* we don't track it, and it is not 0 */
> > + if (backend_io->counts[io_object][io_context][io_op] != 0)
> > return false;
> > +
> > + /* we don't track this IOOp, so make sure its IO time is zero */
> > + if (pgstat_tracks_io_time(io_op) > -1)
> > + {
> > + if (backend_io->times[io_object][io_context][io_op] != 0)
> > + return false;
> > + }
>
> I'm somehow doubtful it's worth having pgstat_tracks_io_time, what kind of
> error would be caught by this check?

Yea, now that the function to count IO timing also increments the count,
I don't think this can happen.

However, pgstat_tracks_io_time() is useful in its other call site in
pgstatfuncs which lets us continue in the loop if we don't need to fill
in that IO time. Perhaps it could be replaced with a if (io_op ==
IOOP_EVICT || io_op == IOOP_REUSE ... but I kind of like the function?
But, maybe it is overkill...

For now, I've moved pgstat_tracks_io_time() into pgstatfuncs.c as a
helper.

> > +/*
> > + * Get the number of the column containing IO times for the specified IOOp. If
> > + * the specified IOOp is one for which IO time is not tracked, return -1. Note
> > + * that this function assumes that IO time for an IOOp is displayed in the view
> > + * in the column directly after the IOOp counts.
> > + */
> > +static io_stat_col
> > +pgstat_get_io_time_index(IOOp io_op)
> > +{
> > + if (pgstat_tracks_io_time(io_op) == -1)
> > + return -1;
>
> That seems dangerous - won't it just lead to accessing something from before
> the start of the array? Probably should just assert.

Yea. I've removed it entirely as the passed in io_op can't be negative
(unless we change the enum values) and we add one to it before
returning.

> > @@ -1363,20 +1389,32 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
> >
> > for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
> > {
> > - int col_idx = pgstat_get_io_op_index(io_op);
> > + PgStat_Counter count = bktype_stats->counts[io_obj][io_context][io_op];
> > + int i = pgstat_get_io_op_index(io_op);
> >
> > /*
> > * Some combinations of BackendType and IOOp, of IOContext
> > * and IOOp, and of IOObject and IOOp are not tracked. Set
> > * these cells in the view NULL.
> > */
> > - nulls[col_idx] = !pgstat_tracks_io_op(bktype, io_obj, io_context, io_op);
> > + if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
> > + values[i] = Int64GetDatum(count);
> > + else
> > + nulls[i] = true;
> > + }
> >
> > - if (nulls[col_idx])
> > + for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
> > + {
> > + PgStat_Counter time = bktype_stats->times[io_obj][io_context][io_op];
> > + int i = pgstat_get_io_time_index(io_op);
> > +
> > + if (i == -1)
> > continue;
> >
> > - values[col_idx] =
> > - Int64GetDatum(bktype_stats->data[io_obj][io_context][io_op]);
> > + if (!nulls[pgstat_get_io_op_index(io_op)])
> > + values[i] = Float8GetDatum(pg_stat_micro_to_millisecs(time));
> > + else
> > + nulls[i] = true;
> > }
>
> Why two loops?

Well, it was a stylistic choice that I now realize is actually
confusing.
I consolidated them.

- Melanie

Attachment Content-Type Size
v9-0004-pgstat_database-uses-pgstat_io-time-counters.patch text/x-patch 2.7 KB
v9-0001-Count-IO-time-for-temp-relation-writes.patch text/x-patch 1.9 KB
v9-0003-Track-IO-times-in-pg_stat_io.patch text/x-patch 26.3 KB
v9-0002-FlushRelationBuffers-counts-temp-relation-IO-writ.patch text/x-patch 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-04-07 17:32:22 Re: Fix the description of GUC "max_locks_per_transaction" and "max_pred_locks_per_transaction" in guc_table.c
Previous Message Daniel Gustafsson 2023-04-07 16:14:37 Re: Making background psql nicer to use in tap tests