Re: Track IO times in pg_stat_io

From: Andres Freund <andres(at)anarazel(dot)de>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, smilingsamay(at)gmail(dot)com, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Track IO times in pg_stat_io
Date: 2023-04-08 00:09:40
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 2023-04-07 12:17:38 -0400, Melanie Plageman wrote:
> Attached v9 addresses review feedback as well as resolving merge
> conflicts with recent relation extension patchset.

I've edited it a bit more:

- removed pgstat_tracks_io_time() and replaced it by returning the new
IO_COL_INVALID = -1 from pgstat_get_io_time_index() when there's no time

- moved PgStat_Counter count, time into the respective branches. It feels
somewhat wrong to access the time when we then decide there is no time.

- s/io_object/io_obj/ in pgstat_count_io_op_time(), combined with added
linebreaks, got the code to under 80 chars

- renamed pg_stat_microseconds_to_milliseconds to pg_stat_us_to_ms

- removed a spurious newline

- the times reported by pg_stat_io had their fractional part removed, due to
pg_stat_us_to_ms returning an integer

Verifying this, I saw that the write time visible in pg_stat_io didn't quite
match what I saw in log_checkpoints. But not always. Eventually I figured out
that that's not pg_stat_io's fault - log_checkpoint's write includes a lot of
things, including several other CheckPoint* routines, flushing WAL, asking the
kernel to flush things to disk... The biggest portion in my case were the
smgrwriteback() calls - which pg_stat_io doesn't track - oops.

Pushed up to and including 0003.

> I've changed pgstat_count_io_op_time() to take a count and call
> pgstat_count_io_op_n() so it can be used with smgrzeroextend(). I do
> wish that the parameter to pgstat_count_io_op_n() was called "count" and
> not "cnt"...


> I've also reordered the call site of pgstat_count_io_op_time() in a few
> locations, but I have some questions about this.
> Before, I didn't think it mattered much that we didn't finish counting
> IO time until after setting BM_VALID or BM_DIRTY and unsetting
> BM_IO_IN_PROGRESS. With the relation extension code doing this for many
> buffers at once, though, I wondered if this will make the IO timing too
> inaccurate.

> As such, I've moved pgstat_count_io_op_time() to before we set those
> flags in all locations. I did wonder if it is bad to prolong having the
> buffer pinned and not having those flags set, though.

I went back and forth about this before. I think it's ok the way you did it.

I think 0004 needs a bit more work. At the very least we would have to swap
the order of pgstat_flush_pending_entries() and pgstat_flush_io() - entirely
doable. Unlike 0003, this doesn't make pg_stat_io more complete, or such, so
I'm inclined to leave it for 17. I think there might be some more
opportunities for having counts "flow down", like the patch does.


Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-04-08 00:14:33 Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Previous Message Michael Paquier 2023-04-08 00:05:10 Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?