Re: Track IO times in pg_stat_io

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, smilingsamay(at)gmail(dot)com, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Track IO times in pg_stat_io
Date: 2023-03-16 21:19:16
Message-ID: CAAKRu_Z1x5FX6NA2Qa2S4w_=uzWTP-QcyomTOVUkgBYOVkxduw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

v5 attached mostly addresses instr_time persistence issues.

On Tue, Mar 14, 2023 at 6:56 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2023-03-09 11:50:38 -0500, Melanie Plageman wrote:
> > On Tue, Mar 7, 2023 at 1:39 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > On 2023-03-06 11:30:13 -0500, Melanie Plageman wrote:
> > > > > As pgstat_bktype_io_stats_valid() is called only in Assert(), I think that would be a good idea
> > > > > to also check that if counts are not Zero then times are not Zero.
> > > >
> > > > Yes, I think adding some validation around the relationship between
> > > > counts and timing should help prevent developers from forgetting to call
> > > > pg_stat_count_io_op() when calling pgstat_count_io_time() (as relevant).
> > > >
> > > > However, I think that we cannot check that if IO counts are non-zero
> > > > that IO times are non-zero, because the user may not have
> > > > track_io_timing enabled. We can check that if IO times are not zero, IO
> > > > counts are not zero. I've done this in the attached v3.
> > >
> > > And even if track_io_timing is enabled, the timer granularity might be so low
> > > that we *still* get zeroes.
> > >
> > > I wonder if we should get rid of pgStatBlockReadTime, pgStatBlockWriteTime,
> >
> > And then have pg_stat_reset_shared('io') reset pg_stat_database IO
> > stats?
>
> Yes.

I think this makes sense but I am hesitant to do it in this patchset,
because it feels a bit hidden...maybe?

But, if you feel strongly, I will make the change.

> > > > typedef struct PgStat_BktypeIO
> > > > {
> > > > - PgStat_Counter data[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
> > > > + PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
> > > > + instr_time times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
> > > > } PgStat_BktypeIO;
> > >
> > > Ah, you're going to hate me. We can't store instr_time on disk. There's
> > > another patch that gets substantial peformance gains by varying the frequency
> > > at which instr_time keeps track of time based on the CPU frequency...
> >
> > What does that have to do with what we can store on disk?
>
> The frequency can change.

Ah, I see.

> > If so, would it not be enough to do this when reading/writing the stats
> > file?
>
> Theoretically yes. But to me it seems cleaner to do it when flushing to shared
> stats. See also the overflow issue below.
>
> > > It also just doesn't have enough range to keep track of system wide
> > > time on a larger system. A single backend won't run for 293 years, but
> > > with a few thousand backends that's a whole different story.
> > >
> > > I think we need to accumulate in instr_time, but convert to floating point
> > > when flushing stats.
> >
> > Hmmm. So, are you saying that we need to read from disk when we query
> > the view and add that to what is in shared memory? That we only store
> > the delta since the last restart in the instr_time array?
>
> No, I don't think I am suggesting that. What I am trying to suggest is that
> PendingIOStats should contain instr_time, but that PgStat_IO should contain
> PgStat_Counter as microseconds, as before.

So, I've modified the code to make a union of instr_time and
PgStat_Counter in PgStat_BktypeIO. I am not quite sure if this is okay.
I store in microsec and then in pg_stat_io, I multiply to get
milliseconds for display.

I considered refactoring pgstat_io_end() to use INSTR_TIME_ACCUM_DIFF()
like [1], but, in the end I actually think I would end up with more
operations because of the various different counters needing to be
updated. As it is now, I do a single subtract and a few adds (one for
each of the different statistics objects tracking IO times
(pgBufferUsage, pgStatBlockWrite/ReadTime). Whereas, I would need to do
an accum diff for every one of those.

- Melanie

[1] https://www.postgresql.org/message-id/1feedb83-7aa9-cb4b-5086-598349d3f555%40gmail.com

Attachment Content-Type Size
v5-0001-Track-IO-times-in-pg_stat_io.patch text/x-patch 26.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2023-03-16 21:20:10 Re: Add LZ4 compression in pg_dump
Previous Message Justin Pryzby 2023-03-16 21:11:59 Re: doc: mentioned CREATE+ATTACH PARTITION as an alternative to CREATE TABLE..PARTITION OF