Re: Show WAL write and fsync stats in pg_stat_io

From: Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "bharath(dot)rupireddyforpostgres(at)gmail(dot)com" <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Show WAL write and fsync stats in pg_stat_io
Date: 2024-01-10 12:59:24
Message-ID: CAN55FZ1c=p0Gwqa=hUqoKHStEtkMqMecdhC5YQf-RbStZVhPhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Wed, 10 Jan 2024 at 08:25, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Jan 03, 2024 at 04:10:58PM +0300, Nazir Bilal Yavuz wrote:
> >
> > I thought removing op_bytes completely ( as you said "This patch
> > extends it with two more operation sizes, and there are even cases
> > where it may be a variable" ) from pg_stat_io view then adding
> > something like {read | write | extend}_bytes and {read | write |
> > extend}_calls could be better, so that we don't lose any information.
>
> But then you'd lose the possibility to analyze correlations between
> the size and the number of the operations, which is something that
> matters for more complex I/O scenarios. This does not need to be
> tackled in this patch, which is useful on its own, though I am really
> wondering if this is required for the recent work done by Thomas.
> Perhaps Andres, Thomas or Melanie could comment on that?

Yes, you are right.

> >> Yeah, a limitation like that may be acceptable for now. Tracking the
> >> WAL writer and WAL sender activities can be relevant in a lot of cases
> >> even if we don't have the full picture for the WAL receiver yet.
> >
> > I added that and disabled B_WAL_RECEIVER backend with comments
> > explaining why. v8 is attached.
>
> I can see that's what you have been adding here, which should be OK:
>
> > - if (track_io_timing)
> > + /*
> > + * B_WAL_RECEIVER backend does IOOBJECT_WAL IOObject & IOOP_READ IOOp IOs
> > + * but these IOs are not countable for now because IOOP_READ IOs' op_bytes
> > + * (number of bytes per unit of I/O) might not be the same all the time.
> > + * The current implementation requires that the op_bytes must be the same
> > + * for the same IOObject, IOContext and IOOp. To avoid confusion, the
> > + * B_WAL_RECEIVER backend & IOOBJECT_WAL IOObject IOs are disabled for
> > + * now.
> > + */
> > + if (MyBackendType == B_WAL_RECEIVER && io_object == IOOBJECT_WAL)
> > + return;
>
> This could be worded better, but that's one of these nits from me I
> usually tweak when committing stuff.

Thanks for doing that! Do you have any specific comments that can help
improve it?

> > +/*
> > + * Decide if IO timings need to be tracked. Timings associated to
> > + * IOOBJECT_WAL objects are tracked if track_wal_io_timing is enabled,
> > + * else rely on track_io_timing.
> > + */
> > +static bool
> > +pgstat_should_track_io_time(IOObject io_object)
> > +{
> > + if (io_object == IOOBJECT_WAL)
> > + return track_wal_io_timing;
> > +
> > + return track_io_timing;
> > +}
>
> One thing I was also considering is if eliminating this routine would
> make pgstat_count_io_op_time() more readable the result, but I cannot
> get to that.

I could not think of a way to eliminate pgstat_should_track_io_time()
route without causing performance regressions. What do you think about
moving inside of 'pgstat_should_track_io_time(io_object) if check' to
another function and call this function from
pgstat_count_io_op_time()? This does not change anything but IMO it
increases the readability.

> > if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND)
> > {
> > - pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
> > + if (io_object != IOOBJECT_WAL)
> > + pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
> > +
> > if (io_object == IOOBJECT_RELATION)
> > INSTR_TIME_ADD(pgBufferUsage.shared_blk_write_time, io_time);
> > else if (io_object == IOOBJECT_TEMP_RELATION)
> > @@ -139,7 +177,9 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
> > }
> > else if (io_op == IOOP_READ)
> > {
> > - pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time));
> > + if (io_object != IOOBJECT_WAL)
> > + pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time));
> > +
> > if (io_object == IOOBJECT_RELATION)
> > INSTR_TIME_ADD(pgBufferUsage.shared_blk_read_time, io_time);
> > else if (io_object == IOOBJECT_TEMP_RELATION)
>
> A second thing is if this would be better with more switch/cases, say:
> switch (io_op):
> {
> case IOOP_EXTEND:
> case IOOP_WRITE:
> switch (io_object):
> {
> case WAL:
> /* do nothing */
> break;
> case RELATION:
> case TEMP:
> .. blah ..
> }
> break;
> case IOOP_READ:
> switch (io_object):
> {
> .. blah ..
> }
> break;
> }
>
> Or just this one to make it clear that nothing happens for WAL
> objects:
> switch (io_object):
> {
> case WAL:
> /* do nothing */
> break;
> case RELATION:
> switch (io_op):
> {
> case IOOP_EXTEND:
> case IOOP_WRITE:
> .. blah ..
> case IOOP_READ:
> .. blah ..
> }
> break;
> case TEMP:
> /* same switch as RELATION */
> break;
> }
>
> This duplicates a bit things, but at least in the second case it's
> clear which counters are updated when I/O timings are tracked. It's
> OK by me if people don't like this suggestion, but that would avoid
> bugs like the one I found upthread.

I am more inclined towards the second one because it is more likely
that a new io_object will be introduced rather than a new io_op. So, I
think the second one is a bit more future proof.

--
Regards,
Nazir Bilal Yavuz
Microsoft

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2024-01-10 13:03:28 Re: the s_lock_stuck on perform_spin_delay
Previous Message Aleksander Alekseev 2024-01-10 12:44:25 Re: System username in pg_stat_activity