Re: pg_stat_io not tracking smgrwriteback() is confusing

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: melanieplageman(at)gmail(dot)com
Cc: andres(at)anarazel(dot)de, pgsql-hackers(at)postgresql(dot)org, amit(dot)kapila16(at)gmail(dot)com, alvherre(at)alvh(dot)no-ip(dot)org, jkatz(at)postgresql(dot)org
Subject: Re: pg_stat_io not tracking smgrwriteback() is confusing
Date: 2023-04-27 02:22:23
Message-ID: 20230427.112223.1799735597635814351.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Wed, 26 Apr 2023 17:08:14 -0400, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote in
> On Mon, Apr 24, 2023 at 9:29 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > I've yet to cook up a client backend test case (e.g. with COPY). I've taken
> > that as a todo.
>
> It was trivial to see client backend writebacks in almost any scenario
> once I set backend_flush_after. I wonder if it is worth mentioning the
> various "*flush_after" gucs in the docs?
>
> > I have a few outstanding questions:
> >
> > 1) Does it make sense for writebacks to count the number of blocks for
> > which writeback was requested or the number of calls to smgrwriteback() or
> > the number of actual syscalls made? We don't actually know from outside
> > of mdwriteback() how many FileWriteback() calls we will make.
>
> So, in the attached v3, I've kept the first method: writebacks are the
> number of blocks which the backend has requested writeback of. I'd like
> it to be clear in the docs exactly what writebacks are (so that people
> know not to add them together with writes or something like that). I
> made an effort but could use further docs review.

+ Number of units of size <varname>op_bytes</varname> which the backend
+ requested the kernel write out to permanent storage.

I just want to mention that it is not necessarily the same as the
number of system calls, but I'm not sure what others think about that.

+ Time spent in writeback operations in milliseconds (if
+ <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero). This
+ does not necessarily count the time spent by the kernel writing the
+ data out. The backend will initiate write-out of the dirty pages and
+ wait only if the request queue is full.

The last sentence looks like it's taken from the sync_file_range() man
page, but I think it's a bit too detailed. We could just say, "The
time usually only includes the time it takes to queue write-out
requests.", bit I'm not sure wh...

> > 2) I'm a little nervous about not including IOObject in the writeback
> > context. Technically, there is nothing stopping local buffer code from
> > calling IssuePendingWritebacks(). Right now, local buffer code doesn't
> > do ScheduleBufferTagForWriteback(). But it doesn't seem quite right to
> > hardcode in IOOBJECT_RELATION when there is nothing wrong with
> > requesting writeback of local buffers (AFAIK). What do you think?
>
> I've gone ahead and added IOObject to the WritebackContext.

The smgropen call in IssuePendingWritebacks below clearly shows that
the function only deals with shared buffers.

> /* and finally tell the kernel to write the data to storage */
> reln = smgropen(currlocator, InvalidBackendId);
> smgrwriteback(reln, BufTagGetForkNum(&tag), tag.blockNum, nblocks);

The callback-related code fully depends on callers following its
expectation. So we can rewrite the following comment added to
InitBufferPoll with a more confident tone.

+ * Initialize per-backend file flush context. IOObject is initialized to
+ * IOOBJECT_RELATION and IOContext to IOCONTEXT_NORMAL since these are the
+ * most likely targets for writeback. The backend can overwrite these as
+ * appropriate.

Or I actually think we might not even need to pass around the io_*
parameters and could just pass immediate values to the
pgstat_count_io_op_time call. If we ever start using shared buffers
for thing other than relation files (for example SLRU?), we'll have to
consider the target individually for each buffer block. That being
said, I'm fine with how it is either.

Regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Wei Wang (Fujitsu) 2023-04-27 03:02:29 RE: Initial Schema Sync for Logical Replication
Previous Message Anton Kirilov 2023-04-26 22:56:49 Re: Add PQsendSyncMessage() to libpq