Re: pg_stat_io not tracking smgrwriteback() is confusing

From: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: andres(at)anarazel(dot)de, pgsql-hackers(at)postgresql(dot)org, amit(dot)kapila16(at)gmail(dot)com, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: pg_stat_io not tracking smgrwriteback() is confusing
Date: 2023-05-03 15:36:10
Message-ID: 86559a5e-ea15-9d0d-bf08-d08cd36e8e93@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/27/23 11:36 AM, Melanie Plageman wrote:
> Thanks for the review!
>
> On Wed, Apr 26, 2023 at 10:22 PM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>>
>> At Wed, 26 Apr 2023 17:08:14 -0400, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote in
>>> On Mon, Apr 24, 2023 at 9:29 PM Melanie Plageman
>>> <melanieplageman(at)gmail(dot)com> wrote:
>>>> I've yet to cook up a client backend test case (e.g. with COPY). I've taken
>>>> that as a todo.
>>>
>>> It was trivial to see client backend writebacks in almost any scenario
>>> once I set backend_flush_after. I wonder if it is worth mentioning the
>>> various "*flush_after" gucs in the docs?
>>>
>>>> I have a few outstanding questions:
>>>>
>>>> 1) Does it make sense for writebacks to count the number of blocks for
>>>> which writeback was requested or the number of calls to smgrwriteback() or
>>>> the number of actual syscalls made? We don't actually know from outside
>>>> of mdwriteback() how many FileWriteback() calls we will make.
>>>
>>> So, in the attached v3, I've kept the first method: writebacks are the
>>> number of blocks which the backend has requested writeback of. I'd like
>>> it to be clear in the docs exactly what writebacks are (so that people
>>> know not to add them together with writes or something like that). I
>>> made an effort but could use further docs review.
>>
>> + Number of units of size <varname>op_bytes</varname> which the backend
>> + requested the kernel write out to permanent storage.
>>
>> I just want to mention that it is not necessarily the same as the
>> number of system calls, but I'm not sure what others think about that.
>
> My thinking is that some other IO operations, for example, extends,
> count the number of blocks extended and not the number of syscalls.
>
>> + Time spent in writeback operations in milliseconds (if
>> + <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero). This
>> + does not necessarily count the time spent by the kernel writing the
>> + data out. The backend will initiate write-out of the dirty pages and
>> + wait only if the request queue is full.
>>
>> The last sentence looks like it's taken from the sync_file_range() man
>> page, but I think it's a bit too detailed. We could just say, "The
>> time usually only includes the time it takes to queue write-out
>> requests.", bit I'm not sure wh...
>
> Ah, yes, I indeed took heavy inspiration from the sync_file_range()
> man page :) I've modified this comment in the attached v4. I didn't want
> to say "usually" since I imagine it is quite workload and configuration
> dependent.
>
>>>> 2) I'm a little nervous about not including IOObject in the writeback
>>>> context. Technically, there is nothing stopping local buffer code from
>>>> calling IssuePendingWritebacks(). Right now, local buffer code doesn't
>>>> do ScheduleBufferTagForWriteback(). But it doesn't seem quite right to
>>>> hardcode in IOOBJECT_RELATION when there is nothing wrong with
>>>> requesting writeback of local buffers (AFAIK). What do you think?
>>>
>>> I've gone ahead and added IOObject to the WritebackContext.
>>
>> The smgropen call in IssuePendingWritebacks below clearly shows that
>> the function only deals with shared buffers.
>>
>>> /* and finally tell the kernel to write the data to storage */
>>> reln = smgropen(currlocator, InvalidBackendId);
>>> smgrwriteback(reln, BufTagGetForkNum(&tag), tag.blockNum, nblocks);
>
> Yes, as it is currently, IssuePendingWritebacks() is only used for shared
> buffers. My rationale for including IOObject is that localbuf.c calls
> smgr* functions and there isn't anything stopping it from calling
> smgrwriteback() or using WritebackContexts (AFAICT).
>
>> The callback-related code fully depends on callers following its
>> expectation. So we can rewrite the following comment added to
>> InitBufferPoll with a more confident tone.
>>
>> + * Initialize per-backend file flush context. IOObject is initialized to
>> + * IOOBJECT_RELATION and IOContext to IOCONTEXT_NORMAL since these are the
>> + * most likely targets for writeback. The backend can overwrite these as
>> + * appropriate.
>
> I have updated this comment to be more confident and specific.
>
>> Or I actually think we might not even need to pass around the io_*
>> parameters and could just pass immediate values to the
>> pgstat_count_io_op_time call. If we ever start using shared buffers
>> for thing other than relation files (for example SLRU?), we'll have to
>> consider the target individually for each buffer block. That being
>> said, I'm fine with how it is either.
>
> In IssuePendingWritebacks() we don't actually know which IOContext we
> are issuing writebacks for when we call pgstat_count_io_op_time() (we do
> issue pending writebacks for other IOContexts than IOCONTEXT_NORMAL). I
> agree IOObject is not strictly necessary right now. I've kept IOObject a
> member of WritebackContext for the reasons I mention above, however, I
> am open to removing it if it adds confusion.

[RMT hat]

Horiguchi-san: do the changes that Melanie made address your feedback?

It'd be good if we can get this into Beta 1 if everyone is comfortable
with the patch.

Thanks,

Jonathan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonathan S. Katz 2023-05-03 16:00:05 Re: psql: Add role's membership options to the \du+ command
Previous Message Jonathan S. Katz 2023-05-03 15:29:14 Re: Move defaults toward ICU in 16?