Re: pg_stat_io not tracking smgrwriteback() is confusing

From: Andres Freund <andres(at)anarazel(dot)de>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, amit(dot)kapila16(at)gmail(dot)com, alvherre(at)alvh(dot)no-ip(dot)org, jkatz(at)postgresql(dot)org
Subject: Re: pg_stat_io not tracking smgrwriteback() is confusing
Date: 2023-05-04 16:44:02
Message-ID: 20230504164402.5ek3qr3mdaglou4m@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-04-27 11:36:49 -0400, Melanie Plageman wrote:
> > > /* and finally tell the kernel to write the data to storage */
> > > reln = smgropen(currlocator, InvalidBackendId);
> > > smgrwriteback(reln, BufTagGetForkNum(&tag), tag.blockNum, nblocks);
>
> Yes, as it is currently, IssuePendingWritebacks() is only used for shared
> buffers. My rationale for including IOObject is that localbuf.c calls
> smgr* functions and there isn't anything stopping it from calling
> smgrwriteback() or using WritebackContexts (AFAICT).

I think it's extremely unlikely that we'll ever do that, because it's very
common to have temp tables that are bigger than temp_buffers. We basically
hope that the kernel can do good caching for us there.

> > Or I actually think we might not even need to pass around the io_*
> > parameters and could just pass immediate values to the
> > pgstat_count_io_op_time call. If we ever start using shared buffers
> > for thing other than relation files (for example SLRU?), we'll have to
> > consider the target individually for each buffer block. That being
> > said, I'm fine with how it is either.
>
> In IssuePendingWritebacks() we don't actually know which IOContext we
> are issuing writebacks for when we call pgstat_count_io_op_time() (we do
> issue pending writebacks for other IOContexts than IOCONTEXT_NORMAL). I
> agree IOObject is not strictly necessary right now. I've kept IOObject a
> member of WritebackContext for the reasons I mention above, however, I
> am open to removing it if it adds confusion.

I don't think it's really worth adding struct members for potential future
safety. We can just add them later if we end up needing them.

> From 7cdd6fc78ed82180a705ab9667714f80d08c5f7d Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> Date: Mon, 24 Apr 2023 18:21:54 -0400
> Subject: [PATCH v4] Add writeback to pg_stat_io
>
> 28e626bde00 added the notion of IOOps but neglected to include
> writeback. With the addition of IO timing to pg_stat_io in ac8d53dae5,
> the omission of writeback caused some confusion. Checkpointer write
> timing in pg_stat_io often differed greatly from the write timing
> written to the log. To fix this, add IOOp IOOP_WRITEBACK and track
> writebacks and writeback timing in pg_stat_io.

For the future: It'd be good to note that catversion needs to be increased.

Off-topic: I've been wondering about computing a "catversion hash" based on
all the files going into initdb. At least to have some tooling to detect
missing catversion increases...

> index 99f7f95c39..27b6f1a0a0 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -3867,6 +3867,32 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
> </entry>
> </row>
>
> + <row>
> + <entry role="catalog_table_entry">
> + <para role="column_definition">
> + <structfield>writebacks</structfield> <type>bigint</type>
> + </para>
> + <para>
> + Number of units of size <varname>op_bytes</varname> which the backend
> + requested the kernel write out to permanent storage.
> + </para>
> + </entry>
> + </row>

I think the reference to "backend" here is somewhat misplaced - it could be
checkpointer or bgwriter as well. We don't reference the backend in other
comparable columns of pgsio either...

> diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
> index 0057443f0c..a7182fe95a 100644
> --- a/src/backend/storage/buffer/buf_init.c
> +++ b/src/backend/storage/buffer/buf_init.c
> @@ -145,9 +145,15 @@ InitBufferPool(void)
> /* Init other shared buffer-management stuff */
> StrategyInitialize(!foundDescs);
>
> - /* Initialize per-backend file flush context */
> - WritebackContextInit(&BackendWritebackContext,
> - &backend_flush_after);
> + /*
> + * Initialize per-backend file flush context. IOContext is initialized to
> + * IOCONTEXT_NORMAL because this is the most common context. IOObject is
> + * initialized to IOOBJECT_RELATION because writeback is currently only
> + * requested for permanent relations in shared buffers. The backend can
> + * overwrite these as appropriate.
> + */
> + WritebackContextInit(&BackendWritebackContext, IOOBJECT_RELATION,
> + IOCONTEXT_NORMAL, &backend_flush_after);
> }
>

This seems somewhat icky.

> /*
> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index 1fa689052e..116910cdfe 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -1685,6 +1685,8 @@ again:
> FlushBuffer(buf_hdr, NULL, IOOBJECT_RELATION, io_context);
> LWLockRelease(content_lock);
>
> + BackendWritebackContext.io_object = IOOBJECT_RELATION;
> + BackendWritebackContext.io_context = io_context;
> ScheduleBufferTagForWriteback(&BackendWritebackContext,
> &buf_hdr->tag);
> }

What about passing the io_context to ScheduleBufferTagForWriteback instead?

> --- a/src/test/regress/sql/stats.sql
> +++ b/src/test/regress/sql/stats.sql

Hm. Could we add a test for this? While it's not implemented everywhere, we
still issue the smgrwriteback() afaics. The default for the _flush_after GUCs
changes, but backend_flush_after is USERSET, so we could just change it for a
single command.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-05-04 16:46:54 Re: pg_stat_io not tracking smgrwriteback() is confusing
Previous Message Amit Langote 2023-05-04 13:55:39 Re: "variable not found in subplan target list"