Re: About to add WAL write/fsync statistics to pg_stat_wal view

From: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Li Japin <japinli(at)hotmail(dot)com>, kuroda(dot)hayato(at)fujitsu(dot)com
Subject: Re: About to add WAL write/fsync statistics to pg_stat_wal view
Date: 2021-03-04 23:38:20
Message-ID: da0059e5e7282573c52ecffd3554a810@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-03-05 01:02, Fujii Masao wrote:
> On 2021/03/04 16:14, Masahiro Ikeda wrote:
>> On 2021-03-03 20:27, Masahiro Ikeda wrote:
>>> On 2021-03-03 16:30, Fujii Masao wrote:
>>>> On 2021/03/03 14:33, Masahiro Ikeda wrote:
>>>>> On 2021-02-24 16:14, Fujii Masao wrote:
>>>>>> On 2021/02/15 11:59, Masahiro Ikeda wrote:
>>>>>>> On 2021-02-10 00:51, David G. Johnston wrote:
>>>>>>>> On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
>>>>>>>> <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>>>>>>>>
>>>>>>>>> I pgindented the patches.
>>>>>>>>
>>>>>>>> ... <function>XLogWrite</function>, which is invoked during an
>>>>>>>> <function>XLogFlush</function> request (see ...).  This is also
>>>>>>>> incremented by the WAL receiver during replication.
>>>>>>>>
>>>>>>>> ("which normally called" should be "which is normally called" or
>>>>>>>> "which normally is called" if you want to keep true to the
>>>>>>>> original)
>>>>>>>> You missed the adding the space before an opening parenthesis
>>>>>>>> here and
>>>>>>>> elsewhere (probably copy-paste)
>>>>>>>>
>>>>>>>> is ether -> is either
>>>>>>>> "This parameter is off by default as it will repeatedly query
>>>>>>>> the
>>>>>>>> operating system..."
>>>>>>>> ", because" -> "as"
>>>>>>>
>>>>>>> Thanks, I fixed them.
>>>>>>>
>>>>>>>> wal_write_time and the sync items also need the note: "This is
>>>>>>>> also
>>>>>>>> incremented by the WAL receiver during replication."
>>>>>>>
>>>>>>> I skipped changing it since I separated the stats for the WAL
>>>>>>> receiver
>>>>>>> in pg_stat_wal_receiver.
>>>>>>>
>>>>>>>> "The number of times it happened..." -> " (the tally of this
>>>>>>>> event is
>>>>>>>> reported in wal_buffers_full in....) This is undesirable because
>>>>>>>> ..."
>>>>>>>
>>>>>>> Thanks, I fixed it.
>>>>>>>
>>>>>>>> I notice that the patch for WAL receiver doesn't require
>>>>>>>> explicitly
>>>>>>>> computing the sync statistics but does require computing the
>>>>>>>> write
>>>>>>>> statistics.  This is because of the presence of issue_xlog_fsync
>>>>>>>> but
>>>>>>>> absence of an equivalent pg_xlog_pwrite.  Additionally, I
>>>>>>>> observe that
>>>>>>>> the XLogWrite code path calls pgstat_report_wait_*() while the
>>>>>>>> WAL
>>>>>>>> receiver path does not.  It seems technically straight-forward
>>>>>>>> to
>>>>>>>> refactor here to avoid the almost-duplicated logic in the two
>>>>>>>> places,
>>>>>>>> though I suspect there may be a trade-off for not adding another
>>>>>>>> function call to the stack given the importance of WAL
>>>>>>>> processing
>>>>>>>> (though that seems marginalized compared to the cost of actually
>>>>>>>> writing the WAL).  Or, as Fujii noted, go the other way and
>>>>>>>> don't have
>>>>>>>> any shared code between the two but instead implement the WAL
>>>>>>>> receiver
>>>>>>>> one to use pg_stat_wal_receiver instead.  In either case, this
>>>>>>>> half-and-half implementation seems undesirable.
>>>>>>>
>>>>>>> OK, as Fujii-san mentioned, I separated the WAL receiver stats.
>>>>>>> (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)
>>>>>>
>>>>>> Thanks for updating the patches!
>>>>>>
>>>>>>
>>>>>>> I added the infrastructure code to communicate the WAL receiver
>>>>>>> stats messages between the WAL receiver and the stats collector,
>>>>>>> and
>>>>>>> the stats for WAL receiver is counted in pg_stat_wal_receiver.
>>>>>>> What do you think?
>>>>>>
>>>>>> On second thought, this idea seems not good. Because those stats
>>>>>> are
>>>>>> collected between multiple walreceivers, but other values in
>>>>>> pg_stat_wal_receiver is only related to the walreceiver process
>>>>>> running
>>>>>> at that moment. IOW, it seems strange that some values show
>>>>>> dynamic
>>>>>> stats and the others show collected stats, even though they are in
>>>>>> the same view pg_stat_wal_receiver. Thought?
>>>>>
>>>>> OK, I fixed it.
>>>>> The stats collected in the WAL receiver is exposed in pg_stat_wal
>>>>> view in v11 patch.
>>>>
>>>> Thanks for updating the patches! I'm now reading 001 patch.
>>>>
>>>> +    /* Check whether the WAL file was synced to disk right now */
>>>> +    if (enableFsync &&
>>>> +        (sync_method == SYNC_METHOD_FSYNC ||
>>>> +         sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
>>>> +         sync_method == SYNC_METHOD_FDATASYNC))
>>>> +    {
>>>>
>>>> Isn't it better to make issue_xlog_fsync() return immediately
>>>> if enableFsync is off, sync_method is open_sync or open_data_sync,
>>>> to simplify the code more?
>>>
>>> Thanks for the comments.
>>> I added the above code in v12 patch.
>>>
>>>>
>>>> +        /*
>>>> +         * Send WAL statistics only if WalWriterDelay has elapsed
>>>> to minimize
>>>> +         * the overhead in WAL-writing.
>>>> +         */
>>>> +        if (rc & WL_TIMEOUT)
>>>> +            pgstat_send_wal();
>>>>
>>>> On second thought, this change means that it always takes
>>>> wal_writer_delay
>>>> before walwriter's WAL stats is sent after XLogBackgroundFlush() is
>>>> called.
>>>> For example, if wal_writer_delay is set to several seconds, some
>>>> values in
>>>> pg_stat_wal would be not up-to-date meaninglessly for those seconds.
>>>> So I'm thinking to withdraw my previous comment and it's ok to send
>>>> the stats every after XLogBackgroundFlush() is called. Thought?
>>>
>>> Thanks, I didn't notice that.
>>>
>>> Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
>>> default value is 200msec and it may be set shorter time.
>
> Yeah, if wal_writer_delay is set to very small value, there is a risk
> that the WAL stats are sent too frequently. I agree that's a problem.
>
>>>
>>> Why don't to make another way to check the timestamp?
>>>
>>> +               /*
>>> +                * Don't send a message unless it's been at least
>>> PGSTAT_STAT_INTERVAL
>>> +                * msec since we last sent one
>>> +                */
>>> +               now = GetCurrentTimestamp();
>>> +               if (TimestampDifferenceExceeds(last_report, now,
>>> PGSTAT_STAT_INTERVAL))
>>> +               {
>>> +                       pgstat_send_wal();
>>> +                       last_report = now;
>>> +               }
>>> +
>>>
>>> Although I worried that it's better to add the check code in
>>> pgstat_send_wal(),
>
> Agreed.
>
>>> I didn't do so because to avoid to double check PGSTAT_STAT_INTERVAL.
>>> pgstat_send_wal() is invoked pg_report_stat() and it already checks
>>> the
>>> PGSTAT_STAT_INTERVAL.
>
> I think that we can do that. What about the attached patch?

Thanks, I thought it's better.

>> I forgot to remove an unused variable.
>> The attached v13 patch is fixed.
>
> Thanks for updating the patch!
>
> + w.wal_write,
> + w.wal_write_time,
> + w.wal_sync,
> + w.wal_sync_time,
>
> It's more natural to put wal_write_time and wal_sync_time next to
> each other? That is, what about the following order of columns?
>
> wal_write
> wal_sync
> wal_write_time
> wal_sync_time

Yes, I fixed it.

> - case SYNC_METHOD_OPEN:
> - case SYNC_METHOD_OPEN_DSYNC:
> - /* write synced it already */
> - break;
>
> IMO it's better to add Assert(false) here to ensure that we never reach
> here, as follows. Thought?
>
> + case SYNC_METHOD_OPEN:
> + case SYNC_METHOD_OPEN_DSYNC:
> + /* not reachable */
> + Assert(false);

I agree.

> Even when a backend exits, it sends the stats via
> pgstat_beshutdown_hook().
> On the other hand, walwriter doesn't do that. Walwriter also should
> send
> the stats even at its exit? Otherwise some stats can fail to be
> collected.
> But ISTM that this issue existed from before, for example checkpointer
> doesn't call pgstat_send_bgwriter() at its exit, so it's overkill to
> fix
> this issue in this patch?

Thanks, I thought it's better to do so.
I added the shutdown hook for the walwriter and the checkpointer in
v14-0003 patch.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachment Content-Type Size
v14-0001-Add-statistics-related-to-write-sync-wal-records.patch text/x-diff 20.2 KB
v14-0003-Add-shutdown-hooks-to-send-statistics.patch text/x-diff 3.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Soumyadeep Chakraborty 2021-03-04 23:42:05 Re: PITR promote bug: Checkpointer writes to older timeline
Previous Message Greg Nancarrow 2021-03-04 23:37:29 Re: Parallel INSERT (INTO ... SELECT ...)