Re: About to add WAL write/fsync statistics to pg_stat_wal view

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Li Japin <japinli(at)hotmail(dot)com>, kuroda(dot)hayato(at)fujitsu(dot)com
Subject: Re: About to add WAL write/fsync statistics to pg_stat_wal view
Date: 2021-03-04 16:02:25
Message-ID: 49d412cc-2e39-a487-1861-a8560ebb88cb@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021/03/04 16:14, Masahiro Ikeda wrote:
> On 2021-03-03 20:27, Masahiro Ikeda wrote:
>> On 2021-03-03 16:30, Fujii Masao wrote:
>>> On 2021/03/03 14:33, Masahiro Ikeda wrote:
>>>> On 2021-02-24 16:14, Fujii Masao wrote:
>>>>> On 2021/02/15 11:59, Masahiro Ikeda wrote:
>>>>>> On 2021-02-10 00:51, David G. Johnston wrote:
>>>>>>> On Thu, Feb 4, 2021 at 4:45 PM Masahiro Ikeda
>>>>>>> <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>>>>>>>
>>>>>>>> I pgindented the patches.
>>>>>>>
>>>>>>> ... <function>XLogWrite</function>, which is invoked during an
>>>>>>> <function>XLogFlush</function> request (see ...).  This is also
>>>>>>> incremented by the WAL receiver during replication.
>>>>>>>
>>>>>>> ("which normally called" should be "which is normally called" or
>>>>>>> "which normally is called" if you want to keep true to the original)
>>>>>>> You missed the adding the space before an opening parenthesis here and
>>>>>>> elsewhere (probably copy-paste)
>>>>>>>
>>>>>>> is ether -> is either
>>>>>>> "This parameter is off by default as it will repeatedly query the
>>>>>>> operating system..."
>>>>>>> ", because" -> "as"
>>>>>>
>>>>>> Thanks, I fixed them.
>>>>>>
>>>>>>> wal_write_time and the sync items also need the note: "This is also
>>>>>>> incremented by the WAL receiver during replication."
>>>>>>
>>>>>> I skipped changing it since I separated the stats for the WAL receiver
>>>>>> in pg_stat_wal_receiver.
>>>>>>
>>>>>>> "The number of times it happened..." -> " (the tally of this event is
>>>>>>> reported in wal_buffers_full in....) This is undesirable because ..."
>>>>>>
>>>>>> Thanks, I fixed it.
>>>>>>
>>>>>>> I notice that the patch for WAL receiver doesn't require explicitly
>>>>>>> computing the sync statistics but does require computing the write
>>>>>>> statistics.  This is because of the presence of issue_xlog_fsync but
>>>>>>> absence of an equivalent pg_xlog_pwrite.  Additionally, I observe that
>>>>>>> the XLogWrite code path calls pgstat_report_wait_*() while the WAL
>>>>>>> receiver path does not.  It seems technically straight-forward to
>>>>>>> refactor here to avoid the almost-duplicated logic in the two places,
>>>>>>> though I suspect there may be a trade-off for not adding another
>>>>>>> function call to the stack given the importance of WAL processing
>>>>>>> (though that seems marginalized compared to the cost of actually
>>>>>>> writing the WAL).  Or, as Fujii noted, go the other way and don't have
>>>>>>> any shared code between the two but instead implement the WAL receiver
>>>>>>> one to use pg_stat_wal_receiver instead.  In either case, this
>>>>>>> half-and-half implementation seems undesirable.
>>>>>>
>>>>>> OK, as Fujii-san mentioned, I separated the WAL receiver stats.
>>>>>> (v10-0002-Makes-the-wal-receiver-report-WAL-statistics.patch)
>>>>>
>>>>> Thanks for updating the patches!
>>>>>
>>>>>
>>>>>> I added the infrastructure code to communicate the WAL receiver stats messages between the WAL receiver and the stats collector, and
>>>>>> the stats for WAL receiver is counted in pg_stat_wal_receiver.
>>>>>> What do you think?
>>>>>
>>>>> On second thought, this idea seems not good. Because those stats are
>>>>> collected between multiple walreceivers, but other values in
>>>>> pg_stat_wal_receiver is only related to the walreceiver process running
>>>>> at that moment. IOW, it seems strange that some values show dynamic
>>>>> stats and the others show collected stats, even though they are in
>>>>> the same view pg_stat_wal_receiver. Thought?
>>>>
>>>> OK, I fixed it.
>>>> The stats collected in the WAL receiver is exposed in pg_stat_wal view in v11 patch.
>>>
>>> Thanks for updating the patches! I'm now reading 001 patch.
>>>
>>> +    /* Check whether the WAL file was synced to disk right now */
>>> +    if (enableFsync &&
>>> +        (sync_method == SYNC_METHOD_FSYNC ||
>>> +         sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH ||
>>> +         sync_method == SYNC_METHOD_FDATASYNC))
>>> +    {
>>>
>>> Isn't it better to make issue_xlog_fsync() return immediately
>>> if enableFsync is off, sync_method is open_sync or open_data_sync,
>>> to simplify the code more?
>>
>> Thanks for the comments.
>> I added the above code in v12 patch.
>>
>>>
>>> +        /*
>>> +         * Send WAL statistics only if WalWriterDelay has elapsed to minimize
>>> +         * the overhead in WAL-writing.
>>> +         */
>>> +        if (rc & WL_TIMEOUT)
>>> +            pgstat_send_wal();
>>>
>>> On second thought, this change means that it always takes wal_writer_delay
>>> before walwriter's WAL stats is sent after XLogBackgroundFlush() is called.
>>> For example, if wal_writer_delay is set to several seconds, some values in
>>> pg_stat_wal would be not up-to-date meaninglessly for those seconds.
>>> So I'm thinking to withdraw my previous comment and it's ok to send
>>> the stats every after XLogBackgroundFlush() is called. Thought?
>>
>> Thanks, I didn't notice that.
>>
>> Although PGSTAT_STAT_INTERVAL is 500msec, wal_writer_delay's
>> default value is 200msec and it may be set shorter time.

Yeah, if wal_writer_delay is set to very small value, there is a risk
that the WAL stats are sent too frequently. I agree that's a problem.

>>
>> Why don't to make another way to check the timestamp?
>>
>> +               /*
>> +                * Don't send a message unless it's been at least
>> PGSTAT_STAT_INTERVAL
>> +                * msec since we last sent one
>> +                */
>> +               now = GetCurrentTimestamp();
>> +               if (TimestampDifferenceExceeds(last_report, now,
>> PGSTAT_STAT_INTERVAL))
>> +               {
>> +                       pgstat_send_wal();
>> +                       last_report = now;
>> +               }
>> +
>>
>> Although I worried that it's better to add the check code in pgstat_send_wal(),

Agreed.

>> I didn't do so because to avoid to double check PGSTAT_STAT_INTERVAL.
>> pgstat_send_wal() is invoked pg_report_stat() and it already checks the
>> PGSTAT_STAT_INTERVAL.

I think that we can do that. What about the attached patch?

> I forgot to remove an unused variable.
> The attached v13 patch is fixed.

Thanks for updating the patch!

+ w.wal_write,
+ w.wal_write_time,
+ w.wal_sync,
+ w.wal_sync_time,

It's more natural to put wal_write_time and wal_sync_time next to
each other? That is, what about the following order of columns?

wal_write
wal_sync
wal_write_time
wal_sync_time

- case SYNC_METHOD_OPEN:
- case SYNC_METHOD_OPEN_DSYNC:
- /* write synced it already */
- break;

IMO it's better to add Assert(false) here to ensure that we never reach
here, as follows. Thought?

+ case SYNC_METHOD_OPEN:
+ case SYNC_METHOD_OPEN_DSYNC:
+ /* not reachable */
+ Assert(false);

Even when a backend exits, it sends the stats via pgstat_beshutdown_hook().
On the other hand, walwriter doesn't do that. Walwriter also should send
the stats even at its exit? Otherwise some stats can fail to be collected.
But ISTM that this issue existed from before, for example checkpointer
doesn't call pgstat_send_bgwriter() at its exit, so it's overkill to fix
this issue in this patch?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment Content-Type Size
v13-0001-Add-statistics-related-to-write-sync-wal-records_fujii.patch text/plain 21.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2021-03-04 16:03:11 Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
Previous Message Tom Lane 2021-03-04 15:40:31 Re: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]