Re: New statistics for tuning WAL buffer size

From: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, tsunakawa(dot)takay(at)fujitsu(dot)com, Magnus Hagander <magnus(at)hagander(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: Re: New statistics for tuning WAL buffer size
Date: 2020-09-18 00:40:11
Message-ID: 25c55c57ca9af60707bff6667307a3c7@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020-09-15 17:10, Fujii Masao wrote:
> On 2020/09/15 15:52, Masahiro Ikeda wrote:
>> On 2020-09-11 01:40, Fujii Masao wrote:
>>> On 2020/09/09 13:57, Masahiro Ikeda wrote:
>>>> On 2020-09-07 16:19, Fujii Masao wrote:
>>>>> On 2020/09/07 9:58, Masahiro Ikeda wrote:
>>>>>> Thanks for the review and advice!
>>>>>>
>>>>>> On 2020-09-03 16:05, Fujii Masao wrote:
>>>>>>> On 2020/09/02 18:56, Masahiro Ikeda wrote:
>>>>>>>>> +/* ----------
>>>>>>>>> + * Backend types
>>>>>>>>> + * ----------
>>>>>>>>>
>>>>>>>>> You seem to forget to add "*/" into the above comment.
>>>>>>>>> This issue could cause the following compiler warning.
>>>>>>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block
>>>>>>>>> comment [-Wcomment]
>>>>>>>>
>>>>>>>> Thanks for the comment. I fixed.
>>>>>>>
>>>>>>> Thanks for the fix! But why are those comments necessary?
>>>>>>
>>>>>> Sorry about that. This comment is not necessary.
>>>>>> I removed it.
>>>>>>
>>>>>>>> The pg_stat_walwriter is not security restricted now, so
>>>>>>>> ordinary users can access it.
>>>>>>>> It has the same security level as pg_stat_archiver. If you have
>>>>>>>> any comments, please let me know.
>>>>>>>
>>>>>>> +       <structfield>dirty_writes</structfield>
>>>>>>> <type>bigint</type>
>>>>>>>
>>>>>>> I guess that the column name "dirty_writes" derived from
>>>>>>> the DTrace probe name. Isn't this name confusing? We should
>>>>>>> rename it to "wal_buffers_full" or something?
>>>>>>
>>>>>> I agree and rename it to "wal_buffers_full".
>>>>>>
>>>>>>> +/* ----------
>>>>>>> + * PgStat_MsgWalWriter            Sent by the walwriter to
>>>>>>> update statistics.
>>>>>>>
>>>>>>> This comment seems not accurate because backends also send it.
>>>>>>>
>>>>>>> +/*
>>>>>>> + * WAL writes statistics counter is updated in XLogWrite
>>>>>>> function
>>>>>>> + */
>>>>>>> +extern PgStat_MsgWalWriter WalWriterStats;
>>>>>>>
>>>>>>> This comment seems not right because the counter is not updated
>>>>>>> in XLogWrite().
>>>>>>
>>>>>> Right. I fixed it to "Sent by each backend and background workers
>>>>>> to update WAL statistics."
>>>>>> In the future, other statistics will be included so I remove the
>>>>>> function's name.
>>>>>>
>>>>>>
>>>>>>> +-- There will surely and maximum one record
>>>>>>> +select count(*) = 1 as ok from pg_stat_walwriter;
>>>>>>>
>>>>>>> What about changing this comment to "There must be only one
>>>>>>> record"?
>>>>>>
>>>>>> Thanks, I fixed.
>>>>>>
>>>>>>> +                    WalWriterStats.m_xlog_dirty_writes++;
>>>>>>>                      LWLockRelease(WALWriteLock);
>>>>>>>
>>>>>>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be
>>>>>>> protected
>>>>>>> with WALWriteLock, isn't it better to increment that after
>>>>>>> releasing the lock?
>>>>>>
>>>>>> Thanks, I fixed.
>>>>>>
>>>>>>> +CREATE VIEW pg_stat_walwriter AS
>>>>>>> +    SELECT
>>>>>>> +        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
>>>>>>> +        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
>>>>>>> +
>>>>>>>  CREATE VIEW pg_stat_progress_vacuum AS
>>>>>>>
>>>>>>> In system_views.sql, the definition of pg_stat_walwriter should
>>>>>>> be
>>>>>>> placed just after that of pg_stat_bgwriter not
>>>>>>> pg_stat_progress_analyze.
>>>>>>
>>>>>> OK, I fixed it.
>>>>>>
>>>>>>>      }
>>>>>>> -
>>>>>>>      /*
>>>>>>>       * We found an existing collector stats file. Read it and
>>>>>>> put all the
>>>>>>>
>>>>>>> You seem to accidentally have removed the empty line here.
>>>>>>
>>>>>> Sorry about that. I fixed it.
>>>>>>
>>>>>>> -                 errhint("Target must be \"archiver\" or
>>>>>>> \"bgwriter\".")));
>>>>>>> +                 errhint("Target must be \"archiver\" or
>>>>>>> \"bgwriter\" or
>>>>>>> \"walwriter\".")));
>>>>>>>
>>>>>>> There are two "or" in the message, but the former should be
>>>>>>> replaced with ","?
>>>>>>
>>>>>> Thanks, I fixed.
>>>>>>
>>>>>>
>>>>>> On 2020-09-05 18:40, Magnus Hagander wrote:
>>>>>>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
>>>>>>> <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>>>>>
>>>>>>>> On 2020/09/04 11:50, tsunakawa(dot)takay(at)fujitsu(dot)com wrote:
>>>>>>>>> From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
>>>>>>>>>>> I changed the view name from pg_stat_walwrites to
>>>>>>>> pg_stat_walwriter.
>>>>>>>>>>> I think it is better to match naming scheme with other views
>>>>>>>> like
>>>>>>>>>> pg_stat_bgwriter,
>>>>>>>>>>> which is for bgwriter statistics but it has the statistics
>>>>>>>> related to backend.
>>>>>>>>>>
>>>>>>>>>> I prefer the view name pg_stat_walwriter for the consistency
>>>>>>>>>> with
>>>>>>>>>> other view names. But we also have pg_stat_wal_receiver. Which
>>>>>>>>>> makes me think that maybe pg_stat_wal_writer is better for
>>>>>>>>>> the consistency. Thought? IMO either of them works for me.
>>>>>>>>>> I'd like to hear more opinons about this.
>>>>>>>>>
>>>>>>>>> I think pg_stat_bgwriter is now a misnomer, because it contains
>>>>>>>> the backends' activity.  Likewise, pg_stat_walwriter leads to
>>>>>>>> misunderstanding because its information is not limited to WAL
>>>>>>>> writer.
>>>>>>>>>
>>>>>>>>> How about simply pg_stat_wal?  In the future, we may want to
>>>>>>>> include WAL reads in this view, e.g. reading undo logs in zheap.
>>>>>>>>
>>>>>>>> Sounds reasonable.
>>>>>>>
>>>>>>> +1.
>>>>>>>
>>>>>>> pg_stat_bgwriter has had the "wrong name" for quite some time now
>>>>>>> --
>>>>>>> it became even more apparent when the checkpointer was split out
>>>>>>> to
>>>>>>> it's own process, and that's not exactly a recent change. And it
>>>>>>> had
>>>>>>> allocs in it from day one...
>>>>>>>
>>>>>>> I think naming it for what the data in it is ("wal") rather than
>>>>>>> which
>>>>>>> process deals with it ("walwriter") is correct, unless the
>>>>>>> statistics
>>>>>>> can be known to only *ever* affect one type of process. (And then
>>>>>>> different processes can affect different columns in the view). As
>>>>>>> a
>>>>>>> general rule -- and that's from what I can tell exactly what's
>>>>>>> being
>>>>>>> proposed.
>>>>>>
>>>>>> Thanks for your comments. I agree with your opinions.
>>>>>> I changed the view name to "pg_stat_wal".
>>>>>>
>>>>>>
>>>>>> I fixed the code to send the WAL statistics from not only backend
>>>>>> and walwriter
>>>>>> but also checkpointer, walsender and autovacuum worker.
>>>>>
>>>>> Good point! Thanks for updating the patch!
>>>>>
>>>>>
>>>>> @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams
>>>>> *params,
>>>>>                           onerel->rd_rel->relisshared,
>>>>>                           Max(new_live_tuples, 0),
>>>>>                           vacrelstats->new_dead_tuples);
>>>>> +    pgstat_send_wal();
>>>>>
>>>>> I guess that you changed heap_vacuum_rel() as above so that
>>>>> autovacuum
>>>>> workers can send WAL stats. But heap_vacuum_rel() can be called by
>>>>> the processes (e.g., backends) other than autovacuum workers? Also
>>>>> what happens if autovacuum workers just do ANALYZE only? In that
>>>>> case,
>>>>> heap_vacuum_rel() may not be called.
>>>>>
>>>>> Currently autovacuum worker reports the stats at the exit via
>>>>> pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
>>>>> is not the process that basically keeps running during the service.
>>>>> It exits
>>>>> after it does vacuum or analyze. So ISTM that it's not bad to
>>>>> report the stats
>>>>> only at the exit, in autovacuum worker case. There is no need to
>>>>> add extra
>>>>> code for WAL stats report by autovacuum worker. Thought?
>>>>
>>>> Thanks, I understood. I removed this code.
>>>>
>>>>>
>>>>> @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
>>>>>          else
>>>>>              RecentFlushPtr = GetXLogReplayRecPtr(NULL);
>>>>>  +        /* Send wal statistics */
>>>>> +        pgstat_send_wal();
>>>>>
>>>>> AFAIR logical walsender uses three loops in WalSndLoop(),
>>>>> WalSndWriteData()
>>>>> and WalSndWaitForWal(). But could you tell me why added
>>>>> pgstat_send_wal()
>>>>> into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is
>>>>> the best
>>>>> for that purpose.
>>>>
>>>> I checked what function calls XLogBackgroundFlush() which calls
>>>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full.
>>>>
>>>> I found that WalSndWaitForWal() calls it, so I added it.
>>>
>>> Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the
>>> second argument opportunistic=true, so in this case WAL write by
>>> wal_buffers full seems to never happen. Right? If this understanding
>>> is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal().
>>> Probably also walwriter doesn't need to do that.
>
> Thanks for updating the patch! This patch adds pgstat_send_wal() in
> walwriter main loop. But isn't this unnecessary because of the above
> reason?
> That is, since walwriter calls AdvanceXLInsertBuffer() with
> the second argument "opportunistic" = true via XLogBackgroundFlush(),
> the event of full wal_buffers will never happen. No?

Right, I fixed it.

>>>
>>> The logical rep walsender can generate WAL and call
>>> AdvanceXLInsertBuffer() when it executes the replication commands
>>> like
>>> CREATE_REPLICATION_SLOT. But this case is already covered by
>>> pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(),
>>> with your patch. So no more calls to pgstat_send_wal() seems
>>> necessary
>>> for logical rep walsender.
>>
>> Thanks for your reviews. I didn't notice that.
>> I updated the patches.
>
> Sorry, the above my analysis might be incorrect. During logical
> replication,
> walsender may access to the system table. Which may cause HOT pruning
> or killing of dead index tuple. Also which can cause WAL and
> full wal_buffers event. Thought?

Thanks. I confirmed that it causes HOT pruning or killing of
dead index tuple if DecodeCommit() is called.

As you said, DecodeCommit() may access the system table.

WalSndLoop()
-> XLogSendLogical()
-> LogicalDecodingProcessRecord()
-> DecodeXactOp()
-> DecodeCommit()
-> ReorderBufferCommit()
-> ReorderBufferProcessTXN()
-> RelidByRelfilenode()
-> systable_getnext()

The wals are generated only when logical replication is performed.
So, I added pgstat_send_wal() in XLogSendLogical().

But, I concerned that it causes poor performance
since pgstat_send_wal() is called per wal record,

Is it necessary to introduce a mechanism to send in bulk?
But I worried about how to implement is best. Is it good to send wal
statistics per X recoreds?

I think there are other background processes that access the system
tables,
so I organized which process must send wal metrics and added
pgstat_send_wal() to the main loop of some background processes
for example, autovacuum launcher, logical replication launcher, and
logical replication worker's one.

(*) [x]: it needs to send it
[ ]: it don't need to send it

* [ ] postmaster
* [ ] background writer
* [x] checkpointer: it generates wal for checkpoint.
* [ ] walwriter
* [x] autovacuum launcher: it accesses to the system tables to get the
database list.
* [x] autovacuum worker: it generates wal for vacuum.
* [ ] stats collector
* [x] backend: it generate wal for query execution.
* [ ] startup
* [ ] archiver
* [x] walsender: it accesses to the system tables if logical replication
is performed.
* [ ] walreceiver
* [x] logical replication launcher: it accesses to the system tables to
get the subscription list.
* [x] logical replication worker: it accesses to the system tables to
get oid from relname.
* [x] parallel worker: it generates wal for query execution.

If my understanding is wrong, please let me know.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachment Content-Type Size
0007_pg_stat_wal_view.patch text/x-diff 20.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-09-18 02:11:51 Re: New statistics for tuning WAL buffer size
Previous Message Thomas Munro 2020-09-18 00:30:01 Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)