From: | "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Cc: | Magnus Hagander <magnus(at)hagander(dot)net>, "Anton A(dot) Melnikov" <aamelnikov(at)inbox(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica. |
Date: | 2024-09-16 14:30:35 |
Message-ID: | 77032579-4dc3-4552-9a09-30aaa114c144@postgrespro.ru |
Lists: | pgsql-hackers |
Hi!
On 13.09.2024 18:20, Fujii Masao wrote:
>
> If I understand correctly, restartpoints_timed and restartpoints_done were
> separated because a restartpoint can be skipped. restartpoints_timed counts
> when a restartpoint is triggered by a timeout, whether it runs or not,
> while restartpoints_done only tracks completed restartpoints.
>
> Similarly, I believe checkpoints should be handled the same way.
> Checkpoints can also be skipped when the system is idle, but currently,
> num_timed counts even the skipped ones, despite its documentation stating
> it's the "Number of scheduled checkpoints that have been performed."
>
> Why not separate num_timed into something like checkpoints_timed and
> checkpoints_done to reflect these different counters?
+1
This idea seems quite tenable to me.
One small clarification: currently, if no restartpoints were skipped, then
restartpoints_done is equal to restartpoints_timed + restartpoints_req.
The same holds for checkpoints.
So I tried to introduce a num_done counter for checkpoints in the attached patch.
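To illustrate the relation between the counters, here is a minimal toy model in Python. It is only a sketch and not the code from the patch: the class, the record() method and the not_completed() helper are hypothetical names, and it assumes that every trigger bumps num_timed or num_requested while num_done is bumped only when a checkpoint actually completes.

```python
from dataclasses import dataclass


@dataclass
class CheckpointerStats:
    """Toy stand-in for the counters discussed above (not PostgreSQL code)."""
    num_timed: int = 0
    num_requested: int = 0
    num_done: int = 0

    def record(self, by_timeout: bool, completed: bool) -> None:
        # Every trigger is counted, whether or not the checkpoint runs.
        if by_timeout:
            self.num_timed += 1
        else:
            self.num_requested += 1
        # Only checkpoints that actually complete are counted as done.
        if completed:
            self.num_done += 1

    def not_completed(self) -> int:
        # Triggers that did not result in a completed checkpoint.
        return self.num_timed + self.num_requested - self.num_done


stats = CheckpointerStats()
stats.record(by_timeout=True, completed=True)
stats.record(by_timeout=False, completed=True)
assert stats.not_completed() == 0  # nothing skipped: done == timed + requested

stats.record(by_timeout=True, completed=False)  # e.g. skipped on an idle system
assert stats.not_completed() == 1
```

In this model a non-zero difference means that at least one triggered checkpoint did not complete, which is the case the proposed test would need to reproduce.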
I'm not sure whether we should add a regression test for the case when num_done is less than
num_timed + num_requested. I haven't yet been able to reproduce it in a short time.
E.g. such a case can be obtained when the message "checkpoints are
occurring too frequently" is logged, as follows:
- set checkpoint_timeout = 30 and checkpoint_warning = 40 in postgresql.conf
- start the server
- periodically run bulk insertions in the 1st client (e.g. insert into test values (generate_series(1,1E7));)
- watch pg_stat_checkpointer in the 2nd one:
# SELECT CURRENT_TIME; select * from pg_stat_checkpointer;
# \watch
After some time, the following will appear in the log:
2024-09-16 16:38:47.888 MSK [193733] LOG: checkpoints are occurring too frequently (13 seconds apart)
2024-09-16 16:38:47.888 MSK [193733] HINT: Consider increasing the configuration parameter "max_wal_size".
Then num_timed + num_requested will become greater than num_done.
It would be nice to find a simpler and faster way.
With the best regards,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Introduce-num_done-counter-in-the-pg_stat_checkpointer.patch | text/x-patch | 10.7 KB |