On replicas, checkpoints_req is incremented even if restartpoint has not happened

From: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: On replicas, checkpoints_req is incremented even if restartpoint has not happened
Date: 2019-04-12 13:43:40
Message-ID: CANNMO+LnpOQe2viTAsRzpajruY1MASuFHs0=EznCyNoeYx6ysA@mail.gmail.com
Lists: pgsql-bugs

Observing pg_stat_bgwriter on replicas, I've found that the checkpoints_req
counter is incremented much faster than restartpoints actually happen
according to the logs.

Example:

select * from pg_stat_bgwriter ; \watch 60

Fri 12 Apr 2019 01:16:26 PM UTC (every 60s)

 checkpoints_timed | checkpoints_req
-------------------+-----------------
              2200 |          255224
(1 row)

Fri 12 Apr 2019 01:17:26 PM UTC (every 60s)

 checkpoints_timed | checkpoints_req
-------------------+-----------------
              2200 |          255240
(1 row)

Fri 12 Apr 2019 01:18:26 PM UTC (every 60s)

 checkpoints_timed | checkpoints_req
-------------------+-----------------
              2200 |          255291
(1 row)

Fri 12 Apr 2019 01:19:26 PM UTC (every 60s)

 checkpoints_timed | checkpoints_req
-------------------+-----------------
              2200 |          255323
(1 row)

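The counter increases by 16, 51, and 32 over the three one-minute intervals,
i.e. by roughly 15-50 per minute, while checkpoints_timed does not change.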

At the same time, the logs on the same server show that restartpoints
happen only 1-2 times per minute:

$ sudo journalctl --since '2019-04-12 13:15' | grep "restart point" \
    | awk -F'[: ]' '{print $1" "$2" "$3":"$4}' | uniq -c
1 Apr 12 13:16
2 Apr 12 13:18
1 Apr 12 13:19
1 Apr 12 13:20
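
To put the two measurements side by side, here is a small sketch that computes the per-interval growth of checkpoints_req from the four samples above and compares the total with the number of restartpoint lines found in the log. The variable and function names are made up for illustration, and the log window (13:16 to 13:20) is slightly wider than the sampling window, so the comparison is only approximate.

```python
# Samples of checkpoints_req copied from the \watch output above (UTC).
samples = [
    ("13:16:26", 255224),
    ("13:17:26", 255240),
    ("13:18:26", 255291),
    ("13:19:26", 255323),
]

# Per-minute counts of "restart point" log lines, copied from the
# journalctl | uniq -c output above.
logged_restartpoints = {"13:16": 1, "13:18": 2, "13:19": 1, "13:20": 1}


def per_interval_deltas(samples):
    """Return the counter growth between each pair of consecutive samples."""
    values = [value for _, value in samples]
    return [later - earlier for earlier, later in zip(values, values[1:])]


deltas = per_interval_deltas(samples)
total_requests = sum(deltas)
total_logged = sum(logged_restartpoints.values())

print("checkpoints_req growth per minute:", deltas)
print("total growth of checkpoints_req:", total_requests)
print("restartpoints seen in the log:", total_logged)
```

Even with the imprecise alignment of the two windows, the counter grows by about an order of magnitude more than the number of restartpoints logged.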

Checking the source code:
- checkpoints_req is incremented here:
https://gitlab.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/postmaster/checkpointer.c#L404
- later, it may turn out that the restartpoint was not actually performed in
the current cycle:
https://gitlab.com/postgres/postgres/blob/REL9_6_STABLE/src/backend/postmaster/checkpointer.c#L518
but by then the counter has already been incremented.

So, reading this code, I guess we might have the same problem with
checkpoints_req on the master as well, where it counts full-fledged
checkpoints: if a checkpoint attempt fails, the counter has already been
incremented. So it looks like checkpoint attempts are being counted.
However, the documentation defines checkpoints_req as "Number of requested
checkpoints that have been performed"
(https://www.postgresql.org/docs/9.6/monitoring-stats.html).
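
The difference between counting attempts and counting performed operations can be illustrated with a toy model. This is not PostgreSQL's code: the `Stats` class, the `performed` counter, and the `run_cycle` function are hypothetical names, and the model only mirrors the ordering described above (increment first, find out later whether the restartpoint actually ran).

```python
from dataclasses import dataclass


@dataclass
class Stats:
    # Mirrors the behaviour described in the report: bumped on every request.
    checkpoints_req: int = 0
    # Hypothetical counter matching the documented meaning: bumped only
    # when the requested checkpoint/restartpoint was actually performed.
    performed: int = 0


def run_cycle(stats, requested, restartpoint_possible):
    """Model one simplified checkpointer cycle; return True if work was done."""
    if not requested:
        return False
    # The request is counted up front, before the outcome is known.
    stats.checkpoints_req += 1
    if not restartpoint_possible:
        # The restartpoint is skipped, but the counter was already bumped.
        return False
    stats.performed += 1
    return True


def simulate(outcomes):
    """Run one requested cycle per entry; each entry says whether it can succeed."""
    stats = Stats()
    for possible in outcomes:
        run_cycle(stats, requested=True, restartpoint_possible=possible)
    return stats


result = simulate([True, False, False, True, False])
print(result.checkpoints_req, result.performed)
```

In this model the two counters diverge as soon as any requested cycle is skipped, which is the pattern the pg_stat_bgwriter numbers and the logs above suggest.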

The code on the master branch looks similar, so this problem is probably not
limited to 9.6, but just in case:

# select version();
version
-------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.6.11 on x86_64-pc-linux-gnu (Ubuntu 9.6.11-1.pgdg16.04+1),
compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
(1 row)

Thanks,
Nik
