Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Date: 2022-03-19 00:15:56
Message-ID: 20220319001556.atuqbmafvc73pa2k@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

This is a long thread, sorry for asking if this has been asked before.

On 2022-03-08 20:25:28 +0530, Nitin Jadhav wrote:
> * Sort buffers that need to be written to reduce the likelihood of random
> @@ -2129,6 +2132,8 @@ BufferSync(int flags)
> bufHdr = GetBufferDescriptor(buf_id);
>
> num_processed++;
> + pgstat_progress_update_param(PROGRESS_CHECKPOINT_BUFFERS_PROCESSED,
> + num_processed);
>
> /*
> * We don't need to acquire the lock here, because we're only looking
> @@ -2149,6 +2154,8 @@ BufferSync(int flags)
> TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
> PendingCheckpointerStats.m_buf_written_checkpoints++;
> num_written++;
> + pgstat_progress_update_param(PROGRESS_CHECKPOINT_BUFFERS_WRITTEN,
> + num_written);
> }
> }

Have you measured the performance effects of this? On fast storage with large
shared_buffers I've seen these loops in profiles. It's probably fine, but it'd
be good to verify that.

> @@ -1897,6 +1897,112 @@ pg_stat_progress_basebackup| SELECT s.pid,
> s.param4 AS tablespaces_total,
> s.param5 AS tablespaces_streamed
> FROM pg_stat_get_progress_info('BASEBACKUP'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20);
> +pg_stat_progress_checkpoint| SELECT s.pid,
> + CASE s.param1
> + WHEN 1 THEN 'checkpoint'::text
> + WHEN 2 THEN 'restartpoint'::text
> + ELSE NULL::text
> + END AS type,
> + (((((((
> + CASE
> + WHEN ((s.param2 & (1)::bigint) > 0) THEN 'shutdown '::text
> + ELSE ''::text
> + END ||
> + CASE
> + WHEN ((s.param2 & (2)::bigint) > 0) THEN 'end-of-recovery '::text
> + ELSE ''::text
> + END) ||
> + CASE
> + WHEN ((s.param2 & (4)::bigint) > 0) THEN 'immediate '::text
> + ELSE ''::text
> + END) ||
> + CASE
> + WHEN ((s.param2 & (8)::bigint) > 0) THEN 'force '::text
> + ELSE ''::text
> + END) ||
> + CASE
> + WHEN ((s.param2 & (16)::bigint) > 0) THEN 'flush-all '::text
> + ELSE ''::text
> + END) ||
> + CASE
> + WHEN ((s.param2 & (32)::bigint) > 0) THEN 'wait '::text
> + ELSE ''::text
> + END) ||
> + CASE
> + WHEN ((s.param2 & (128)::bigint) > 0) THEN 'wal '::text
> + ELSE ''::text
> + END) ||
> + CASE
> + WHEN ((s.param2 & (256)::bigint) > 0) THEN 'time '::text
> + ELSE ''::text
> + END) AS flags,
> + (((((((
> + CASE
> + WHEN ((s.param3 & (1)::bigint) > 0) THEN 'shutdown '::text
> + ELSE ''::text
> + END ||
> + CASE
> + WHEN ((s.param3 & (2)::bigint) > 0) THEN 'end-of-recovery '::text
> + ELSE ''::text
> + END) ||
> + CASE
> + WHEN ((s.param3 & (4)::bigint) > 0) THEN 'immediate '::text
> + ELSE ''::text
> + END) ||
> + CASE
> + WHEN ((s.param3 & (8)::bigint) > 0) THEN 'force '::text
> + ELSE ''::text
> + END) ||
> + CASE
> + WHEN ((s.param3 & (16)::bigint) > 0) THEN 'flush-all '::text
> + ELSE ''::text
> + END) ||
> + CASE
> + WHEN ((s.param3 & (32)::bigint) > 0) THEN 'wait '::text
> + ELSE ''::text
> + END) ||
> + CASE
> + WHEN ((s.param3 & (128)::bigint) > 0) THEN 'wal '::text
> + ELSE ''::text
> + END) ||
> + CASE
> + WHEN ((s.param3 & (256)::bigint) > 0) THEN 'time '::text
> + ELSE ''::text
> + END) AS next_flags,
> + ('0/0'::pg_lsn + (
> + CASE
> + WHEN (s.param4 < 0) THEN pow((2)::numeric, (64)::numeric)
> + ELSE (0)::numeric
> + END + (s.param4)::numeric)) AS start_lsn,
> + to_timestamp(((946684800)::double precision + ((s.param5)::double precision / (1000000)::double precision))) AS start_time,
> + CASE s.param6
> + WHEN 1 THEN 'initializing'::text
> + WHEN 2 THEN 'getting virtual transaction IDs'::text
> + WHEN 3 THEN 'checkpointing replication slots'::text
> + WHEN 4 THEN 'checkpointing logical replication snapshot files'::text
> + WHEN 5 THEN 'checkpointing logical rewrite mapping files'::text
> + WHEN 6 THEN 'checkpointing replication origin'::text
> + WHEN 7 THEN 'checkpointing commit log pages'::text
> + WHEN 8 THEN 'checkpointing commit time stamp pages'::text
> + WHEN 9 THEN 'checkpointing subtransaction pages'::text
> + WHEN 10 THEN 'checkpointing multixact pages'::text
> + WHEN 11 THEN 'checkpointing predicate lock pages'::text
> + WHEN 12 THEN 'checkpointing buffers'::text
> + WHEN 13 THEN 'processing file sync requests'::text
> + WHEN 14 THEN 'performing two phase checkpoint'::text
> + WHEN 15 THEN 'performing post checkpoint cleanup'::text
> + WHEN 16 THEN 'invalidating replication slots'::text
> + WHEN 17 THEN 'recycling old WAL files'::text
> + WHEN 18 THEN 'truncating subtransactions'::text
> + WHEN 19 THEN 'finalizing'::text
> + ELSE NULL::text
> + END AS phase,
> + s.param7 AS buffers_total,
> + s.param8 AS buffers_processed,
> + s.param9 AS buffers_written,
> + s.param10 AS files_total,
> + s.param11 AS files_synced
> + FROM pg_stat_get_progress_info('CHECKPOINT'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20);
> pg_stat_progress_cluster| SELECT s.pid,
> s.datid,
> d.datname,

This view is depressingly complicated. Added up the view definitions for
the already existing pg_stat_progress* views add up to a measurable part of
the size of an empty database:

postgres[1160866][1]=# SELECT sum(octet_length(ev_action)), SUM(pg_column_size(ev_action)) FROM pg_rewrite WHERE ev_class::regclass::text LIKE '%progress%';
┌───────┬───────┐
│ sum │ sum │
├───────┼───────┤
│ 97410 │ 19786 │
└───────┴───────┘
(1 row)

and this view looks to be a good bit more complicated than the existing
pg_stat_progress* views.

Indeed:
template1[1165473][1]=# SELECT ev_class::regclass, length(ev_action), pg_column_size(ev_action) FROM pg_rewrite WHERE ev_class::regclass::text LIKE '%progress%' ORDER BY length(ev_action) DESC;
┌───────────────────────────────┬────────┬────────────────┐
│ ev_class │ length │ pg_column_size │
├───────────────────────────────┼────────┼────────────────┤
│ pg_stat_progress_checkpoint │ 43290 │ 5409 │
│ pg_stat_progress_create_index │ 23293 │ 4177 │
│ pg_stat_progress_cluster │ 18390 │ 3704 │
│ pg_stat_progress_analyze │ 16121 │ 3339 │
│ pg_stat_progress_vacuum │ 16076 │ 3392 │
│ pg_stat_progress_copy │ 15124 │ 3080 │
│ pg_stat_progress_basebackup │ 8406 │ 2094 │
└───────────────────────────────┴────────┴────────────────┘
(7 rows)

pg_rewrite without pg_stat_progress_checkpoint: 745472, with: 753664

pg_rewrite is the second biggest relation in an empty database already...

template1[1164827][1]=# SELECT relname, pg_total_relation_size(oid) FROM pg_class WHERE relkind = 'r' ORDER BY 2 DESC LIMIT 5;
┌────────────────┬────────────────────────┐
│ relname │ pg_total_relation_size │
├────────────────┼────────────────────────┤
│ pg_proc │ 1212416 │
│ pg_rewrite │ 745472 │
│ pg_attribute │ 704512 │
│ pg_description │ 630784 │
│ pg_collation │ 409600 │
└────────────────┴────────────────────────┘
(5 rows)

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Imseih (AWS), Sami 2022-03-19 00:57:23 Re: Allow parallel plan for referential integrity checks?
Previous Message Andres Freund 2022-03-18 23:48:28 Re: pg_walinspect - a new extension to get raw WAL data and WAL stats