checkpoint process is hang

From: 石勇虎 <SHIYONGHU651(at)pingan(dot)com(dot)cn>
To: "'pgsql-bugs(at)postgresql(dot)org'" <pgsql-bugs(at)postgresql(dot)org>
Cc: 熊浪 <XIONGLANG619(at)pingan(dot)com(dot)cn>, 栾长苗 <LUANCHANGMIAO531(at)pingan(dot)com(dot)cn>
Subject: checkpoint process is hang
Date: 2019-10-18 02:07:12
Message-ID: 0ee64a6b5bda4e55b769ac6de40af9b0@pingan.com.cn
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

发件人: 熊浪
发送时间: 2019年10月18日 9:51
收件人: 石勇虎 <SHIYONGHU651(at)pingan(dot)com(dot)cn>
主题: 答复: checkpoint process is hang

Hello~
I had a database which the last checkpoint time was Sep 29 14:25:49. And now the checkpoint is still hang.The postgresql’s version is 10.3.The host‘s version is Red Hat Enterprise Linux Server release 6.8 (Santiago)。

2019-09-29 14:20:46.157 CST,,,37651,,5d8a3322.9313,2665,,2019-09-24 23:15:46 CST,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""
2019-09-29 14:24:16.084 CST,,,37651,,5d8a3322.9313,2666,,2019-09-24 23:15:46 CST,,0,LOG,00000,"checkpoint complete: wrote 697421 buffers (13.3%); 0 WAL file(s) added, 0 removed, 3 recycled; write=209.903 s, sync=0.014 s, total=209.927 s; sync files=130, longest=0.001 s, average=0.000 s; distance=4959020 kB, estimate=5181294 kB",,,,,,,,,""
2019-09-29 14:25:46.179 CST,,,37651,,5d8a3322.9313,2667,,2019-09-24 23:15:46 CST,,0,LOG,00000,"checkpoint starting: time",,,,,,,,,""

The database has long transaction in pg_stat_activity.
[cid:image002(dot)png(at)01D58599(dot)7375FB20]

The download is database process status. A lot of old process did not normal stoped.

Pgagent status is 110.
Timing is on.
PG is in recovery: false

Time: 1.276 ms
postgres 1610 37573 0 Sep27 ? 00:00:36 postgres: pgvicco: bgworker: parallel worker for PID 39675
postgres 1611 37573 0 Sep27 ? 00:00:36 postgres: pgvicco: bgworker: parallel worker for PID 39675
postgres 2780 37573 0 Oct08 ? 00:00:26 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 3679 37573 0 Sep29 ? 00:00:04 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 4119 37573 0 Oct05 ? 00:07:04 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 5130 37573 0 Sep29 ? 00:00:04 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 7309 37573 0 Sep27 ? 00:00:54 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 8549 37573 0 Sep29 ? 00:00:00 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 8804 37573 0 Sep29 ? 00:00:07 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 12061 37573 0 Oct08 ? 00:00:29 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 12627 37573 0 Sep27 ? 00:00:42 postgres: pgvicco: bgworker: parallel worker for PID 7309
postgres 12628 37573 0 Sep27 ? 00:00:42 postgres: pgvicco: bgworker: parallel worker for PID 7309
postgres 13610 37573 0 Oct08 ? 00:00:05 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 13925 37573 0 Oct08 ? 00:02:36 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 14275 37573 0 Sep30 ? 00:02:50 postgres: pgvicco: autovacuum worker process pgvicco
postgres 15691 37573 0 Oct09 ? 00:00:50 postgres: pgvicco: wal sender process repuser sending backup "pg_basebackup base backup"
postgres 16662 37573 0 Oct10 ? 00:01:55 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 16776 37573 0 Oct08 ? 00:00:00 postgres: pgvicco: lbdpimssqp pgvicco DELETE
postgres 18486 37573 0 Oct09 ? 00:00:14 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 19094 37573 0 Sep29 ? 00:00:04 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 19355 37573 0 Oct09 ? 00:34:30 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 20080 37573 0 Oct09 ? 00:01:48 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 21420 37573 0 Sep29 ? 00:00:03 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 21473 37573 0 Oct12 ? 00:00:18 postgres: pgvicco: lbdpimssqp pgvicco SELECT
postgres 21475 37573 0 Oct12 ? 00:00:19 postgres: pgvicco: lbdpimssqp pgvicco SELECT
postgres 21478 37573 0 Oct12 ? 00:00:18 postgres: pgvicco: lbdpimssqp pgvicco SELECT
postgres 21482 37573 0 Oct12 ? 00:00:18 postgres: pgvicco: lbdpimssqp pgvicco SELECT
postgres 22955 37573 0 Oct10 ? 00:03:45 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 24551 37573 0 Sep29 ? 00:00:35 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 24729 37573 0 Oct05 ? 00:06:24 postgres: pgvicco: autovacuum worker process pgvicco
postgres 26663 37573 0 Sep29 ? 00:00:00 postgres: pgvicco: bgworker: parallel worker for PID 21420
postgres 26664 37573 0 Sep29 ? 00:00:00 postgres: pgvicco: bgworker: parallel worker for PID 21420
postgres 26689 37573 0 Sep24 ? 00:00:02 postgres: pgvicco: bgworker: pglogical manager 17597
postgres 26804 37573 0 Sep29 ? 00:00:16 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 27077 37573 0 Oct09 ? 00:00:08 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 27882 37573 0 Oct09 ? 00:02:14 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 27894 37573 0 Oct05 ? 00:00:08 postgres: pgvicco: pgviccodata pgvicco DELETE
postgres 28468 37573 0 Oct09 ? 00:06:02 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 28496 37573 0 Oct05 ? 00:00:08 postgres: pgvicco: pgviccodata pgvicco DELETE
postgres 29044 37573 0 Oct10 ? 00:00:16 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 29444 37573 0 Oct08 ? 00:00:05 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 29717 37573 0 Sep29 ? 00:03:05 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 29788 37573 0 Oct05 ? 00:00:11 postgres: pgvicco: lbdpimssqp pgvicco DELETE
postgres 31267 37573 0 Sep30 ? 00:00:00 postgres: pgvicco: lbdpimssqp pgvicco DELETE
postgres 33786 37573 0 Sep29 ? 00:00:10 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 35747 37573 0 Sep29 ? 00:00:00 postgres: pgvicco: bgworker: parallel worker for PID 37579
postgres 35748 37573 0 Sep29 ? 00:00:00 postgres: pgvicco: bgworker: parallel worker for PID 37579
postgres 37369 37573 0 Oct10 ? 00:00:38 postgres: pgvicco: devsup01 pgvicco SELECT
postgres 37573 1 0 Sep24 ? 00:02:39 /paic/postgres/base/10.3/bin/postgres -D /paic/pgxxxx/data
postgres 37579 37573 0 Sep29 ? 00:00:12 postgres: pgvicco: devsup01 pgvicco xxxx SELECT
postgres 37580 37573 0 Sep24 ? 00:02:38 postgres: pgvicco: logger process
postgres 37651 37573 0 Sep24 ? 00:20:02 postgres: pgvicco: checkpointer process
postgres 37652 37573 0 Sep24 ? 01:01:09 postgres: pgvicco: writer process
postgres 37653 37573 0 Sep24 ? 00:12:25 postgres: pgvicco: wal writer process
postgres 37654 37573 0 Sep24 ? 00:00:18 postgres: pgvicco: autovacuum launcher process
postgres 37655 37573 0 Sep24 ? 00:00:15 postgres: pgvicco: archiver process last was 0000000100002FB200000003
postgres 37656 37573 1 Sep24 ? 05:18:27 postgres: pgvicco: stats collector process
postgres 37657 37573 0 Sep24 ? 00:00:00 postgres: pgvicco: bgworker: pglogical supervisor
postgres 37659 37573 0 Sep24 ? 00:00:01 postgres: pgvicco: bgworker: logical replication launcher
postgres 38167 37573 0 Oct09 ? 00:00:24 postgres: pgvicco: devsup01 pgvicco xxxx SELECT
postgres 39558 37573 0 Sep29 ? 00:00:10 postgres: pgvicco: lbdpimssqp pgvicco xxxx(xxx) INSERT
postgres 39675 37573 0 Sep27 ? 00:00:36 postgres: pgvicco: devsup01 pgvicco xxx (xxx) SELECT
postgres 39927 37573 0 Oct05 ? 00:06:58 postgres: pgvicco: autovacuum worker process pgvicco

1, We’d try command checkpoint,but It’s timeout.Strace the command ,checkpoint will send the kill sign to checkpoint process,which is timeout.
2, We’d find a bug ,the parallel query will make the query hang.
https://www.postgresql.org/message-id/flat/14580491539874286%40myt5-96980ae04900.qloud-c.yandex.net#b81db528ef322e1b153deaccbcfd03fc

For a database the checkpoint is so important .So I want to know why the checkpoint hangs? Did the had relation on the bug?

The stack of the checkpoint.
[postgres(at)xxxx:pgvicco:xxx ~]$ gstack 37651
#0 0x0000003bc480da00 in sem_wait () from /lib64/libpthread.so.0
#1 0x000000000069d890 in PGSemaphoreLock ()
#2 0x000000000070c3a5 in LWLockAcquire ()
#3 0x00000000006ee553 in SyncOneBuffer ()
#4 0x00000000006ee8d9 in BufferSync ()
#5 0x00000000006ee966 in CheckPointBuffers ()
#6 0x00000000004ea2cc in CheckPointGuts ()
#7 0x00000000004f2456 in CreateCheckPoint ()
#8 0x00000000006a4152 in CheckpointerMain ()
#9 0x00000000005020ca in AuxiliaryProcessMain ()
#10 0x00000000006ac127 in StartChildProcess ()
#11 0x00000000006ae73a in reaper ()
#12 <signal handler called>
#13 0x0000003bc44e1503 in __select_nocancel () from /lib64/libc.so.6
#14 0x00000000006b03fa in PostmasterMain ()
#15 0x0000000000631e70 in main ()

********************************************************************************************************************************
The information in this email is confidential and may be legally privileged. If you have received this email in error or are not the intended recipient, please immediately notify the sender and delete this message from your computer. Any use, distribution, or copying of this email other than by the intended recipient is strictly prohibited. All messages sent to and from us may be monitored to ensure compliance with internal policies and to protect our business.
Emails are not secure and cannot be guaranteed to be error free as they can be intercepted, amended, lost or destroyed, or contain viruses. Anyone who communicates with us by email is taken to accept these risks.

收发邮件者请注意:
本邮件含涉密信息,请保守秘密,若误收本邮件,请务必通知发送人并直接删去,不得使用、传播或复制本邮件。
进出邮件均受到本公司合规监控。邮件可能发生被截留、被修改、丢失、被破坏或包含计算机病毒等不安全情况。
********************************************************************************************************************************

Browse pgsql-bugs by date

  From Date Subject
Next Message Gavin M2301 2019-10-18 02:27:45 Checking types for arguments
Previous Message Yudhveer Kandukuri 2019-10-18 00:34:05 Re: BUG #16064: Ldap Authentication failing with pg_hba.conf entry