BUG #16414: Backend process blocked for 120s until kernel killed it

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: montanalow(at)gmail(dot)com
Subject: BUG #16414: Backend process blocked for 120s until kernel killed it
Date: 2020-05-04 03:17:54
Message-ID: 16414-1eddf92b8ac2ed0a@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16414
Logged by: Montana Low
Email address: montanalow(at)gmail(dot)com
PostgreSQL version: 12.2
Operating system: Ubuntu 18.04.3 LTS
Description:

A postgres server became unresponsive until reboot. This machine (and
several others in the cluster) typically handles bulk data ingestion via
COPY. Before the failure, there were many entries in the postgres log similar
to:

2020-05-03 14:41:25.341 PDT [74797] ERROR: COPY from stdin failed: COPY terminated by new PQexec
2020-05-03 14:41:25.344 PDT [87084] ERROR: COPY from stdin failed: COPY terminated by new PQexec
...
2020-05-03 14:42:20.577 PDT [89620] ERROR: COPY from stdin failed: COPY terminated by new PQexec
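The burst hit several backends at once. A minimal sketch (using the sample lines above verbatim) that pulls the distinct backend PIDs out of such a log excerpt:

```python
import re

log = """\
2020-05-03 14:41:25.341 PDT [74797] ERROR: COPY from stdin failed: COPY terminated by new PQexec
2020-05-03 14:41:25.344 PDT [87084] ERROR: COPY from stdin failed: COPY terminated by new PQexec
2020-05-03 14:42:20.577 PDT [89620] ERROR: COPY from stdin failed: COPY terminated by new PQexec
"""

# Extract (timestamp, pid) pairs from the default log_line_prefix-style lines.
pattern = re.compile(r"^(\S+ \S+) PDT \[(\d+)\] ERROR", re.M)
entries = [(ts, int(pid)) for ts, pid in pattern.findall(log)]
pids = sorted({pid for _, pid in entries})
print(pids)  # [74797, 87084, 89620] -- three distinct backends within ~1 minute
```

Each distinct PID is a separate COPY session that failed, which points at a shared cause (e.g. stalled I/O) rather than a single bad connection.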

There was one statement timeout (statement_timeout is set to 5 minutes) as
the final log entry before I manually initiated a server reboot.

2020-05-03 14:44:38.252 PDT [69974] ERROR: canceling statement due to statement timeout

There were several interesting entries in journald, similar to this one:

May 03 14:48:06 ip-172-30-86-52 kernel: INFO: task postgres:91109 blocked for more than 120 seconds.
May 03 14:48:06 ip-172-30-86-52 kernel: Tainted: P OE 4.15.0-1054-aws #56-Ubuntu
May 03 14:48:06 ip-172-30-86-52 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 03 14:48:06 ip-172-30-86-52 kernel: postgres D 0 91109 91104 0x00000000
May 03 14:48:06 ip-172-30-86-52 kernel: Call Trace:
May 03 14:48:06 ip-172-30-86-52 kernel: __schedule+0x24e/0x880
May 03 14:48:06 ip-172-30-86-52 kernel: schedule+0x2c/0x80
May 03 14:48:06 ip-172-30-86-52 kernel: io_schedule+0x16/0x40
May 03 14:48:06 ip-172-30-86-52 kernel: wbt_wait+0x1e7/0x360
May 03 14:48:06 ip-172-30-86-52 kernel: ? trace_event_raw_event_wbt_step+0x140/0x140
May 03 14:48:06 ip-172-30-86-52 kernel: blk_mq_make_request+0xe0/0x570
May 03 14:48:06 ip-172-30-86-52 kernel: generic_make_request+0x124/0x300
May 03 14:48:06 ip-172-30-86-52 kernel: submit_bio+0x73/0x140
May 03 14:48:06 ip-172-30-86-52 kernel: ? _cond_resched+0x19/0x40
May 03 14:48:06 ip-172-30-86-52 kernel: ? submit_bio+0x73/0x140
May 03 14:48:06 ip-172-30-86-52 kernel: ? kmem_cache_alloc+0x158/0x1b0
May 03 14:48:06 ip-172-30-86-52 kernel: ? ext4_init_io_end+0x1f/0x40
May 03 14:48:06 ip-172-30-86-52 kernel: ext4_io_submit+0x4c/0x60
May 03 14:48:06 ip-172-30-86-52 kernel: ext4_writepages+0x476/0xf50
May 03 14:48:06 ip-172-30-86-52 kernel: ? iov_iter_copy_from_user_atomic+0xc2/0x330
May 03 14:48:06 ip-172-30-86-52 kernel: ? __generic_file_write_iter+0x1aa/0x1d0
May 03 14:48:06 ip-172-30-86-52 kernel: do_writepages+0x4b/0xe0
May 03 14:48:06 ip-172-30-86-52 kernel: ? ext4_mark_inode_dirty+0x1d0/0x1d0
May 03 14:48:06 ip-172-30-86-52 kernel: ? do_writepages+0x4b/0xe0
May 03 14:48:06 ip-172-30-86-52 kernel: ? new_sync_write+0xe7/0x140
May 03 14:48:06 ip-172-30-86-52 kernel: __filemap_fdatawrite_range+0xc1/0x100
May 03 14:48:06 ip-172-30-86-52 kernel: ? __filemap_fdatawrite_range+0xc1/0x100
May 03 14:48:06 ip-172-30-86-52 kernel: SyS_sync_file_range+0xef/0x130
May 03 14:48:06 ip-172-30-86-52 kernel: do_syscall_64+0x73/0x130
May 03 14:48:06 ip-172-30-86-52 kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2
May 03 14:48:06 ip-172-30-86-52 kernel: RIP: 0033:0x7f596fab97a7
May 03 14:48:06 ip-172-30-86-52 kernel: RSP: 002b:00007fff07b842a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000115
May 03 14:48:06 ip-172-30-86-52 kernel: RAX: ffffffffffffffda RBX: 000000000000015e RCX: 00007f596fab97a7
May 03 14:48:06 ip-172-30-86-52 kernel: RDX: 0000000000002000 RSI: 00000000122ba000 RDI: 0000000000000055
May 03 14:48:06 ip-172-30-86-52 kernel: RBP: 00007fff07b842c0 R08: 0000000000000002 R09: 0000000000000200
May 03 14:48:06 ip-172-30-86-52 kernel: R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000002000
May 03 14:48:06 ip-172-30-86-52 kernel: R13: 00000000122ba000 R14: 000000000a00000a R15: 0000000000000001
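Assuming the postgres log (PDT) and journald share the same clock, the timestamps imply the backend had been stuck on I/O far past the kernel's 120-second hung-task threshold by the time the warning fired (fractional seconds ignored here):

```python
from datetime import datetime

# First COPY error in the postgres log vs. the kernel hung-task warning.
first_error = datetime(2020, 5, 3, 14, 41, 25)
hung_task = datetime(2020, 5, 3, 14, 48, 6)

stalled = (hung_task - first_error).total_seconds()
print(stalled)  # 401.0 seconds -- well over the 120 s hung_task threshold
```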

Any thoughts what went wrong, or where I should look for more data?
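If it happens again, one quick correlation is to pull the blocked PID out of the hung-task line and look it up in pg_stat_activity while the stall is still in progress. The helper below is an illustrative sketch, using the journald line above as sample input:

```python
import re

def blocked_pids(journal_text):
    """Return PIDs named in kernel hung-task warnings (illustrative helper)."""
    return [int(pid) for pid in re.findall(r"task postgres:(\d+) blocked", journal_text)]

line = ("May 03 14:48:06 ip-172-30-86-52 kernel: INFO: "
        "task postgres:91109 blocked for more than 120 seconds.")
print(blocked_pids(line))  # [91109]
# Each PID can then be matched against pg_stat_activity.pid to see the
# query and wait_event of the stuck backend.
```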
