BUG #16016: deadlock with startup process, AccessExclusiveLock on pg_statistic's toast table

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: alexey(dot)ermakov(at)dataegret(dot)com
Subject: BUG #16016: deadlock with startup process, AccessExclusiveLock on pg_statistic's toast table
Date: 2019-09-20 09:40:31
Message-ID: 16016-ed0345229ddc484c@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16016
Logged by: Alexey Ermakov
Email address: alexey(dot)ermakov(at)dataegret(dot)com
PostgreSQL version: 11.5
Operating system: Ubuntu 16.04.6 LTS
Description:

Hello,

Recently I encountered a strange situation in which a replica server stopped
serving queries. Most likely it happened because the startup process was
holding an AccessExclusiveLock on pg_statistic's TOAST table while waiting
for some backend, which led to a deadlock.
The situation resolved itself after the max_standby_streaming_delay timeout
kicked in and the conflicting backends' queries were canceled.
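
For context, the standby-side settings that govern how long recovery waits
before canceling conflicting queries can be checked like this (a generic
sketch; the values it returns would be the standby's, not anything taken
from this report):

-- recovery-conflict settings relevant to this situation
select name, setting, unit
from pg_settings
where name in ('max_standby_streaming_delay',
               'max_standby_archive_delay',
               'hot_standby_feedback');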

more details:

1) The first waiting backend in the logs:
2019-09-19 12:14:11.314 UTC 144285 user(at)db from [local] [vxid:21/106350189
txid:0] [BIND waiting] LOG: process 144285 still waiting for
AccessShareLock on relation 2840 of database 16389 after 1000.115 ms
2019-09-19 12:14:11.314 UTC 144285 user(at)db from [local] [vxid:21/106350189
txid:0] [BIND waiting] DETAIL: Process holding the lock: 6002. Wait queue:
144285, 136649, 127349, 76417, 97350, 93019, 109010, 103163, 136300.
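
While the queue was stuck, something like the following could have been run
on the standby to see the same picture from SQL (a sketch; 2840 is the
relation OID taken from the log message above):

-- backends queued on relation 2840, with what is blocking them
select l.pid, l.mode, l.granted,
       pg_blocking_pids(l.pid) as blocked_by,
       a.state, a.wait_event_type, a.wait_event
from pg_locks l
left join pg_stat_activity a on a.pid = l.pid
where l.locktype = 'relation'
  and l.relation = 2840;

pg_blocking_pids() should report the lock holder (6002 here) for each
waiting backend.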

2) PID 6002 is indeed the startup process:
ps ax | grep 6002
6002 ? Ss 200:16 postgres: 11/main: startup recovering
000000010000425900000094
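
On version 10 and later the startup process should also be visible from SQL
on the standby; a minimal sketch:

-- auxiliary processes appear in pg_stat_activity with a backend_type
select pid, backend_type, wait_event_type, wait_event
from pg_stat_activity
where backend_type = 'startup';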

3) The locked relation is pg_statistic's TOAST table:

select 2840::regclass;
regclass
------------------------
pg_toast.pg_toast_2619

select 2619::regclass;
regclass
--------------
pg_statistic
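
The same mapping can also be read directly from pg_class instead of decoding
the TOAST table's name (a sketch):

-- a table's TOAST relation is recorded in pg_class.reltoastrelid
select oid::regclass as rel, reltoastrelid::regclass as toast_rel
from pg_class
where oid = 'pg_statistic'::regclass;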

4) I guess autovacuum was working on the pg_statistic table on the master
server right before the problem occurred and truncated some pages at the end
of the relation; that truncation takes an AccessExclusiveLock, which is
WAL-logged and therefore replayed (and held) by the startup process on the
standby.
I can't confirm it: log_autovacuum_min_duration = 1s and there are no
entries in the logs, but the table is pretty small (~2MB), so autovacuum
might have finished in less than 1s.
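
One way to check this after the fact, on the master (assuming the statistics
counters were not reset in the meantime):

-- when did autovacuum last touch pg_statistic, and how often has it run?
select relname, last_autovacuum, autovacuum_count
from pg_stat_all_tables
where relname = 'pg_statistic';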

5) According to the client, no DB migrations were performed at that time.

The same thing (again the TOAST table of pg_statistic) also happened twice a
year ago, on version 9.6.6, on a completely different server and project.
Unfortunately I wasn't able to get a gdb backtrace of the startup process,
so it might be difficult to understand where it got stuck :(

In the year-ago case I was able to capture the output of pg_locks, but there
is not much of interest there:

select * from pg_locks;

(columns that were empty in every row — page, tuple, transactionid, classid,
objid, objsubid — are omitted here for readability)

  locktype  | database | relation | virtualxid  | virtualtransaction |  pid   |        mode         | granted | fastpath
------------+----------+----------+-------------+--------------------+--------+---------------------+---------+----------
 ...truncated, I can show full output if it might help...
 virtualxid |          |          | 12/19089055 | 12/19089055        |  91145 | ExclusiveLock       | t       | t
 virtualxid |          |          | 1/1         | 1/0                |  16141 | ExclusiveLock       | t       | t
 relation   |    20508 |     2840 |             | 12/19089055        |  91145 | AccessShareLock     | f       | f
 relation   |    20508 |     2840 |             | 24/18427740        |  94698 | AccessShareLock     | f       | f
 relation   |    20508 |     2840 |             | 26/15964159        |  96387 | AccessShareLock     | f       | f
 relation   |    20508 |     2840 |             | 27/11337343        | 107448 | AccessShareLock     | f       | f
 relation   |    20508 |     2840 |             | 28/9666022         |  99110 | AccessShareLock     | f       | f
 relation   |    20508 |     2840 |             | 1/0                |  16141 | AccessExclusiveLock | t       | f
(56 rows)

In that case PID 16141 is the startup process, and 2840 is again the OID of
pg_statistic's TOAST table:

16141 ? Ss 14169:14 postgres: 9.6/main: startup process recovering 000000020000A3E3000000DC waiting
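
In hindsight, filtering pg_locks down to the contested relation would have
made the picture clearer than the full dump above; a sketch (2840 being the
TOAST table's OID):

-- only the locks on the contested relation, holder first
select locktype, database, relation, pid, mode, granted
from pg_locks
where relation = 2840
order by granted desc, pid;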

--
Thanks,
Alexey Ermakov
