BUG #13657: Some kind of undetected deadlock between query and "startup process" on replica.

From: maxim(dot)boguk(at)gmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #13657: Some kind of undetected deadlock between query and "startup process" on replica.
Date: 2015-10-01 12:17:45
Message-ID: 20151001121745.368.58359@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 13657
Logged by: Maksym Boguk
Email address: maxim(dot)boguk(at)gmail(dot)com
PostgreSQL version: 9.4.4
Operating system: Linux
Description:

Hi,

On one of 9.4.4 replica database I had found strange issue with stuck query
and stuck wal replay process waiting each other but with no detectable
deadlock condtions.

Stuck query runs:
SELECT
schemaname,
relname,
pg_relation_size(relid) as table_size,
pg_indexes_size(relid) as table_indexes_size,
seq_scan,
coalesce(idx_scan,0) as idx_scan,
seq_tup_read,
coalesce(idx_tup_fetch,0) as idx_tup_fetch,
n_tup_ins as inserted,
n_tup_upd as updated,
n_tup_del as deleted,
n_tup_hot_upd as hot_updated
FROM
pg_stat_user_tables

In pg_stat_activity it's on waiting state.
Locks for this query:
postgres=# select * from pg_locks where pid=862 order by granted, relation;
locktype | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid |
mode | granted | fastpath
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+-----------------+---------+----------
relation | 16424 | 17987 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
f | f
relation | 16424 | 1259 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
t | t
relation | 16424 | 2610 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
t | t
relation | 16424 | 2615 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
t | t
relation | 16424 | 2662 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
t | t
relation | 16424 | 2663 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
t | t
relation | 16424 | 2678 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
t | t
relation | 16424 | 2679 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
t | t
relation | 16424 | 2684 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
t | t
relation | 16424 | 2685 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
t | t
relation | 16424 | 3455 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
t | t
relation | 16424 | 11221 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
t | t
relation | 16424 | 11236 | | | |
| | | | 7/924904 | 862 | AccessShareLock |
t | t
virtualxid | | | | | 7/924904 |
| | | | 7/924904 | 862 | ExclusiveLock |
t | t

So not granted an single lock on relation 17987 (just common table inside
the database).

Now lets see whats going on with wal replay:
ps -axuww | grep recover
postgres 26786 2.5 0.0 177620 76784 ? Ss Sep30 48:27 postgres:
startup process recovering 00000001000061A0000000AE waiting
So wal replay waiting for something.

Lets see what locks wal replay have:
postgres=# select * from pg_locks where pid=26786 order by granted,
relation;
locktype | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid |
mode | granted | fastpath
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------------+---------+----------
relation | 16424 | 17987 | | | |
| | | | 1/0 | 26786 |
AccessExclusiveLock | t | f
virtualxid | | | | | 1/1 |
| | | | 1/0 | 26786 | ExclusiveLock
| t | t
(2 rows)

So wal replay got AccessExclusiveLock on relation 17987 and waiting for
something.
And query waiting for AccessShareLock on the same relation.

Once I killed waiting query, wal replay resume with no issues.
So seems there are some king of undetected deadlock condition between query
and wal replay.

PS: situation reproduced every few hours so it's possible to debug but I
have no good idea where to start.

Kind Regards,
Maksym

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message jesper 2015-10-01 12:17:48 BUG #13658: DELETE with syntax error in subselect deletes ALL
Previous Message swingi 2015-10-01 10:17:33 BUG #13656: table inheritance, pg_dump emits same constraint for all inheritors causing errors