Issue with COMMITs stuck on "AccessExclusiveLock on object 0 of class 1262 of database 0"

From: Martijn van Oosterhout <kleptog(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Issue with COMMITs stuck on "AccessExclusiveLock on object 0 of class 1262 of database 0"
Date: 2019-03-29 17:11:29
Message-ID: CADWG95t0j9zF0uwdcMH81KMnDsiTAVHxmBvgYqrRJcD-iLwQhw@mail.gmail.com
Lists: pgsql-general

Hoi,

We're running 9.4.20 (yes, it's old) and are having issues with the database
occasionally getting stuck, with COMMITs in one database being blocked on
COMMITs in a completely different database. I've been able to determine
that "object 0 of class 1262 of database 0" relates to the LISTEN/NOTIFY
machinery; see this message from Tom Lane[1]. There are not many references
to this error online, mostly questions with no answers.
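
For anyone who runs into the same thing: the lock should be visible in
pg_locks, since the classid/objid values in the log message map directly
onto that view's columns (1262 is the OID of pg_database). A rough query:

  -- Who holds or is waiting on the global NOTIFY lock, i.e.
  -- "object 0 of class 1262 of database 0" from the log messages.
  -- database = 0 with classid = 1262 (pg_database) makes it cluster-wide.
  SELECT pid, mode, granted
    FROM pg_locks
   WHERE locktype = 'object'
     AND database = 0
     AND classid = 1262
     AND objid = 0
   ORDER BY granted DESC, pid;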

We have log_lock_waits enabled with deadlock_timeout set to 1 second, and
what it catches is mostly COMMITs. The logs are at the bottom of this
email. What is most interesting is that the COMMITs in one database are
blocked on COMMITs of a different database. This was quite unexpected for
us, because we would expect the databases to work independently of each
other. The AccessExclusiveLock above appears to be a single global lock
shared across all databases.
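
(For reference, the relevant postgresql.conf settings that produce the
"still waiting" messages below should be:

  log_lock_waits = on     # log when a lock wait exceeds deadlock_timeout
  deadlock_timeout = 1s   # hence the "after 1000.xxx ms" in the log lines
)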

A bit more context:
- The system has about 100 databases, and several hundred connections in
total (yes, we use connection poolers)
- We do about 8,000 queries per second
- We use LISTEN/NOTIFY quite a bit, probably 40-100 notifies per second
(haven't got good stats on this yet).
- The wait queues in the logs below consist of many other processes, all
attempting to COMMIT across many different databases. Since we only log
queries blocked for more than a second, most of them don't appear in this
log. Usually the situation resolves itself after about 1.5 seconds, but
that is enough to block hundreds of transactions across the cluster; we've
seen wait queues of 500 processes on this single lock.
- The problem gets really bad when a session is already holding some other
lock and then blocks here, because then other sessions get stuck as well,
even if they do nothing with NOTIFY (the advisory-lock waiter in the logs
below is an example; see the query after this list). Fortunately this turns
out not to happen very often, but when it does the system grinds to a halt
for a few seconds.
- There are replication slaves, but they are all asynchronous
- The I/O subsystem isn't stressed and we don't see any memory pressure or
high CPU usage.
- There are no warnings about "too many notifications"
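
To watch for those cascading chains live: 9.4 doesn't have
pg_blocking_pids() yet (that arrived in 9.6), so we join pg_locks against
itself. A rough sketch, matching only on the lock identity columns that
matter for the locks above:

  -- Pair each waiting process with the processes holding the lock it
  -- wants. Rough: matches on locktype/database/classid/objid only,
  -- which is enough for the 'object' and 'advisory' lock types here.
  SELECT w.pid AS waiting_pid, h.pid AS holding_pid,
         w.locktype, w.mode AS wanted_mode
    FROM pg_locks w
    JOIN pg_locks h
      ON h.granted
     AND h.locktype = w.locktype
     AND h.database IS NOT DISTINCT FROM w.database
     AND h.classid  IS NOT DISTINCT FROM w.classid
     AND h.objid    IS NOT DISTINCT FROM w.objid
   WHERE NOT w.granted;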

We're wondering whether it's possible for a single process to send so many
notifications that it spends a long time delivering them at commit. Is
there any way of measuring this?
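
The only way we've thought of so far is pg_stat_statements: with
pg_stat_statements.track_utility = on, NOTIFY statements should be counted
like any other utility statement, so something like this (a sketch, not
yet tried on our setup):

  -- NOTIFY counts per statement text since the last stats reset.
  -- Requires the pg_stat_statements extension with track_utility = on;
  -- note that pg_notify() calls would show up as SELECTs instead.
  SELECT query, calls
    FROM pg_stat_statements
   WHERE query ILIKE 'NOTIFY %'
   ORDER BY calls DESC
   LIMIT 20;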

Can anyone shed light on what is going on here?

[1]:
https://www.postgresql.org/message-id/flat/3598.1363354686%40sss.pgh.pa.us

=== logs ===
2019-03-27 01:40:42 UTC [33226-1] [th.ha] app_ha(at)app_ha LOG: process 33226
still waiting for AccessExclusiveLock on object 0 of class 1262 of database
0 after 1000.050 ms
2019-03-27 01:40:42 UTC [33226-2] [th.ha] app_ha(at)app_ha DETAIL: Process
holding the lock: 33162. Wait queue: 33226, 32565, 32443, 32452, 33923,
33900, 31701, 33827, 33844, 33427, 33359, 34012.
2019-03-27 01:40:42 UTC [33226-3] [th.ha] app_ha(at)app_ha STATEMENT: COMMIT
2019-03-27 01:40:42 UTC [32565-1] [th.ek] app_ek(at)app_ek LOG: process 32565
still waiting for AccessExclusiveLock on object 0 of class 1262 of database
0 after 1000.102 ms
2019-03-27 01:40:42 UTC [32565-2] [th.ek] app_ek(at)app_ek DETAIL: Process
holding the lock: 33162. Wait queue: 33226, 32565, 32443, 32452, 33923,
33900, 31701, 33827, 33844, 33427, 33359, 34012, 34005, 32054, 26820, 32788.
2019-03-27 01:40:42 UTC [32565-3] [th.ek] app_ek(at)app_ek STATEMENT: COMMIT
2019-03-27 01:40:42 UTC [32443-1] [th.ne] app_ne(at)app_ne LOG: process 32443
still waiting for AccessExclusiveLock on object 0 of class 1262 of database
0 after 1000.062 ms
2019-03-27 01:40:42 UTC [32443-2] [th.ne] app_ne(at)app_ne DETAIL: Process
holding the lock: 33162. Wait queue: 33226, 32565, 32443, 32452, 33923,
33900, 31701, 33827, 33844, 33427, 33359, 34012, 34005, 32054, 26820,
32788, 33587, 34062.
2019-03-27 01:40:42 UTC [32443-3] [th.ne] app_ne(at)app_ne STATEMENT: COMMIT
2019-03-27 01:40:42 UTC [33784-1] [ev.ge] app_ge(at)app_ge LOG: process 33784
still waiting for ExclusiveLock on advisory lock
[7777729,1129598309,56397,2] after 1000.085 ms
2019-03-27 01:40:42 UTC [33784-2] [ev.ge] app_ge(at)app_ge DETAIL: Process
holding the lock: 32452. Wait queue: 33784.
2019-03-27 01:40:42 UTC [33784-3] [ev.ge] app_ge(at)app_ge STATEMENT: SELECT
pg_advisory_xact_lock(1129598309, 56397)
2019-03-27 01:40:42 UTC [32452-2] [ev.ge] app_ge(at)app_ge LOG: process 32452
still waiting for AccessExclusiveLock on object 0 of class 1262 of database
0 after 1000.068 ms
2019-03-27 01:40:42 UTC [32452-3] [ev.ge] app_ge(at)app_ge DETAIL: Process
holding the lock: 33162. Wait queue: 33226, 32565, 32443, 32452, 33923,
33900, 31701, 33827, 33844, 33427, 33359, 34012, 34005, 32054, 26820,
32788, 33587, 34062, 32282, 34127.
2019-03-27 01:40:42 UTC [32452-4] [ev.ge] app_ge(at)app_ge STATEMENT: COMMIT
2019-03-27 01:40:43 UTC [33923-1] [th.ei] app_ei(at)app_ei LOG: process 33923
still waiting for AccessExclusiveLock on object 0 of class 1262 of database
0 after 1000.131 ms
2019-03-27 01:40:43 UTC [33923-2] [th.ei] app_ei(at)app_ei DETAIL: Process
holding the lock: 33162. Wait queue: 33226, 32565, 32443, 32452, 33923,
33900, 31701, 33827, 33844, 33427, 33359, 34012, 34005, 32054, 26820,
32788, 33587, 34062, 32282, 34127, 34129.
2019-03-27 01:40:43 UTC [33923-3] [th.ei] app_ei(at)app_ei STATEMENT: COMMIT
2019-03-27 01:40:43 UTC [33900-1] [th.t6] app_t6(at)app_t6 LOG: process 33900
still waiting for AccessExclusiveLock on object 0 of class 1262 of database
0 after 1000.080 ms
2019-03-27 01:40:43 UTC [33900-2] [th.t6] app_t6(at)app_t6 DETAIL: Process
holding the lock: 33162. Wait queue: 33226, 32565, 32443, 32452, 33923,
33900, 31701, 33827, 33844, 33427, 33359, 34012, 34005, 32054, 26820,
32788, 33587, 34062, 32282, 34127, 34129, 34080, 33161.
2019-03-27 01:40:43 UTC [33900-3] [th.t6] app_t6(at)app_t6 STATEMENT: COMMIT
--
Martijn van Oosterhout <kleptog(at)gmail(dot)com> http://svana.org/kleptog/
