Re: Excessive CPU usage in StandbyReleaseLocks()

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Excessive CPU usage in StandbyReleaseLocks()
Date: 2018-06-19 06:30:28
Message-ID: 20180619063027.x2wjgfvqa5rkjnjy@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2018-06-19 17:43:42 +1200, Thomas Munro wrote:
> The problem is that StandbyReleaseLocks() does a linear search of all
> known AccessExclusiveLocks when a transaction ends. Luckily, since
> v10 (commit 9b013dc2) that is skipped for transactions that haven't
> taken any AELs and aren't using 2PC, but that doesn't help all users.

> It's fine if the AEL list is short, but if you do something that takes
> a lot of AELs such as restoring a database with many tables or
> truncating a lot of partitions while other transactions are in flight
> then we start doing O(txrate * nlocks * nsubxacts) work and that can
> hurt.
>
> The reproducer script I've attached creates one long-lived transaction
> that acquires 6,000 AELs and takes a nap, while 48 connections run
> trivial 2PC transactions (I was also able to reproduce the effect
> without 2PC by creating a throw-away temporary table in every
> transaction, but it was unreliable due to contention slowing
> everything down). For me, the standby's startup process becomes 100%
> pegged, replay_lag begins to climb and perf says something like:
>
> + 97.88% 96.96% postgres postgres [.] StandbyReleaseLocks
>
> The attached patch splits the AEL list into one list per xid and
> sticks them in a hash table. That makes perf say something like:

> + 0.60% 0.00% postgres postgres [.] StandbyReleaseLocks

Neato. Results aside, that seems like a better suited datastructure.

> /*
> * InitRecoveryTransactionEnvironment
> @@ -63,6 +73,19 @@ void
> InitRecoveryTransactionEnvironment(void)
> {
> VirtualTransactionId vxid;
> + HASHCTL hash_ctl;
> +
> + /*
> + * Initialize the hash table for tracking the list of locks held by each
> + * transaction.
> + */
> + memset(&hash_ctl, 0, sizeof(hash_ctl));
> + hash_ctl.keysize = sizeof(TransactionId);
> + hash_ctl.entrysize = sizeof(RecoveryLockListsEntry);
> + RecoveryLockLists = hash_create("RecoveryLockLists",
> + 4096,
> + &hash_ctl,

Isn't that initial size needlessly big?

> void
> StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
> {
> + RecoveryLockListsEntry *entry;
> xl_standby_lock *newlock;
> LOCKTAG locktag;
> + bool found;
>
> /* Already processed? */
> if (!TransactionIdIsValid(xid) ||
> @@ -616,11 +641,19 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
> /* dbOid is InvalidOid when we are locking a shared relation. */
> Assert(OidIsValid(relOid));
>
> + /* Create a new list for this xid, if we don't have one already. */
> + entry = hash_search(RecoveryLockLists, &xid, HASH_ENTER, &found);
> + if (!found)
> + {
> + entry->xid = xid;
> + entry->locks = NIL;
> + }
> +
> newlock = palloc(sizeof(xl_standby_lock));
> newlock->xid = xid;
> newlock->dbOid = dbOid;
> newlock->relOid = relOid;
> - RecoveryLockList = lappend(RecoveryLockList, newlock);
> + entry->locks = lappend(entry->locks, newlock);

Gotta be careful with lappend et al - it's really easy to leak memory
without explicit context usage. Have you checked that we don't here?

> SET_LOCKTAG_RELATION(locktag, newlock->dbOid, newlock->relOid);
>
> @@ -628,46 +661,45 @@ StandbyAcquireAccessExclusiveLock(TransactionId xid, Oid dbOid, Oid relOid)
> }
>
> static void
> -StandbyReleaseLocks(TransactionId xid)
> +StandbyReleaseLockList(List *locks)
> {
> - ListCell *cell,
> - *prev,
> - *next;
> -
> - /*
> - * Release all matching locks and remove them from list
> - */
> - prev = NULL;
> - for (cell = list_head(RecoveryLockList); cell; cell = next)
> + while (locks)
> {
> - xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
> + xl_standby_lock *lock = (xl_standby_lock *) linitial(locks);
> + LOCKTAG locktag;
> + elog(trace_recovery(DEBUG4),
> + "releasing recovery lock: xid %u db %u rel %u",
> + lock->xid, lock->dbOid, lock->relOid);
> + SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
> + if (!LockRelease(&locktag, AccessExclusiveLock, true))
> + elog(LOG,
> + "RecoveryLockLists contains entry for lock no longer recorded by lock manager: xid %u database %u relation %u",
> + lock->xid, lock->dbOid, lock->relOid);

This should be a PANIC imo.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-06-19 07:09:56 Re: BUG #14999: pg_rewind corrupts control file global/pg_control
Previous Message Michael Paquier 2018-06-19 05:47:00 Re: Partitioning with temp tables is broken