GetLockConflicts() and thus recovery conflicts seem pretty broken

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject: GetLockConflicts() and thus recovery conflicts seem pretty broken
Date: 2015-01-27 14:27:13
Message-ID: 20150127142713.GD29457@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

While investigating other bugs I noticed that
ResolveRecoveryConflictWithLock() wasn't really working. Turns out
GetLockConflicts() violates its API contract, which says:

* The result array is palloc'd and is terminated with an invalid VXID.

Problem 1:
We don't actually put the terminator there. It happens to work, more or
less by accident, on a master because the array is palloc0()ed there and,
while 0 is a valid backend id, it happens not to be a valid local
transaction id. In HS we don't actually allocate the array every time;
it's instead allocated once and kept in a static pointer. Without zeroing.
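
A minimal sketch of what an explicit terminator could look like, not the actual PostgreSQL code. DemoVXID, demo_fill_conflicts() and demo_count_until_terminator() are hypothetical stand-ins so that the example compiles on its own; the only thing taken from the text above is that a local transaction id of 0 counts as invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VirtualTransactionId, so the sketch compiles alone. */
typedef struct
{
    int      backendId;           /* 0 would be a valid backend id */
    uint32_t localTransactionId;  /* 0 is treated as invalid */
} DemoVXID;

#define DemoVXIDIsValid(v) ((v).localTransactionId != 0)

/*
 * Copy up to cap - 1 valid entries from src into out, then write the
 * terminator explicitly instead of relying on the buffer being zeroed.
 * Returns the number of entries stored before the terminator.
 */
static size_t
demo_fill_conflicts(DemoVXID *out, size_t cap, const DemoVXID *src, size_t n)
{
    size_t count = 0;

    assert(cap >= 1);           /* room for at least the terminator */
    for (size_t i = 0; i < n && count < cap - 1; i++)
    {
        if (DemoVXIDIsValid(src[i]))
            out[count++] = src[i];
    }

    /* The explicit terminator that the API contract promises. */
    out[count].backendId = -1;
    out[count].localTransactionId = 0;
    return count;
}

/* Walk the list the way a consumer would: stop at the first invalid entry. */
static size_t
demo_count_until_terminator(const DemoVXID *list)
{
    size_t n = 0;

    while (DemoVXIDIsValid(*list))
    {
        n++;
        list++;
    }
    return n;
}
```

With the terminator written on every call, the consumer no longer depends on how the buffer was allocated or what a previous call left in it.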

I have no idea why this doesn't crash
ResolveRecoveryConflictWithLock(), which does:
    while (!lock_acquired)
    {
        if (++num_attempts < 3)
            backends = GetLockConflicts(locktag, AccessExclusiveLock);
        ...
        ResolveRecoveryConflictWithVirtualXIDs(backends,
                                               PROCSIG_RECOVERY_CONFLICT_LOCK);

and ResolveRecoveryConflictWithVirtualXIDs does:

    ...
    while (VirtualTransactionIdIsValid(*waitlist))
    {
        /* kill kill kill */

        /* The virtual transaction is gone now, wait for the next one */
        waitlist++;
    }

I guess we just accidentally happen to come across appropriately
set memory at some point.

I'm really baffled that this hasn't caused significant problems so far.
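
One way this could play out can be modelled in a few lines of standalone C. This is a sketch under assumptions, not the PostgreSQL code: DemoVXID, demo_buf and both functions are hypothetical, and the buffer is a zero-initialized static array (unlike the real allocation) so that the demo itself stays well-defined. It shows a reused, unterminated buffer handing stale entries from an earlier call to the consumer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VirtualTransactionId. */
typedef struct
{
    int      backendId;
    uint32_t localTransactionId;  /* 0 is treated as invalid */
} DemoVXID;

#define DEMO_MAX_BACKENDS 4

/* Reused across calls, like the static pointer in hot standby. */
static DemoVXID demo_buf[DEMO_MAX_BACKENDS + 1];

/* Models the bug: entries are written, but no terminator follows them. */
static DemoVXID *
demo_get_conflicts_unterminated(const uint32_t *lxids, size_t n)
{
    assert(n <= DEMO_MAX_BACKENDS);
    for (size_t i = 0; i < n; i++)
    {
        demo_buf[i].backendId = (int) i + 1;
        demo_buf[i].localTransactionId = lxids[i];
    }
    /* deliberately no terminator written here */
    return demo_buf;
}

/*
 * Consumer loop: stops at the first invalid entry. The bound check exists
 * only to keep this demo inside the buffer; the loop quoted above has none.
 */
static size_t
demo_walk(const DemoVXID *waitlist)
{
    size_t seen = 0;

    while (seen < DEMO_MAX_BACKENDS + 1 && waitlist->localTransactionId != 0)
    {
        seen++;
        waitlist++;
    }
    return seen;
}
```

If a first call reports three conflicts and a second call reports one, the walker still visits three entries after the second call: the new one plus two left over from the first.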

Problem 2:
Since bcd8528f001 and 29eedd312274, the "result array is palloc'd" part
is wrong, because we're now doing:

    static VirtualTransactionId *vxids;
    /*
     * Allocate memory to store results, and fill with InvalidVXID. We only
     * need enough space for MaxBackends + a terminator, since prepared xacts
     * don't count. InHotStandby allocate once in TopMemoryContext.
     */
    if (InHotStandby)
    {
        if (vxids == NULL)
            vxids = (VirtualTransactionId *)
                MemoryContextAlloc(TopMemoryContext,
                                   sizeof(VirtualTransactionId) * (MaxBackends + 1));
    }
    else
        vxids = (VirtualTransactionId *)
            palloc0(sizeof(VirtualTransactionId) * (MaxBackends + 1));

Obviously that violates the API contract. I'm inclined to rip the HS
special case out and add a pfree() to the single HS caller. The commit
message introducing it says:

    Use malloc() in GetLockConflicts() when called InHotStandby to avoid
    repeated palloc calls. Current code assumed this was already true, so
    this is a bug fix.

But I can't really believe this is relevant.
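
Roughly, the ownership pattern proposed above (allocate on every call, free in the caller) could look like the sketch below. It is standalone C with calloc()/free() standing in for palloc0()/pfree(); DemoVXID, demo_get_conflicts() and demo_resolve() are hypothetical names, and the sketch assumes every input local transaction id is nonzero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for VirtualTransactionId. */
typedef struct
{
    int      backendId;
    uint32_t localTransactionId;  /* 0 is treated as invalid */
} DemoVXID;

/*
 * Stand-in for the callee: always returns a freshly allocated, zeroed and
 * explicitly terminated array that the caller owns.
 */
static DemoVXID *
demo_get_conflicts(const uint32_t *lxids, size_t n)
{
    DemoVXID *result = calloc(n + 1, sizeof(DemoVXID));

    if (result == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
    {
        result[i].backendId = (int) i + 1;
        result[i].localTransactionId = lxids[i];
    }
    result[n].backendId = -1;           /* explicit terminator */
    result[n].localTransactionId = 0;
    return result;
}

/*
 * Stand-in for the single caller: a retry loop that frees the array after
 * each attempt. Returns the total number of entries it walked.
 */
static size_t
demo_resolve(const uint32_t *lxids, size_t n, int attempts)
{
    size_t handled = 0;

    for (int a = 0; a < attempts; a++)
    {
        DemoVXID *waitlist = demo_get_conflicts(lxids, n);

        assert(waitlist != NULL);
        for (DemoVXID *p = waitlist; p->localTransactionId != 0; p++)
            handled++;
        free(waitlist);                 /* caller releases what the callee allocated */
    }
    return handled;
}
```

The cost is one allocation per attempt, in exchange for a single code path that matches the documented contract.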

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
