Re: [HACKERS] Issues with logical replication

From: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Issues with logical replication
Date: 2017-11-16 01:20:39
Message-ID: 59145A4F-CC42-47D8-B451-D4C4285BB3F0@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On 15 Nov 2017, at 23:09, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> Ouch. This seems like a bug that needs to be fixed, but do you think
> it's related to to Petr's proposed fix to set es_output_cid? That fix
> looks reasonable, since we shouldn't try to lock tuples without a
> valid CommandId.
>
> Now, having said that, I understand how the lack of that fix could cause:
>
> 2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible tuple
>
> But I do not understand how it could cause:
>
> #3 0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
> oper=XLTW_None) at lmgr.c:582
>

I think that’s two separate bugs. Second one is due to race between
XactLockTableWait() and acquiring of heavyweight transaction lock in
AssignTransactionId(). XactLockTableWait assumes that transaction
already set it’s hw transaction lock, but with current machinery of
RunningXacts and snapshot building we can start such waiting before
transaction gets it hw lock.

I can see two ways out of this and both of them are quite unattractive:

1. Create a flag in pgproc indicating that hw lock is acquired and
include in RunningXacts only transaction with this flag set.
That's probably quite expensive since we need to hold one more exclusive
lwlock in AssignTransactionId.

2. Rewrite XactLockTableWait() in a way that it is aware of the fact that
valid xid can be topmost transaction but not yet have a hw lock entry.
In such situation we can just wait and try to lock again. Good stop
condition in this loop is TransactionIdIsInProgress(xid) becoming
false.

I did a sketch of first approach just to confirm that it solves the problem.
But there I hold ProcArrayLock during update of flag. Since only reader is
GetRunningTransactionData it possible to have a custom lock there. In
this case GetRunningTransactionData will hold three locks simultaneously,
since it already holds ProcArrayLock and XidGenLock =)

Any better ideas?

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
xltw_fix.diff application/octet-stream 2.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2017-11-16 01:26:35 Re: pgsql: Add hooks for session start and session end
Previous Message David Rowley 2017-11-16 00:32:40 Re: [HACKERS] Partition-wise aggregation/grouping