From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> |
Cc: | Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> |
Subject: | Re: [HACKERS] Issues with logical replication |
Date: | 2017-11-29 23:40:36 |
Message-ID: | 20171129234036.rddarwqoqly2a2ni@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2017-11-30 00:25:58 +0100, Petr Jelinek wrote:
> Yes that helps thanks. Now that I reproduced it I understand, I was
> confused by the backtrace that said xid was 0 on the input to
> XactLockTableWait() but that's not the case, it's what xid is changed to
> in the inner loop.
> So what happens is that we manage to do LogStandbySnapshot(), decode the
> logged snapshot, and run SnapBuildWaitSnapshot() for a transaction in
> between GetNewTransactionId() and XactLockTableInsert() calls in
> AssignTransactionId() for that same transaction.
>
> I guess the probability of this happening is increased by the fact that
> GetRunningTransactionData() acquires XidGenLock so if there is
> GetNewTransactionId() running in parallel it will wait for it to finish
> and we'll log immediately after that.
>
> Hmm that means that Robert's loop idea will not help and ProcArrayLock
> will not save us either. Maybe we could either rewrite XactLockTableWait
> or create another version of it with SubTransGetParent() call replaced
> by SubTransGetTopmostTransaction() as that will return the same top
> level xid in case the input xid wasn't a subxact. That would make it
> safe to be called on transactions that didn't acquire lock on themselves
> yet.
I've not really looked into this deeply, but afair we can just make this
code accept that edgecase be done with it. As the comment says:
* Iterate through xids in record, wait for all older than the cutoff to
* finish. Then, if possible, log a new xl_running_xacts record.
*
--- highlight ---
* This isn't required for the correctness of decoding, but to:
--- highlight ---
* a) allow isolationtester to notice that we're currently waiting for
* something.
* b) log a new xl_running_xacts record where it'd be helpful, without having
* to write for bgwriter or checkpointer.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Petr Jelinek | 2017-11-29 23:45:44 | Re: [HACKERS] Issues with logical replication |
Previous Message | Andres Freund | 2017-11-29 23:34:24 | Re: Would a BGW need shmem_access or database_connection to enumerate databases? |