Re: Assertion failure in SnapBuildInitialSnapshot()

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Assertion failure in SnapBuildInitialSnapshot()
Date: 2023-01-30 11:27:19
Message-ID: CAA4eK1KDFeh=ZbvSWPx=ir2QOXBxJbH0K8YqifDtG3xJENLR+w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 30, 2023 at 11:34 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> I have reproduced it manually. For this, I had to manually make the
> debugger call ReplicationSlotsComputeRequiredXmin(false) via path
> SnapBuildProcessRunningXacts()->LogicalIncreaseXminForSlot()->LogicalConfirmReceivedLocation()
> ->ReplicationSlotsComputeRequiredXmin(false) for the apply worker. The
> sequence of events is something like (a) the replication_slot_xmin for
> tablesync worker is overridden by apply worker as zero as explained in
> Sawada-San's email, (b) another transaction happened on the publisher
> that will increase the value of ShmemVariableCache->nextXid (c)
> tablesync worker invokes
> SnapBuildInitialSnapshot()->GetOldestSafeDecodingTransactionId() which
> will return an oldestSafeXid which is higher than snapshot's xmin.
> This happens because replication_slot_xmin has an InvalidTransactionId
> value and we won't consider replication_slot_catalog_xmin because
> catalogOnly flag is false and there is no other open running
> transaction. I think we should try to get a simplified test to
> reproduce this problem if possible.
>

Here are steps to reproduce it manually with the help of a debugger:

Session-1
==========
select pg_create_logical_replication_slot('s', 'test_decoding');
create table t2(c1 int);
select pg_replication_slot_advance('s', pg_current_wal_lsn());
-- Debug this statement. Stop before taking ProcArrayLock in
-- ProcArraySetReplicationSlotXmin.

Session-2
============
psql -d postgres
Begin;

Session-3
===========
psql -d "dbname=postgres replication=database"

begin transaction isolation level repeatable read read only;
CREATE_REPLICATION_SLOT slot1 LOGICAL test_decoding USE_SNAPSHOT;
-- Debug this statement. Stop in SnapBuildInitialSnapshot() before
-- taking ProcArrayLock.

Session-1
==========
Continue debugging and finish execution of
ProcArraySetReplicationSlotXmin. Verify
procArray->replication_slot_xmin is zero.

Session-2
=========
Select txid_current();
Commit;

Session-3
==========
Continue debugging.
Verify that safeXid follows snap->xmin (i.e. safeXid is newer). This
leads to an assertion failure (in back branches) or an error (in HEAD).
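
To make the arithmetic behind the last step easier to follow, here is a
rough standalone model in C. It is only a sketch, not the server code:
ModelState, model_oldest_safe_xid() and model_snapshot_is_safe() are
hypothetical names, the model assumes there are no other running
transactions, and any xid values fed to it are made up. It mimics how
GetOldestSafeDecodingTransactionId() starts from nextXid and only lowers
the result when replication_slot_xmin (or, with catalogOnly, the catalog
xmin) is valid.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define InvalidTransactionId      ((TransactionId) 0)
#define FirstNormalTransactionId  ((TransactionId) 3)
#define TransactionIdIsValid(xid)  ((xid) != InvalidTransactionId)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/* Modulo-2^32 comparison in the style of TransactionIdPrecedes(). */
bool
xid_precedes(TransactionId a, TransactionId b)
{
    if (!TransactionIdIsNormal(a) || !TransactionIdIsNormal(b))
        return a < b;
    return (int32_t) (a - b) < 0;
}

/* Hypothetical, heavily simplified stand-in for the shared state. */
typedef struct ModelState
{
    TransactionId next_xid;                      /* models nextXid */
    TransactionId replication_slot_xmin;         /* 0 after being overwritten */
    TransactionId replication_slot_catalog_xmin;
} ModelState;

/*
 * Simplified model of GetOldestSafeDecodingTransactionId(catalogOnly).
 * Assumption: no other transaction is running, so the scan over the
 * proc array is omitted.
 */
TransactionId
model_oldest_safe_xid(const ModelState *s, bool catalog_only)
{
    TransactionId oldest = s->next_xid;

    if (TransactionIdIsValid(s->replication_slot_xmin) &&
        xid_precedes(s->replication_slot_xmin, oldest))
        oldest = s->replication_slot_xmin;

    /* The catalog xmin is only considered when catalog_only is true. */
    if (catalog_only &&
        TransactionIdIsValid(s->replication_slot_catalog_xmin) &&
        xid_precedes(s->replication_slot_catalog_xmin, oldest))
        oldest = s->replication_slot_catalog_xmin;

    return oldest;
}

/* The condition being checked: safeXid must not follow snap->xmin. */
bool
model_snapshot_is_safe(TransactionId safe_xid, TransactionId snap_xmin)
{
    return !xid_precedes(snap_xmin, safe_xid);
}
```

For example, with a snapshot xmin of 740, replication_slot_xmin reset to
zero and nextXid advanced to 741 by Session-2, the model returns 741 as
the oldest safe xid, which follows 740, so the check fails. With
replication_slot_xmin still at 740 it returns 740 and the check passes.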

--
With Regards,
Amit Kapila.
