Quick Links

Re: long-standing data loss bug in initial sync of logical replication

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Nitin Motiani <nitinmotiani(at)google(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: long-standing data loss bug in initial sync of logical replication
Date:	2025-02-24 09:49:49
Message-ID:	CAA4eK1KDPryBr-ewsW_Rkh5x0oiqpcOHAF3GAocDUrK2CV+atA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Dec 11, 2024 at 12:37 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I confirmed that the proposed patch fixes these issues. I have one
> question about the patch:
>
> In the main loop in SnapBuildDistributeSnapshotAndInval(), we have the
> following code:
>
> /*
> * If we don't have a base snapshot yet, there are no changes in this
> * transaction which in turn implies we don't yet need a snapshot at
> * all. We'll add a snapshot when the first change gets queued.
> *
> * NB: This works correctly even for subtransactions because
> * ReorderBufferAssignChild() takes care to transfer the base snapshot
> * to the top-level transaction, and while iterating the changequeue
> * we'll get the change from the subtxn.
> */
> if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
> continue;
>
> Is there any case where we need to distribute inval messages to
> transactions that don't have the base snapshot yet but eventually need
> the inval messages?
>

Good point. It is mentioned that for snapshots: "We'll add a snapshot
when the first change gets queued.". I think we achieve this via
builder->committed.xip array such that when we set a base snapshot for
a transaction, we use that array to form a snapshot. However, I don't
see any such consideration for invalidations. Now, we could either
always add invalidations to xacts that don't have base_snapshot yet or
have a mechanism similar committed.xid array. But it is better to
first reproduce the problem.

> Overall, with this idea, we distribute invalidation messages to all
> concurrent decoded transactions. It could introduce performance
> regressions by several causes. For example, we could end up
> invalidating RelationSyncCache entries in more cases. While this is
> addressed by your selectively cache invalidation patch, there is still
> 5% regression. We might need to accept a certain amount of regressions
> for making it correct but it would be better to figure out where these
> regressions come from. Other than that, I think the performance
> regression could happen due to the costs of distributing invalidation
> messages. You've already observed there is 1~3% performance regression
> in cases where we distribute a large amount of invalidation messages
> to one concurrently decoded transaction[1]. I guess that the
> selectively cache invalidation idea would not help this case. Also, I
> think we might want to test other cases like where we distribute a
> small amount of invalidation messages to many concurrently decoded
> transactions.
>

+1.

--
With Regards,
Amit Kapila.

In response to

Re: long-standing data loss bug in initial sync of logical replication at 2024-12-11 07:06:26 from Masahiko Sawada

Responses

RE: long-standing data loss bug in initial sync of logical replication at 2025-02-27 08:14:01 from Zhijie Hou (Fujitsu)

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jelte Fennema-Nio	2025-02-24 09:56:23	Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup
Previous Message	Andres Freund	2025-02-24 09:41:36	Re: Remove wal_[sync\|write][_time] from pg_stat_wal and track_wal_io_timing