Re: long-standing data loss bug in initial sync of logical replication

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: long-standing data loss bug in initial sync of logical replication
Date:	2023-11-19 02:18:30
Message-ID:	20231119021830.d6t6aaxtrkpn743y@awork3.anarazel.de
Lists:	pgsql-hackers

On 2023-11-19 02:15:33 +0100, Tomas Vondra wrote:
>
>
> On 11/18/23 22:05, Andres Freund wrote:
> > Hi,
> >
> > On 2023-11-18 21:45:35 +0100, Tomas Vondra wrote:
> >> On 11/18/23 19:12, Andres Freund wrote:
> >>>> If we increase the locks from ShareUpdateExclusive to ShareRowExclusive,
> >>>> we're making it conflict with RowExclusive. Which is just DML, and I
> >>>> think we need to do that.
> >>>
> >>> From what I can tell it needs to be an AccessExclusiveLock. Completely
> >>> independent of logical decoding. The way the cache stays coherent is catalog
> >>> modifications conflicting with anything that builds cache entries. We have a
> >>> few cases where we do use lower level locks, but for those we have explicit
> >>> analysis for why that's ok (see e.g. reloptions.c) or we block until nobody
> >>> could have an old view of the catalog (various CONCURRENTLY operations).
> >>>
> >>
> >> Yeah, I got too focused on the issue I triggered, which seems to be
> >> fixed by using SRE (still don't understand why ...). But you're probably
> >> right there may be other cases where SRE would not be sufficient, I
> >> certainly can't prove it'd be safe.
> >
> > I think it makes sense here: SRE prevents the problematic "scheduling" in your
> > test - with SRE no DML started before ALTER PUB ... ADD can commit after.
> >
>
> If I understand correctly, with the current code (which only gets
> ShareUpdateExclusiveLock), we may end up in a situation like this
> (sessions A and B):
>
> A: starts "ALTER PUBLICATION p ADD TABLE t" and gets the SUE lock
> A: writes the invalidation message(s) into WAL
> B: inserts into table "t"
> B: commit
> A: commit

I don't think this is the problematic sequence - at least it's not what I had
reproduced in
https://postgr.es/m/20231118025445.crhaeeuvoe2g5dv6%40awork3.anarazel.de

Adding line numbers:

1) S1: CREATE TABLE d(data text not null);
2) S1: INSERT INTO d VALUES('d1');
3) S2: BEGIN; INSERT INTO d VALUES('d2');
4) S1: ALTER PUBLICATION pb ADD TABLE d;
5) S2: COMMIT
6) S2: INSERT INTO d VALUES('d3');
7) S1: INSERT INTO d VALUES('d4');
8) RL: <nothing>

The problem with this sequence is that the insert from 3) is decoded *after* 4),
and that to decode that insert (which happened before the ALTER) the catalog
snapshot and cache state are from *before* the ALTER PUBLICATION. Because the
transaction started in 3) doesn't actually modify any catalogs, no
invalidations are executed after decoding it. The result is that the cache
looks like it did at 3), not like it does after 4). Undesirable time travel...

It's worth noting that the cache state is briefly correct here, right after 4);
it's just that after 5) it reverts to the old state and stays there.
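
To make the ordering easier to follow, here is a toy model of the sequence
above in Python. It is only a sketch and not how the decoding code is actually
structured: the names (Txn, decode, catalog_version) are made up for
illustration, the "cache" is a plain dict, and the catalog is reduced to a
single version number (0 = d not in pb, 1 = d in pb). It assumes, as described
above, that a cache entry is built from the catalog state the decoded
transaction sees, and that invalidations only run after a transaction that
modified catalogs.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    catalog_version: int              # catalog state this txn's snapshot sees
    inserts: tuple = ()               # rows inserted into table d
    alters_publication: bool = False  # stands in for ALTER PUBLICATION pb ADD TABLE d


def decode(commit_order):
    """Decode transactions in commit order and return the rows sent downstream."""
    cache = {}  # hypothetical stand-in for the cached "is d published?" state
    sent = []
    for txn in commit_order:
        for row in txn.inserts:
            if "d" not in cache:
                # Cache miss: build the entry from the catalog state this txn sees.
                cache["d"] = txn.catalog_version >= 1
            if cache["d"]:
                sent.append(row)
        if txn.alters_publication:
            # Invalidations are executed only after a catalog-modifying txn.
            cache.clear()
    return sent


alter = Txn("S1: 4)", catalog_version=1, alters_publication=True)
s2_open = Txn("S2: 3)+5)", catalog_version=0, inserts=("d2",))
s2_d3 = Txn("S2: 6)", catalog_version=1, inserts=("d3",))
s1_d4 = Txn("S1: 7)", catalog_version=1, inserts=("d4",))

# Commit order from the repro: the ALTER commits before the open transaction.
buggy = decode([alter, s2_open, s2_d3, s1_d4])

# Commit order if the ALTER had to wait for the open transaction.
ordered = decode([s2_open, alter, s2_d3, s1_d4])
```

In this model `buggy` ends up empty, matching step 8), because the entry built
while decoding the transaction from 3) is never invalidated. `ordered` contains
d3 and d4: the stale entry is thrown away when the ALTER is decoded. d2 is
absent in both cases, since it was inserted before the table was part of the
publication.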

If 4) instead uses an SRE lock, then S1 will be blocked until S2 commits, and
everything is fine.
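
For reference, the relevant part of the table-level lock conflict rules can be
written down as a small lookup. This is a sketch transcribed from the conflict
table in the PostgreSQL documentation, not taken from the lock manager source;
the dictionary and the conflicts() helper are illustrative names only. The
point is that INSERT takes RowExclusiveLock on the table, which does not
conflict with ShareUpdateExclusiveLock but does conflict with
ShareRowExclusiveLock and AccessExclusiveLock.

```python
# Table-level lock modes and the modes each one conflicts with.
CONFLICTS = {
    "AccessShare": {"AccessExclusive"},
    "RowShare": {"Exclusive", "AccessExclusive"},
    "RowExclusive": {"Share", "ShareRowExclusive", "Exclusive", "AccessExclusive"},
    "ShareUpdateExclusive": {"ShareUpdateExclusive", "Share", "ShareRowExclusive",
                             "Exclusive", "AccessExclusive"},
    "Share": {"RowExclusive", "ShareUpdateExclusive", "ShareRowExclusive",
              "Exclusive", "AccessExclusive"},
    "ShareRowExclusive": {"RowExclusive", "ShareUpdateExclusive", "Share",
                          "ShareRowExclusive", "Exclusive", "AccessExclusive"},
    "Exclusive": {"RowShare", "RowExclusive", "ShareUpdateExclusive", "Share",
                  "ShareRowExclusive", "Exclusive", "AccessExclusive"},
    "AccessExclusive": {"AccessShare", "RowShare", "RowExclusive",
                        "ShareUpdateExclusive", "Share", "ShareRowExclusive",
                        "Exclusive", "AccessExclusive"},
}


def conflicts(requested, held):
    """Return True if a request for `requested` must wait for a holder of `held`."""
    return held in CONFLICTS[requested]


# S2's open INSERT holds RowExclusive on d; what happens to S1's ALTER in 4)?
blocked_with_sue = conflicts("ShareUpdateExclusive", "RowExclusive")  # False
blocked_with_sre = conflicts("ShareRowExclusive", "RowExclusive")     # True
blocked_with_ae = conflicts("AccessExclusive", "RowExclusive")        # True

# Only AE also waits for plain readers, which hold AccessShare.
sre_waits_for_readers = conflicts("ShareRowExclusive", "AccessShare")  # False
ae_waits_for_readers = conflicts("AccessExclusive", "AccessShare")     # True
```

So SRE is enough to serialize the ALTER against in-progress DML, as in the
repro, while only AE also conflicts with sessions that merely read the table.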

> > I'm not sure there are any cases where using SRE instead of AE would cause
> > problems for logical decoding, but it seems very hard to prove. I'd be very
> > surprised if just using SRE would not lead to corrupted cache contents in some
> > situations. The cases where a lower lock level is ok are ones where we just
> > don't care that the cache is coherent in that moment.

> Are you saying it might break cases that are not corrupted now? How
> could obtaining a stronger lock have such effect?

No, I mean that I don't know if using SRE instead of AE would have negative
consequences for logical decoding. I.e. whether, from a logical decoding POV,
it'd suffice to increase the lock level to just SRE instead of AE.

Since I don't see how anything less than AE would be correct in general, it's
kind of a moot question.

> > In a way, the logical decoding cache-invalidation situation is a lot more
> > atomic than the "normal" situation. During normal operation locking is
> > strictly required to prevent incoherent states when building a cache entry
> > after a transaction committed, but before the sinval entries have been
> > queued. But in the logical decoding case that window doesn't exist.
> >
> Because we apply the invalidations at commit time, so it happens as a
> single operation that can't interleave with other sessions?

Yea, the situation is much simpler during logical decoding than "originally" -
there's no concurrency.

Greetings,

Andres Freund
