Re: logical decoding and replication of sequences, take 2

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2022-11-11 22:49:07
Message-ID: 8bf1c518-b886-fe1b-5c42-09f9c663146d@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I noticed on cfbot the patch no longer applies, so here's a rebased
version. Most of the breakage was due to the column filtering reworks,
grammar changes etc. A lot of bitrot, but mostly mechanical stuff.

I haven't looked into the optimizations / improvements I discussed in my
previous post (logging only LSN of the last WAL-logged increment),
because while fixing "make check-world" I ran into a more serious issue
that I think needs to be discussed first. And I suspect it might also
affect the feasibility of the LSN optimization.

So, what's the issue - the current solution is based on WAL-logging
state of all sequences incremented by the transaction at COMMIT. To do
that, we read the state from disk, and write that into WAL. However,
these WAL messages are not necessarily correlated to COMMIT records, so
stuff like this might happen:

1. transaction T1 increments sequence S
2. transaction T2 increments sequence S
3. both T1 and T2 start to COMMIT
4. T1 reads state of S from disk, writes it into WAL
5. transaction T3 increments sequence S
6. T2 reads state of S from disk, writes it into WAL
7. T2 write COMMIT into WAL
8. T1 write COMMIT into WAL

Because the apply order is determined by ordering of COMMIT records,
this means we'd apply the increments logged by T2, and then by T1. But
that undoes the increment by T3, and the sequence would go backwards.

The previous patch version addressed that by acquiring lock on the
sequence, holding it until transaction end. This effectively ensures the
order of sequence messages and COMMIT matches. But that's problematic
for a number of reasons:

1) throughput reduction, because the COMMIT records need to serialize

2) deadlock risk, if we happen to lock sequences in different order
(in different transactions)

3) problem for prepared transactions - the sequences are locked and
logged in PrepareTransaction, because we may not have seqhashtab
beyond that point. This is a much worse variant of (1).

Note: I also wonder what happens if someone does DISCARD SEQUENCES. I
guess we'll forget the sequences, which is bad - so we'd have to invent
a separate cache that does not have this issue.

I realized (3) because one of the test_decoding TAP tests got stuck
exactly because of a sequence locked by a prepared transaction.

This patch simply releases the lock after writing the WAL message, but
that just makes it vulnerable to the reordering. And this would have
been true even with the LSN optimization.

However, I was thinking that maybe we could use the LSN of the WAL
message (XLOG_LOGICAL_SEQUENCE) to deal with the ordering issue, because
*this* is the sensible sequence increment ordering.

In the example above, we'd first apply the WAL message from T2 (because
that commits first). And then we'd get to apply T1, but the WAL message
has an older LSN, so we'd skip it.

But this requires us remembering LSN of the already applied WAL sequence
messages, which could be tricky - we'd need to persist it in some way
because of restarts, etc. We can't do this while decoding but on the
apply side, I think, because of streaming, aborts.

The other option might be to make these messages non-transactional, in
which case we'd separate the ordering from COMMIT ordering, evading the
reordering problem.

That'd mean we'd ignore rollbacks (which seems fine), we could probably
optimize this by checking if the state actually changed, etc. But we'd
also need to deal with transactions created in the (still uncommitted)
transaction. But I'm also worried it might lead to the same issue with
non-transactional behaviors that forced revert in v15.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
decoding-sequences-tracking-20221112.patch text/x-patch 585.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2022-11-11 23:10:20 Re: Add test module for Custom WAL Resource Manager feature
Previous Message Peter Geoghegan 2022-11-11 22:40:45 Re: Document WAL rules related to PD_ALL_VISIBLE in README