RE: logical decoding and replication of sequences, take 2

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: RE: logical decoding and replication of sequences, take 2
Date: 2023-12-14 03:44:22
Message-ID: TY3PR01MB98899FA0082AE9918886E150F58CA@TY3PR01MB9889.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear hackers,

> It is correct that we can make a wrong decision about whether a change
> is transactional or non-transactional when sequence DDL happens before
> the SNAPBUILD_FULL_SNAPSHOT state and the sequence operation happens
> after that state.

I found a workload which decoder distinguish wrongly.

# Prerequisite

Apply an attached patch for inspecting the sequence status. It can be applied atop v20231203 patch set.
Also, a table and a sequence must be defined:

```
CREATE TABLE foo (var int);
CREATE SEQUENCE s;
```

# Workload

Then, you can execute concurrent transactions from three clients like below:

Client-1

BEGIN;
INSERT INTO foo VALUES (1);

Client-2

SELECT pg_create_logical_replication_slot('slot', 'test_decoding');

Client-3

BEGIN;
ALTER SEQUENCE s MAXVALUE 5000;
COMMIT;
SAVEPOINT s1;
SELECT setval('s', 2000);
ROLLBACK;

SELECT pg_logical_slot_get_changes('slot', 'test_decoding');

# Result and analysis

At first, below lines would be output on the log. This meant that WAL records
for ALTER SEQUENCE were decoded but skipped because the snapshot had been building.

```
...
LOG: logical decoding found initial starting point at 0/154D238
DETAIL: Waiting for transactions (approximately 1) older than 741 to end.
STATEMENT: SELECT * FROM pg_create_logical_replication_slot('slot', 'test_decoding');
LOG: XXX: smgr_decode. snapshot is SNAPBUILD_BUILDING_SNAPSHOT
STATEMENT: SELECT * FROM pg_create_logical_replication_slot('slot', 'test_decoding');
LOG: XXX: skipped
STATEMENT: SELECT * FROM pg_create_logical_replication_slot('slot', 'test_decoding');
LOG: XXX: seq_decode. snapshot is SNAPBUILD_BUILDING_SNAPSHOT
STATEMENT: SELECT * FROM pg_create_logical_replication_slot('slot', 'test_decoding');
LOG: XXX: skipped
...
```

Note that above `seq_decode...` line was not output via `setval()`, it was done
by ALTER SEQUENCE statement. Below is a call stack for inserting WAL.

```
XLogInsert(RM_SEQ_ID, XLOG_SEQ_LOG);
fill_seq_fork_with_data
fill_seq_with_data
AlterSequence
```

Then, subsequent lines would say like them. This means that the snapshot becomes
FULL and `setval()` is regarded non-transactional wrongly.

```
LOG: logical decoding found initial consistent point at 0/154D658
DETAIL: Waiting for transactions (approximately 1) older than 742 to end.
STATEMENT: SELECT * FROM pg_create_logical_replication_slot('slot', 'test_decoding');
LOG: XXX: seq_decode. snapshot is SNAPBUILD_FULL_SNAPSHOT
STATEMENT: SELECT * FROM pg_create_logical_replication_slot('slot', 'test_decoding');
LOG: XXX: the sequence is non-transactional
STATEMENT: SELECT * FROM pg_create_logical_replication_slot('slot', 'test_decoding');
LOG: XXX: not consistent: skipped
```

The change would be discarded because the snapshot has not been CONSISTENT yet
by the below part. If it has been transactional, we would have queued this
change though the transaction will be skipped at commit.

```
else if (!transactional &&
(SnapBuildCurrentState(builder) != SNAPBUILD_CONSISTENT ||
SnapBuildXactNeedsSkip(builder, buf->origptr)))
return;
```

But anyway, we could find a case which we can make a wrong decision. This example
is lucky - does not output wrongly, but I'm not sure all the case like that.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
add_elog.txt text/plain 3.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-12-14 04:45:21 Re: Synchronizing slots from primary to standby
Previous Message Amit Kapila 2023-12-14 03:34:09 Re: "pgoutput" options missing on documentation