Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL
Date: 2023-03-11 19:59:37
Message-ID: CACawEhVQC9WoofunvXg12aXtbqKnEgWxoRx3+v8q32AWYsdpGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

(cc'ed Amit as he has the context)

While working on [1], I realized that on HEAD there is a problem with the
$subject. Here is the relevant discussion on the thread [2]. Quoting my
own notes on that thread below;

I realized that the dropped columns also get into the tuples_equal()
> function. And,
> the remote sends NULL to for the dropped columns(i.e., remoteslot), but
> index_getnext_slot() (or table_scan_getnextslot) indeed fills the dropped
> columns on the outslot. So, the dropped columns are not NULL in the outslot

Amit also suggested checking generated columns, which indeed has the same
problem.

Here are the steps to repro the problem with dropped columns:

- pub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3
timestamptz);
ALTER TABLE test REPLICA IDENTITY FULL;
INSERT INTO test SELECT NULL, i, i, (i)::text, now() FROM
generate_series(0,1)i;
CREATE PUBLICATION pub FOR ALL TABLES;

-- sub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3
timestamptz);
CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION pub;

-- show that before dropping the columns, the data in the source and
-- target are deleted properly
DELETE FROM test WHERE x = 0;

-- both on the source and target
SELECT count(*) FROM test WHERE x = 0;
┌───────┐
│ count │
├───────┤
│ 0 │
└───────┘
(1 row)

-- drop columns on both the the source
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- drop columns on both the the target
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- on the target
ALTER SUBSCRIPTION sub REFRESH PUBLICATION;

-- after dropping the columns
DELETE FROM test WHERE x = 1;

-- source
SELECT count(*) FROM test WHERE x = 1;
┌───────┐
│ count │
├───────┤
│ 0 │
└───────┘
(1 row)

**-- target, OOPS wrong result!!!!**SELECT count(*) FROM test WHERE x = 1;
┌───────┐
│ count │
├───────┤
│ 1 │
└───────┘
(1 row)

Attaching a patch that could possibly solve the problem.

Thanks,
Onder KALACI

[1]
https://www.postgresql.org/message-id/flat/CACawEhUN%3D%2BvjY0%2B4q416-rAYx6pw-nZMHQYsJZCftf9MjoPN3w%40mail.gmail.com#2f7fa76f9e4496e3b52a9be6736e5b43
[2]
https://www.postgresql.org/message-id/CACawEhUu6S8E4Oo7%2Bs5iaq%3DyLRZJb6uOZeEQSGJj-7NVkDzSaw%40mail.gmail.com

Attachment Content-Type Size
v1-0001-Skip-dropped-and-generated-columns-when-REPLICA-I.patch application/octet-stream 4.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Önder Kalacı 2023-03-11 20:04:26 Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Previous Message Nitin Jadhav 2023-03-11 19:22:00 Re: Improve WALRead() to suck data directly from WAL buffers when possible