pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15

From: Nikhil Benesch <nikhil(dot)benesch(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: "sean(at)materialize(dot)com" <sean(at)materialize(dot)com>, "petrosagg(at)materialize(dot)com" <petrosagg(at)materialize(dot)com>
Subject: pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15
Date: 2023-11-23 07:40:27
Message-ID: CAPWqQZTEpZQamYsGMn6ZDRvVywwpVPiKH6OY4KSgA+NmeqFNzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While working on Materialize's streaming logical replication from Postgres [0],
my colleagues Sean Loiselle and Petros Angelatos (CC'd) discovered today what
appears to be a correctness bug in pgoutput, introduced in v15.

The problem goes like this. A table with REPLICA IDENTITY FULL and some
data in it...

CREATE TABLE t (a int);
ALTER TABLE t REPLICA IDENTITY FULL;
INSERT INTO t VALUES (1), (2), (3), ...;

...undergoes a schema change to add a new column with a default:

ALTER TABLE t ADD COLUMN b bool DEFAULT false NOT NULL;

PostgreSQL is smart and does not rewrite the entire table during the schema
change. Instead it updates the tuple description to indicate to future readers
of the table that if `b` is missing, it should be filled in with the default
value, `false`.

Unfortunately, since v15, pgoutput mishandles missing attributes. If a
downstream server is subscribed to changes from t via the pgoutput plugin, when
a row with a missing attribute is updated, e.g.:

UPDATE t SET a = 2 WHERE a = 1

pgoutput will incorrectly report b's value as NULL in the old tuple, rather than
false. Using the same example:

old: a=1, b=NULL
new: a=2, b=true

The subscriber will ignore the update (as it has no row with values
a=1, b=NULL), and thus the subscriber's copy of `t` will become out of sync with
the publisher's.

I bisected the problem to 52e4f0cd4 [1], which introduced row filtering for
publications. The problem appears to be the use of CreateTupleDescCopy where
CreateTupleDescCopyConstr is required, as the former drops the constraints
in the tuple description (specifically, the default value constraint) on the
floor.

I've attached a patch which both fixes the issue and includes a test. I've
verified that the test fails against the current master and passes against
the patched version.

I'm relatively unfamiliar with the project norms here, but assuming the patch is
acceptable, this strikes me as important enough to warrant a backport to both
v15 and v16.

[0]: https://materialize.com/docs/sql/create-source/postgres
[1]: https://github.com/postgres/postgres/commit/52e4f0cd472d39d07732b99559989ea3b615be78

Attachment Content-Type Size
0001-pgoutput-don-t-unconditionally-fill-in-missing-value.patch application/octet-stream 4.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Xiang Gao 2023-11-23 08:05:26 RE: CRC32C Parallel Computation Optimization on ARM
Previous Message Peter Smith 2023-11-23 07:27:04 Re: GUC names in messages