[bug] Logical Decoding of relation rewrite with toast does not reset toast_hash

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Schneider (AWS), Jeremy" <schnjere(at)amazon(dot)com>
Subject: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash
Date: 2021-07-09 06:51:34
Message-ID: b5146fb1-ad9e-7d6e-f980-98ed68744a7c@amazon.com
Lists: pgsql-hackers

Hi hackers,

During logical decoding, we have recently observed (and reported in [1])
errors like:

ERROR:  could not open relation with OID 0

After investigating a recent issue on a PostgreSQL database that was
encountering this error, we found that logical decoding of a relation
rewrite with toast can produce this error, because the toast_hash is
not reset in that case.

We were able to create this repro of the error:

postgres=# \! cat bdt_repro.sql
select pg_create_logical_replication_slot('bdt_slot','test_decoding');
CREATE TABLE tbl1 (a INT, b TEXT);
CREATE TABLE tbl2 (a INT);
ALTER TABLE tbl1 ALTER COLUMN b SET STORAGE EXTERNAL;
BEGIN;
INSERT INTO tbl1 VALUES(1, repeat('a', 4000)) ;
ALTER TABLE tbl1 ADD COLUMN id serial primary key;
INSERT INTO tbl2 VALUES(1);
commit;
select * from pg_logical_slot_get_changes('bdt_slot', null, null);

That ends up on 12.5 with:

ERROR:  could not open relation with OID 0

And on current master with:

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
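
As a side note on why the repro uses STORAGE EXTERNAL and a 4000-byte
value: EXTERNAL disables compression, so repeat('a', 4000) is stored out
of line in the toast table instead of being compressed down to a small
inline value. Logical decoding then sees the toast chunks as separate
changes. The sketch below only illustrates the chunking arithmetic. It
assumes the default 8 kB block size, for which TOAST_MAX_CHUNK_SIZE is
1996 bytes. Neither that value nor the helper name toast_chunk_sizes
comes from this report.

```python
# Assumption: default 8 kB block size, where TOAST_MAX_CHUNK_SIZE is 1996 bytes.
TOAST_MAX_CHUNK_SIZE = 1996


def toast_chunk_sizes(payload_bytes, chunk_size=TOAST_MAX_CHUNK_SIZE):
    """Return the chunk sizes an uncompressed out-of-line value is split into.

    Hypothetical helper for illustration only; it is not a PostgreSQL API.
    """
    if payload_bytes <= 0:
        return []
    full_chunks, remainder = divmod(payload_bytes, chunk_size)
    sizes = [chunk_size] * full_chunks
    if remainder:
        sizes.append(remainder)
    return sizes


# The 4000-byte value from the repro.
print(toast_chunk_sizes(4000))  # [1996, 1996, 8]
```

Under these assumptions the single INSERT into tbl1 produces three toast
chunk rows in addition to the main tuple.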

The issue has been introduced by 325f2ec555 (and more precisely by its
change in reorderbuffer.c), so it affects v11 and later versions, and
does not affect pre-v11 versions:

git branch -r --contains 325f2ec555
  origin/HEAD -> origin/master
  origin/REL_11_STABLE
  origin/REL_12_STABLE
  origin/REL_13_STABLE
  origin/REL_14_STABLE
  origin/master

The fact that current master behaves differently from 12.5 (for example)
is due to 4daa140a2f, which triggers a failed assertion in such a case
(after reaching the code path that should print out the ERROR):

#2  0x0000000000b29fab in ExceptionalCondition (conditionName=0xce6850
"(rb->size >= sz) && (txn->size >= sz)", errorType=0xce5f84
"FailedAssertion", fileName=0xce5fd0 "reorderbuffer.c", lineNumber=3141)
at assert.c:69
#3  0x00000000008ff1fb in ReorderBufferChangeMemoryUpdate (rb=0x11a7a40,
change=0x11c94b8, addition=false) at reorderbuffer.c:3141
#4  0x00000000008fab27 in ReorderBufferReturnChange (rb=0x11a7a40,
change=0x11c94b8, upd_mem=true) at reorderbuffer.c:477
#5  0x0000000000902ec1 in ReorderBufferToastReset (rb=0x11a7a40,
txn=0x11b1998) at reorderbuffer.c:4799
#6  0x00000000008faaa2 in ReorderBufferReturnTXN (rb=0x11a7a40,
txn=0x11b1998) at reorderbuffer.c:448
#7  0x00000000008fc95b in ReorderBufferCleanupTXN (rb=0x11a7a40,
txn=0x11b1998) at reorderbuffer.c:1540
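
To make the assertion condition in frame #2 easier to read, here is a
toy model of a size-accounting invariant of the same shape. It is not
PostgreSQL code: the class SizeAccounting and its update method are
hypothetical names, and the sketch does not claim to show why the
counters end up out of sync in this bug. It only shows that releasing
more than is currently accounted for trips such a check.

```python
class SizeAccounting:
    """Toy stand-in for the rb->size / txn->size counters (not PostgreSQL code)."""

    def __init__(self):
        self.rb_size = 0
        self.txn_size = 0

    def update(self, sz, addition):
        """Add or subtract sz from both counters, checking the invariant on release."""
        if addition:
            self.rb_size += sz
            self.txn_size += sz
            return
        # Same shape as the condition reported in the backtrace.
        if not (self.rb_size >= sz and self.txn_size >= sz):
            raise AssertionError("(rb->size >= sz) && (txn->size >= sz)")
        self.rb_size -= sz
        self.txn_size -= sz


acct = SizeAccounting()
acct.update(100, addition=True)
acct.update(100, addition=False)  # balanced release: passes the check
try:
    acct.update(100, addition=False)  # nothing left to release: check fails
except AssertionError as exc:
    print("assertion failed:", exc)
```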

I am adding Amit to this thread to make him aware that this is related
to the recent issue Jeremy and I were talking about in [1], which we
now believe is not linked to the logical decoding and speculative insert
bug fixed in 4daa140a2f, but is likely caused by this new toast rewrite bug.

Please find enclosed a patch proposal to:

* Avoid the failed assertion on current master and generate the error
message instead (should the code reach that stage).
* Reset the toast_hash in case of relation rewrite with toast (so that
the logical decoding in the above repro works).

I am adding this patch to the next commitfest.

Thanks
Bertrand

[1]:
https://www.postgresql.org/message-id/CAA4eK1KcUPwwhDVhJmdQExc09AzEBZMGbOa-u3DYaJs1zzfEnA%40mail.gmail.com

Attachment: v1-0001-toast-rewrite.patch (text/plain, 1.7 KB)
