From: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Notify downstream to discard the streamed transaction which was aborted due to crash. |
Date: | 2023-01-06 03:54:53 |
Message-ID: | OS0PR01MB5716A773F46768A1B75BE24394FB9@OS0PR01MB5716.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
When developing another feature, I find an existing bug which was reported from Dilip[1].
Currently, it's possible that we only send a streaming block without sending a
end of stream message(stream abort) when decoding and streaming a transaction
that was aborted due to crash because we might not WAL log a XLOG_XACT_ABORT
for such a crashed transaction. This will cause the subscriber(with
streaming=on) create a stream file but won't delete it until the apply
worker restart.
BUG repro(borrowed from Dilip):
---
1. start 2 servers(config: logical_decoding_work_mem=64kB)
./pg_ctl -D data/ -c -l pub_logs start
./pg_ctl -D data1/ -c -l sub_logs start
2. Publisher:
create table t(a int PRIMARY KEY ,b text);
create publication test_pub for table t
with(PUBLISH='insert,delete,update,truncate');
alter table t replica identity FULL ;
3. Subscription Server:
create table t(a int,b text);
create subscription test_sub CONNECTION 'host=localhost port=10000
dbname=postgres' PUBLICATION test_pub WITH ( slot_name =
test_slot_sub1,streaming=on);
4. Publication Server:
begin ;
insert into t values (generate_series(1,50000), 'zzzzzzzzz'); -- (while executing this restart publisher in 2-3 secs)
Restart the publication server, while the transaction is still in an
uncommitted state.
./pg_ctl -D data/ -c -l pub_logs restart -mi
---
After restarting the publisher, we can see the subscriber receive a streaming
block and create a stream file(/base/pgsql_tmp/xxx.fileset).
To fix it, One idea is to send a stream abort message when we are cleaning up
crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here is
a tiny patch which changes the same. I have confirmed that the bug is fixed and
all regression tests pass. I didn't add a testcase because we need to make sure
the crash happens before all the WAL logged transactions data are decoded which
doesn't seem easy to write a stable test for this.
Thoughts ?
Best regards,
Hou zj
Attachment | Content-Type | Size |
---|---|---|
0001-fix-stream-changes-for-crashed-transaction.patch | application/octet-stream | 1.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2023-01-06 04:07:23 | Re: pg_ftruncate hardcodes length=0 but only under windows |
Previous Message | Andres Freund | 2023-01-06 03:40:36 | Re: Minimal logical decoding on standbys |