Proposal: Filter irrelevant change before reassemble transactions during logical decoding

From: li jie <ggysxcq(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Proposal: Filter irrelevant change before reassemble transactions during logical decoding
Date: 2023-11-30 09:47:57
Message-ID: CAGfChW51P944nM5h0HTV9HistvVfwBxNaMt_s-OZ9t=uXz+Zbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

During logical decoding, if there is a large write transaction, some
spill files will be written to disk,
depending on the setting of max_changes_in_memory.

This behavior can effectively avoid OOM, but if the transaction
generates a lot of change before commit,
a large number of files may fill the disk. For example, you can update
a TB-level table.
Of course, this is also inevitable.

But I found an inelegant phenomenon. If the updated large table is not
published, its changes will also
be written with a large number of spill files. Look at an example below:

publisher:
```
create table tbl_pub(id int, val1 text, val2 text,val3 text);
create table tbl_t1(id int, val1 text, val2 text,val3 text);
CREATE PUBLICATION mypub FOR TABLE public.tbl_pub;
```

subscriber:
```
create table tbl_pub(id int, val1 text, val2 text,val3 text);
create table tbl_t1(id int, val1 text, val2 text,val3 text);
CREATE SUBSCRIPTION mysub CONNECTION 'host=127.0.0.1 port=5432
user=postgres dbname=postgres' PUBLICATION mypub;
```

publisher:
```
begin;
insert into tbl_t1 select i,repeat('xyzzy', i),repeat('abcba',
i),repeat('dfds', i) from generate_series(0,999999) i;
```

Later you will see a large number of spill files in the
"/$PGDATA/pg_replslot/mysub/" directory.
```
$ll -sh
total 4.5G
4.0K -rw------- 1 postgres postgres 200 Nov 30 09:24 state
17M -rw------- 1 postgres postgres 17M Nov 30 08:22 xid-750-lsn-0-10000000.spill
12M -rw------- 1 postgres postgres 12M Nov 30 08:20 xid-750-lsn-0-1000000.spill
17M -rw------- 1 postgres postgres 17M Nov 30 08:23 xid-750-lsn-0-11000000.spill
......
```

We can see that table tbl_t1 is not published in mypub. It is also not
sent downstream because it is subscribed.
After the transaction is reorganized, the pgoutput decoding plug-in
filters out these change of unpublished relation
when sending logical changes. see function pgoutput_change.

Above all, if after constructing a change and before queuing a change
into a transaction, we filter out unpublished
relation-related changes, will it make logical decoding less laborious
and avoid disk growth as much as possible?

This is just an immature idea. I haven't started to implement it yet.
Maybe it was designed this way because there
are key factors that I didn't consider. So I want to hear everyone's
opinions, especially the designers of logic decoding.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-11-30 09:49:11 Re: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Alexander Korotkov 2023-11-30 09:37:07 Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)