Re: Reduce useless changes before reassembly during logical replication

From: Andy Fan <zhihuifan1213(at)163(dot)com>
To: li jie <ggysxcq(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Reduce useless changes before reassembly during logical replication
Date: 2024-02-21 10:47:40
Message-ID: 87o7cadqj3.fsf@163.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hi Jie,

> Most importantly, if we filter out unpublished relationship-related
> changes after
> constructing the changes but before queuing the changes into a transaction,
> will it reduce the workload of logical decoding and avoid disk or memory growth
> as much as possible?

Thanks for the report!

Discarding the unused changes as soon as possible looks like a valid
optimization for me, but I pretty like more experienced people have a
double check.

> Attached is the patch I used to implement this optimization.

After a quick look at the patch, I found FilterByTable is too expensive
because of the StartTransaction and AbortTransaction. With your above
setup and run the below test:

insert into tbl_t1 select i,repeat('xyzzy', i),repeat('abcba',
i),repeat('dfds', i) from generate_series(0,999100) i;

perf the wal sender of mypub for 30 seconds, then I get:

- 22.04% 1.53% postgres postgres [.] FilterByTable - 20.51% FilterByTable
AbortTransaction ResourceOwnerReleaseInternal LockReleaseAll hash_seq_search

The main part comes from AbortTransaction, and the 20% is not trivial.

From your patch:
+
+ /*
+ * Decoding needs access to syscaches et al., which in turn use
+ * heavyweight locks and such. Thus we need to have enough state around to
+ * keep track of those. The easiest way is to simply use a transaction
+ * internally.
+ ....
+ using_subtxn = IsTransactionOrTransactionBlock();
+
+ if (using_subtxn)
+ BeginInternalSubTransaction("filter change by table");
+ else
+ StartTransactionCommand();

Acutally FilterByTable here is simpler than "decoding", we access
syscache only when we find an entry in get_rel_sync_entry and the
replicate_valid is false, and the invalid case should rare.

What I'm thinking now is we allow the get_rel_sync_sync_entry build its
own transaction state *only when it find a invalid entry*. if the caller
has built it already, like the existing cases in master, nothing will
happen except a simple transaction state check. Then in the
FilterByTable case we just leave it for get_rel_sync_sync_entry. See the
attachemnt for the idea.

--
Best Regards
Andy Fan

Attachment Content-Type Size
v1-0001-Make-get_rel_sync_entry-less-depending-on-transac.patch text/x-diff 4.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2024-02-21 11:01:45 Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Previous Message Daniel Gustafsson 2024-02-21 10:31:36 Re: BRIN integer overflow