Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2022-12-15 12:03:16
Message-ID: CAGPVpCQdZ_oj-QFcTOhTrUTs-NCKrrZ=ZNCNPR1qe27rXV-iYw@mail.gmail.com

Hi,

Attached are new versions of the patch with some changes/fixes.

Here are also some numbers comparing the performance of logical replication
with this patch against the current master branch.

My benchmarking method is the same as what I described earlier in this thread
(but on a different environment, so do not compare the results in this email
with the ones from earlier emails):

> With those changes, I did some benchmarking to see if it improves anything.
> These results compare this patch with the latest version of the master branch.
> "max_sync_workers_per_subscription" is set to 2, the default.
> The results are simply averages of timings from 5 consecutive runs for each
> branch.
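
For context, here is roughly what such a setup looks like. This is only an
illustrative sketch, not the exact benchmark script; the table, publication
and subscription names, the table count and the connection string are made up.

-- publisher: create the empty tables and publish them
SELECT format('CREATE TABLE sync_test_%s (a int)', i)
FROM generate_series(1, 100) i \gexec
CREATE PUBLICATION pub_sync_test FOR ALL TABLES;

-- subscriber: same schema, with max_sync_workers_per_subscription = 2;
-- timing starts at CREATE SUBSCRIPTION ...
SELECT format('CREATE TABLE sync_test_%s (a int)', i)
FROM generate_series(1, 100) i \gexec
CREATE SUBSCRIPTION sub_sync_test
    CONNECTION 'host=localhost port=5432 dbname=postgres'
    PUBLICATION pub_sync_test;

-- ... and stops once all relations have reached the 'ready' state
SELECT count(*) FROM pg_subscription_rel WHERE srsubstate <> 'r';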

Since this patch is expected to improve logical replication of empty or
close-to-empty tables, I started by measuring performance with empty tables.

| 10 tables | 100 tables | 1000 tables
------------------------------------------------------------------------------
master | 283.430 ms | 22739.107 ms | 105226.177 ms
------------------------------------------------------------------------------
patch | 189.139 ms | 1554.802 ms | 23091.434 ms

> After the changes discussed here [1], concurrent replication origin drops
> by apply worker and tablesync workers may hold each other on wait due to
> locks taken by replorigin_drop_by_name.
> I see that this harms the performance of logical replication quite a bit
> in terms of speed.
> [1]
> https://www.postgresql.org/message-id/flat/20220714115155.GA5439%40depesz.com

Firstly, as I mentioned, replication origin drops made things worse for the
master branch. Locks become a more serious issue as the number of tables
increases. The patch reuses origins, so it does not need to drop them in each
iteration. That's why the difference between master and the patch is more
significant now than it was when I first sent the patch.
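
As a side note, the contention is easy to observe on the subscriber while the
tablesync workers are running on master; a query along these lines (just
illustrative, not part of the benchmark itself) should show the workers
waiting on the lock taken by replorigin_drop_by_name:

-- backends currently blocked on a heavyweight lock
SELECT pid, backend_type, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';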

To show that the improvement is not only the result of reusing origins, but
also of reusing replication slots and workers, I reverted the commits that
cause the origin drop issue and measured again:

| 10 tables | 100 tables | 1000 tables
-----------------------------------------------------------------------------
reverted | 270.012 ms | 2483.907 ms | 31660.758 ms
-----------------------------------------------------------------------------
patch | 189.139 ms | 1554.802 ms | 23091.434 ms

With this patch, logical replication is still faster, even if we did not have
the issue with replication origin drops.

Also, here are some numbers with 10 tables loaded with some data:

| 10 MB | 100 MB
----------------------------------------------------------
master | 2868.524 ms | 14281.711 ms
----------------------------------------------------------
patch | 1750.226 ms | 14592.800 ms

As expected, the difference between master and the patch gets smaller as the
size of the tables increases.

I would appreciate any feedback/thoughts on the approach, patch, numbers, etc.

Thanks,
--
Melih Mutlu
Microsoft

Attachment Content-Type Size
v2-0001-Add-replication-protocol-cmd-to-create-a-snapshot.patch application/octet-stream 20.5 KB
v5-0002-Reuse-Logical-Replication-Background-worker.patch application/octet-stream 73.3 KB
