From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-10-26 11:19:08
Message-ID: CAA4eK1LKLb+fD=o0BfPofzkbqSwRqrvBFXpkiujqDg9Uk9Q_=Q@mail.gmail.com
Lists: pgsql-hackers
On Tue, Oct 25, 2022 at 8:38 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Oct 21, 2022 at 6:32 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> I've started to review this patch. I tested v40-0001 patch and have
> one question:
>
> IIUC even when most of the changes in the transaction are filtered out
> in pgoutput (eg., by relation filter or row filter), the walsender
> sends STREAM_START. This means that the subscriber could end up
> launching parallel apply workers also for almost empty (and streamed)
> transactions. For example, I created three subscriptions each of which
> subscribes to a different table. When I loaded a large amount of data
> into one table, all three (leader) apply workers received START_STREAM
> and launched their parallel apply workers.
>
The parallel apply workers will be launched only the first time; after that we
maintain a pool of them, so we don't need to restart them for each streamed
transaction.
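To make the pooling point concrete, here is a minimal, purely illustrative sketch (plain Python, not PostgreSQL code; all names are hypothetical) of a leader that reuses already-launched workers for subsequent streamed transactions instead of starting a new one each time:

```python
# Illustrative model only: the leader keeps previously launched workers
# in a pool and reuses idle ones, so repeated streamed transactions do
# not pay the worker-startup cost again.

class ParallelApplyWorker:
    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.in_use = False

class WorkerPool:
    def __init__(self, max_workers):
        self.max_workers = max_workers
        self.workers = []          # launched once, then reused
        self.launch_count = 0      # how many workers were actually started

    def acquire(self):
        # Prefer an idle, already-launched worker.
        for w in self.workers:
            if not w.in_use:
                w.in_use = True
                return w
        # Launch a new worker only if the pool is not yet full.
        if len(self.workers) < self.max_workers:
            self.launch_count += 1
            w = ParallelApplyWorker(self.launch_count)
            w.in_use = True
            self.workers.append(w)
            return w
        return None  # caller would fall back to serializing the changes

    def release(self, w):
        w.in_use = False

pool = WorkerPool(max_workers=2)

# Three streamed transactions arriving one after another: the worker
# launched for the first one is reused for the later two.
for xact in range(3):
    w = pool.acquire()
    # ... apply the streamed changes of this transaction via w ...
    pool.release(w)

print(pool.launch_count)  # prints 1: a single launch serves all three
```

So under this model the startup cost is paid once per pool slot, not once per streamed transaction, which is why extra launches for nearly-empty transactions mostly cost an idle worker rather than repeated startups.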
> However, two of them
> finished without applying any data. I think this behaviour looks
> problematic since it wastes workers and rather decreases the apply
> performance if the changes are not large. Is it worth considering a
> way to delay launching a parallel apply worker until we find out the
> amount of changes is actually large?
>
I think even if the changes are small there may not be much difference,
because we have observed that the performance improvement comes from not
writing the changes to a file.
> For example, the leader worker
> writes the streamed changes to files as usual and launches a parallel
> worker if the amount of changes exceeds a threshold or the leader
> receives the second segment. After that, the leader worker switches to
> send the streamed changes to parallel workers via shm_mq instead of
> files.
>
I think writing to a file won't be a good idea, as that can hamper the
performance benefit in some cases, and I am not sure it is worth it.
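For reference, the trade-off being discussed could be sketched as follows (plain Python, not PostgreSQL code; the threshold value and all names are hypothetical): the leader spools early changes, and only switches to direct delivery through a queue (standing in for the shm_mq) once the transaction turns out to be large, launching the parallel worker at that point.

```python
# Illustrative model of the proposed "spool first, switch later" scheme.
# Small transactions never launch a worker; large ones pay an initial
# spooling cost before switching to the direct (queue) path.

THRESHOLD = 100  # hypothetical cutoff; a real value would need tuning

class Leader:
    def __init__(self):
        self.spooled = []          # changes written "to file" so far
        self.queue = None          # stands in for the shm_mq to the worker
        self.worker_launched = False

    def handle_change(self, change):
        if self.queue is not None:
            self.queue.append(change)      # fast path: direct to worker
            return
        self.spooled.append(change)        # slow path: spool to file
        if len(self.spooled) > THRESHOLD:
            # The transaction turned out to be large: launch the worker,
            # hand over the spooled changes, and stream directly from now on.
            self.worker_launched = True
            self.queue = list(self.spooled)
            self.spooled.clear()

small = Leader()
for i in range(10):              # small transaction: worker never launched
    small.handle_change(i)
print(small.worker_launched)     # prints False

big = Leader()
for i in range(500):             # large transaction: switches to the queue
    big.handle_change(i)
print(big.worker_launched, len(big.queue))  # prints True 500
```

The downside visible in the sketch is the one raised above: every large transaction still pays for spooling its first THRESHOLD changes, which is exactly the file-write cost the direct-streaming design avoids.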
--
With Regards,
Amit Kapila.