Re: Perform streaming logical transactions by background workers and parallel apply

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2023-01-19 10:14:08
Message-ID: CAJpy0uDzddK_ZUsB2qBJUbW_ZODYGoUHTaS5pVcYE2tzATCVXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 19, 2023 at 11:11 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jan 18, 2023 at 12:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Jan 13, 2023 at 11:50 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > > Here are some review comments for patch v79-0002.
> > >
> >
> > So, this is about the latest v84-0001-Stop-extra-worker-if-GUC-was-changed.
> >
> > >
> > > I feel this patch just adds more complexity for almost no gain:
> > > - reducing the 'max_apply_workers_per_suibscription' seems not very
> > > common in the first place.
> > > - even when the GUC is reduced, at that point in time all the workers
> > > might be in use so there may be nothing that can be immediately done.
> > > - IIUC the excess workers (for a reduced GUC) are going to get freed
> > > naturally anyway over time as more transactions are completed so the
> > > pool size will reduce accordingly.
> > >
> >
> > I am still not sure if it is worth pursuing this patch because of the
> > above reasons. I don't think it would be difficult to add this even at
> > a later point in time if we really see a use case for this.
> > Sawada-San, IIRC, you raised this point. What do you think?
> >
> > The other point I am wondering is whether we can have a different way
> > to test partial serialization apart from introducing another developer
> > GUC (stream_serialize_threshold). One possibility could be that we can
> > have a subscription option (parallel_send_timeout or something like
> > that) with some default value (current_timeout used in the patch)
> > which will be used only when streaming = parallel. Users may want to
> > wait for more time before serialization starts depending on the
> > workload (say when resource usage is high on a subscriber-side
> > machine, or there are concurrent long-running transactions that can
> > block parallel apply for a bit longer time). I know with this as well
> > it may not be straightforward to test the functionality because we
> > can't be sure how many changes would be required for a timeout to
> > occur. This is just for brainstorming other options to test the
> > partial serialization functionality.
> >
>
> Apart from the above, we can also have a subscription option to
> specify parallel_shm_queue_size (queue_size used to determine the
> queue between the leader and parallel worker) and that can be used for
> this purpose. Basically, configuring it to a smaller value can help in
> reducing the test time but still, it will not eliminate the need for
> dependency on timing we have to wait before switching to partial
> serialize mode. I think this can be used in production as well to tune
> the performance depending on workload.
>
> Yet another way is to use the existing parameter logical_decode_mode
> [1]. If the value of logical_decoding_mode is 'immediate', then we can
> immediately switch to partial serialize mode. This will eliminate the
> dependency on timing. The one argument against using this is that it
> won't be as clear as a separate parameter like
> 'stream_serialize_threshold' proposed by the patch but OTOH we already
> have a few parameters that serve a different purpose when used on the
> subscriber. For example, 'max_replication_slots' is used to define the
> maximum number of replication slots on the publisher and the maximum
> number of origins on subscribers. Similarly,
> wal_retrieve_retry_interval' is used for different purposes on
> subscriber and standby nodes.
>
> [1] - https://www.postgresql.org/docs/devel/runtime-config-developer.html
>
> --
> With Regards,
> Amit Kapila.

Hi Amit,

On rethinking the complete model, what I feel is that the name
logical_decoding_mode is not something which defines modes of logical
decoding. We, I think, picked it based on logical_decoding_work_mem.
As per current implementation, the parameter 'logical_decoding_mode'
tells what happens when work-memory used by logical decoding reaches
its limit. So it is in-fact 'logicalrep_workmem_vacate_mode' or
'logicalrep_trans_eviction_mode'. So if it is set to immediate,
meaning vacate the work-memory immediately or evict transactions
immediately. Add buffered means the reverse (i.e. keep on buffering
transactions until we reach a limit). Now coming to subscribers, we
can reuse the same parameter. On subscriber as well, shared-memory
queue could be considered as its workmem and thus the name
'logicalrep_workmem_vacate_mode' might look more relevant.

On publisher:
logicalrep_workmem_vacate_mode=immediate, buffered.

On subscriber:
logicalrep_workmem_vacate_mode=stream_serialize (or if we want to
keep immediate here too, that will also be fine).

Thoughts?
And I am assuming it is possible to change the GUC name before the
coming release. If not, please let me know, we can brainstorm other
ideas.

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-01-19 10:16:29 Re: Rethinking the implementation of ts_headline()
Previous Message Niyas Sait 2023-01-19 10:09:01 Re: [PATCH] Add native windows on arm64 support