Re: Perform streaming logical transactions by background workers and parallel apply

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-05-11 03:46:56
Message-ID: CAD21AoD8rDUeNCXS1Be-N3pe7_HG8F83_MnBbgzGOnJV54R6Dw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 10, 2022 at 6:10 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, May 10, 2022 at 10:35 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, May 4, 2022 at 12:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Tue, May 3, 2022 at 9:45 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, May 2, 2022 at 5:06 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Mon, May 2, 2022 at 6:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Mon, May 2, 2022 at 11:47 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > Are you planning to support "Transaction dependency" Amit mentioned in
> > > > > > > his first mail in this patch? IIUC since the background apply worker
> > > > > > > applies the streamed changes as soon as receiving them from the main
> > > > > > > apply worker, a conflict that doesn't happen in the current streaming
> > > > > > > logical replication could happen.
> > > > > > >
> > > > > >
> > > > > > This patch seems to be waiting for stream_stop to finish, so I don't
> > > > > > see how the issues related to "Transaction dependency" can arise? What
> > > > > > type of conflict/issues you have in mind?
> > > > >
> > > > > Suppose we set both publisher and subscriber:
> > > > >
> > > > > On publisher:
> > > > > create table test (i int);
> > > > > insert into test values (0);
> > > > > create publication test_pub for table test;
> > > > >
> > > > > On subscriber:
> > > > > create table test (i int primary key);
> > > > > create subscription test_sub connection '...' publication test_pub; --
> > > > > value 0 is replicated via initial sync
> > > > >
> > > > > Now, both 'test' tables have value 0.
> > > > >
> > > > > And suppose two concurrent transactions are executed on the publisher
> > > > > in following order:
> > > > >
> > > > > TX-1:
> > > > > begin;
> > > > > insert into test select generate_series(0, 10000); -- changes will be streamed;
> > > > >
> > > > > TX-2:
> > > > > begin;
> > > > > delete from test where c = 0;
> > > > > commit;
> > > > >
> > > > > TX-1:
> > > > > commit;
> > > > >
> > > > > With the current streaming logical replication, these changes will be
> > > > > applied successfully since the deletion is applied before the
> > > > > (streamed) insertion. Whereas with the apply bgworker, it fails due to
> > > > > an unique constraint violation since the insertion is applied first.
> > > > > I've confirmed that it happens with v5 patch.
> > > > >
> > > >
> > > > Good point but I am not completely sure if doing transaction
> > > > dependency tracking for such cases is really worth it. I feel for such
> > > > concurrent cases users can anyway now also get conflicts, it is just a
> > > > matter of timing. One more thing to check transaction dependency, we
> > > > might need to spill the data for streaming transactions in which case
> > > > we might lose all the benefits of doing it via a background worker. Do
> > > > we see any simple way to avoid this?
> > > >
> >
> > I agree that it is just a matter of timing. I think new issues that
> > haven't happened on the current streaming logical replication
> > depending on the timing could happen with this feature and vice versa.
> >
>
> Here by vice versa, do you mean some problems that can happen with
> current code won't happen after new implementation? If so, can you
> give one such example?
>
> > >
> > > I think the other kind of problem that can happen here is delete
> > > followed by an insert. If in the example provided by you, TX-1
> > > performs delete (say it is large enough to cause streaming) and TX-2
> > > performs insert then I think it will block the apply worker because
> > > insert will start waiting infinitely. Currently, I think it will lead
> > > to conflict due to insert but that is still solvable by allowing users
> > > to remove conflicting rows.
> > >
> > > It seems both these problems are due to the reason that the table on
> > > publisher and subscriber has different constraints otherwise, we would
> > > have seen the same behavior on the publisher as well.
> > >
> > > There could be a few ways to avoid these and similar problems:
> > > a. detect the difference in constraints between publisher and
> > > subscribers like primary key and probably others (like whether there
> > > is any volatile function present in index expression) when applying
> > > the change and then we give ERROR to the user that she must change the
> > > streaming mode to 'spill' instead of 'apply' (aka parallel apply).
> > > b. Same as (a) but instead of ERROR just LOG this information and
> > > change the mode to spill for the transactions that operate on that
> > > particular relation.
> >
> > Given that it doesn't introduce a new kind of problem I don't think we
> > need special treatment for that at least in this feature.
> >
>
> Isn't the problem related to infinite wait by insert as explained in
> my previous email (in the above-quoted text) a new kind of problem
> that won't exist in the current implementation?
>

Sorry I had completely missed the point that the commit order won't be
changed. I agree that this new implementation would introduce a new
kind of issue as you mentioned above, and the opposite is not true.

Regarding the case you explained in the previous email I also think it
will happen with the parallel apply feature. The apply worker will be
blocked until the conflict is resolved. I'm not sure how to avoid
that. It would be not easy to compare constraints between publisher
and subscribers when replicating partitioning tables.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2022-05-11 03:50:11 Re: strange slow query - lost lot of time somewhere
Previous Message Bruce Momjian 2022-05-11 03:41:08 Re: First draft of the PG 15 release notes