Single transaction in the tablesync worker?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Single transaction in the tablesync worker?
Date: 2020-12-03 09:27:08
Message-ID: CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The tablesync worker in logical replication performs the table data
sync in a single transaction which means it will copy the initial data
and then catch up with apply worker in the same transaction. There is
a comment in LogicalRepSyncTableStart ("We want to do the table data
sync in a single transaction.") saying so but I can't find the
concrete theory behind the same. Is there any fundamental problem if
we commit the transaction after initial copy and slot creation in
LogicalRepSyncTableStart and then allow the apply of transactions as
it happens in apply worker? I have tried doing so in the attached (a
quick prototype to test) and didn't find any problems with regression
tests. I have tried a few manual tests as well to see if it works and
didn't find any problem. Now, it is quite possible that it is
mandatory to do the way we are doing currently, or maybe something
else is required to remove this requirement but I think we can do
better with respect to comments in this area.

The reason why I am looking into this area is to support the logical
decoding of prepared transactions. See the problem [1] reported by
Peter Smith. Basically, when we stream prepared transactions in the
tablesync worker, it will simply commit the same due to the
requirement of maintaining a single transaction for the entire
duration of copy and streaming of transactions. Now, we can fix that
problem by disabling the decoding of prepared xacts in tablesync
worker. But that will arise to a different kind of problems like the
prepare will not be sent by the publisher but a later commit might
move lsn to a later step which will allow it to catch up till the
apply worker. So, now the prepared transaction will be skipped by both
tablesync and apply worker.

I think apart from unblocking the development of 'logical decoding of
prepared xacts', it will make the code consistent between apply and
tablesync worker and reduce the chances of future bugs in this area.
Basically, it will reduce the checks related to am_tablesync_worker()
at various places in the code.

I see that this code is added as part of commit
7c4f52409a8c7d85ed169bbbc1f6092274d03920 (Logical replication support
for initial data copy).

Thoughts?

[1] - https://www.postgresql.org/message-id/CAHut+PuEMk4SO8oGzxc_ftzPkGA8uC-y5qi-KRqHSy_P0i30DA@mail.gmail.com

--
With Regards,
Amit Kapila.

Attachment Content-Type Size
v1-0001-Allow-more-than-one-transaction-in-tablesync-work.patch application/octet-stream 3.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-12-03 09:31:43 REINDEX backend filtering
Previous Message Peter Eisentraut 2020-12-03 09:24:03 Re: Improper use about DatumGetInt32