Re: Single transaction in the tablesync worker?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Single transaction in the tablesync worker?
Date: 2020-12-04 04:59:42
Message-ID: CAA4eK1Ld9XaLoTZCoKF_gET7kc1fDf8CPR3CM48MQb1N1jDLYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 4, 2020 at 7:53 AM Craig Ringer
<craig(dot)ringer(at)enterprisedb(dot)com> wrote:
>
> On Thu, 3 Dec 2020 at 17:25, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > Is there any fundamental problem if
> > we commit the transaction after initial copy and slot creation in
> > LogicalRepSyncTableStart and then allow the apply of transactions as
> > it happens in apply worker?
>
> No fundamental problem. Both approaches are fine. Committing the
> initial copy then doing the rest in individual txns means an
> incomplete sync state for the table becomes visible, which may not be
> ideal. Ideally we'd do something like sync the data into a clone of
> the table then swap the table relfilenodes out once we're synced up.
>
> IMO the main advantage of committing as we go is that it would let us
> use a non-temporary slot and support recovering an incomplete sync and
> finishing it after interruption by connection loss, crash, etc. That
> would be advantageous for big table syncs or where the sync has lots
> of lag to replay. But it means we have to remember sync states, and
> give users a way to cancel/abort them. Otherwise forgotten temp slots
> for syncs will cause a mess on the upstream.
>
> It also allows the sync slot to advance, freeing any held upstream
> resources before the whole sync is done, which is good if the upstream
> is busy and generating lots of WAL.
>
> Finally, committing as we go means we won't exceed the cid increment
> limit in a single txn.
>

Yeah, all these are advantages of processing
transaction-by-transaction. IIUC, we need to primarily do two things
to achieve it, one is to have an additional state in the catalog (say
catch up) which will say that the initial copy is done. Then we need
to have a permanent slot using which we can track the progress of the
slot so that after restart (due to crash, connection break, etc.) we
can start from the appropriate position.

Apart from the above, I think with the current design of tablesync we
can see partial data of transactions because we allow all the
tablesync workers to run parallelly. Consider the below scenario:

CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));
CREATE TABLE mytbl2(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

Tx1
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl2(somedata, text) VALUES (1, 1);
COMMIT;

CREATE PUBLICATION mypublication FOR TABLE mytbl;

CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypublication;

Tx2
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
INSERT INTO mytbl2(somedata, text) VALUES (1, 2);
Commit;

Tx3
BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 3);
INSERT INTO mytbl2(somedata, text) VALUES (1, 3);
Commit;

Now, I could see the below results on subscriber:

postgres=# select * from mytbl1;
id | somedata | text
----+----------+------
(0 rows)

postgres=# select * from mytbl2;
id | somedata | text
----+----------+------
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
(3 rows)

Basically, the results for Tx1, Tx2, Tx3 are visible for mytbl2 but
not for mytbl1. To reproduce this I have stopped the tablesync workers
(via debugger) for mytbl1 and mytbl2 in LogicalRepSyncTableStart
before it changes the relstate to SUBREL_STATE_SYNCWAIT. Then allowed
Tx2 and Tx3 to be processed by apply worker and then allowed tablesync
worker for mytbl2 to proceed. After that, I can see the above state.

Now, won't this behavior be considered as transaction inconsistency
where partial transaction data or later transaction data is visible? I
don't think we can have such a situation on the master (publisher)
node or in physical standby.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-12-04 05:04:27 Re: [Patch] Optimize dropping of relation buffers using dlist
Previous Message Shinya11.Kato 2020-12-04 04:20:47 RE: Wrong statistics for size of XLOG_SWITCH during pg_waldump.