Re: Single transaction in the tablesync worker?

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Single transaction in the tablesync worker?
Date: 2021-02-02 05:03:51
Message-ID: CAFPTHDaZw5o+wMbv3aveOzuLyz_rqZebXAj59rDKTJbwXFPYgw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I have updated the patch to display WARNING for each of the tablesync
> slots during DropSubscription. As discussed, I have moved the drop
> slot related code towards the end in AlterSubscription_refresh. Apart
> from this, I have fixed one more issue in tablesync code where in
> after catching the exception we were not clearing the transaction
> state on the publisher, see changes in LogicalRepSyncTableStart. I
> have also fixed other comments raised by you. Additionally, I have
> removed the test because it was creating the same name slot as the
> tablesync worker and tablesync worker removed the same due to new
> logic in LogicalRepSyncStart. Earlier, it was not failing because of
> the bug in that code which I have fixed in the attached.
>

I was testing this patch. I had a table on the subscriber containing a
row that causes a PK constraint violation during the table copy. As a
result, the subscriber tries to roll back the table copy on the
publisher, and that rollback fails:

2021-02-01 23:28:16.041 EST [23738] LOG: logical replication apply worker for subscription "tap_sub" has started
2021-02-01 23:28:16.051 EST [23740] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:21.118 EST [23740] ERROR: table copy could not rollback transaction on publisher
2021-02-01 23:28:21.118 EST [23740] DETAIL: The error was: another command is already in progress
2021-02-01 23:28:21.122 EST [8028] LOG: background worker "logical replication worker" (PID 23740) exited with exit code 1
2021-02-01 23:28:21.125 EST [23908] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:21.138 EST [23908] ERROR: could not create replication slot "pg_16398_sync_16384": ERROR: replication slot "pg_16398_sync_16384" already exists
2021-02-01 23:28:21.139 EST [8028] LOG: background worker "logical replication worker" (PID 23908) exited with exit code 1
2021-02-01 23:28:26.168 EST [24048] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:34.244 EST [24048] ERROR: table copy could not rollback transaction on publisher
2021-02-01 23:28:34.244 EST [24048] DETAIL: The error was: another command is already in progress
2021-02-01 23:28:34.251 EST [8028] LOG: background worker "logical replication worker" (PID 24048) exited with exit code 1
2021-02-01 23:28:34.254 EST [24337] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:34.263 EST [24337] ERROR: could not create replication slot "pg_16398_sync_16384": ERROR: replication slot "pg_16398_sync_16384" already exists
2021-02-01 23:28:34.264 EST [8028] LOG: background worker "logical replication worker" (PID 24337) exited with exit code 1

One more thing I see is that we now error out in PG_CATCH() in
LogicalRepSyncTableStart() with the above error, and as a result the
tablesync slot is not dropped. This causes the slot creation to fail
on the next restart.
I think this can be avoided. We could either attempt a rollback only
on specific failures or drop the slot prior to erroring out.
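
To make the ordering problem concrete, here is a small toy model in
Python. It is only a sketch and not PostgreSQL code: Publisher,
Connection and run_worker are hypothetical names invented for the
illustration, and it assumes that the slot drop can succeed even while
the copy command is still pending on the connection, which may not
hold for the real walreceiver connection. It covers only the first two
failures from the log above.

```python
class Publisher:
    """Toy publisher: only remembers which slots exist (hypothetical)."""

    def __init__(self):
        self.slots = set()


class Connection:
    """Toy per-worker connection to the publisher (hypothetical)."""

    def __init__(self, publisher):
        self.publisher = publisher
        self.command_in_progress = False

    def create_slot(self, name):
        if name in self.publisher.slots:
            raise RuntimeError(f'replication slot "{name}" already exists')
        self.publisher.slots.add(name)

    def drop_slot(self, name):
        # Assumption: dropping works regardless of the pending COPY.
        self.publisher.slots.discard(name)

    def start_copy(self):
        self.command_in_progress = True

    def rollback(self):
        # Mirrors the reported failure: the COPY has not finished yet.
        if self.command_in_progress:
            raise RuntimeError("another command is already in progress")


def run_worker(publisher, slot_name, drop_slot_first):
    """Run one tablesync attempt and return the error text it ends with."""
    conn = Connection(publisher)
    try:
        conn.create_slot(slot_name)
        try:
            conn.start_copy()
            # Stand-in for the PK violation raised on the subscriber.
            raise RuntimeError("duplicate key value violates unique constraint")
        except RuntimeError:
            if drop_slot_first:
                conn.drop_slot(slot_name)  # proposed: clean up before rollback
            conn.rollback()                # raises, so the next line is skipped
            conn.drop_slot(slot_name)
            raise
    except RuntimeError as err:
        return str(err)


if __name__ == "__main__":
    for drop_first in (False, True):
        pub = Publisher()
        for attempt in (1, 2):
            err = run_worker(pub, "pg_16398_sync_16384", drop_first)
            print(f"drop_slot_first={drop_first} attempt={attempt}: {err}")
```

With drop_slot_first=False the second attempt fails on slot creation,
as in the log. With drop_slot_first=True the slot never leaks, although
the worker still fails on the rollback until the conflicting row is
removed.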

regards,
Ajin Cherian
Fujitsu Australia
