Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, shiy(dot)fnst(at)fujitsu(dot)com
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2023-02-01 12:07:25
Message-ID: CAGPVpCT6wG=Eqd2Jc20yO9xNp4S3_XvDPinz2JbaLTtb7M5jHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

wangw(dot)fnst(at)fujitsu(dot)com <wangw(dot)fnst(at)fujitsu(dot)com>, 31 Oca 2023 Sal, 13:40
tarihinde şunu yazdı:

> Sorry, I forgot to add the link to the email. Please refer to [1].
>
> [1] -
> https://www.postgresql.org/message-id/OSZPR01MB631013C833C98E826B3CFCB9FDC69%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Thanks for pointing out this review. I somehow skipped that, sorry.

Please see attached patches.

shiy(dot)fnst(at)fujitsu(dot)com <shiy(dot)fnst(at)fujitsu(dot)com>, 17 Oca 2023 Sal, 10:46
tarihinde şunu yazdı:

> On Wed, Jan 11, 2023 4:31 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
> 0001 patch
> ============
> 1. walsender.c
> + /* Create a tuple to send consisten WAL location */
>
> "consisten" should be "consistent" I think.
>

Done.

> 2. logical.c
> + if (need_full_snapshot)
> + {
> + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> +
> + SpinLockAcquire(&slot->mutex);
> + slot->effective_catalog_xmin = xmin_horizon;
> + slot->data.catalog_xmin = xmin_horizon;
> + slot->effective_xmin = xmin_horizon;
> + SpinLockRelease(&slot->mutex);
> +
> + xmin_horizon =
> GetOldestSafeDecodingTransactionId(!need_full_snapshot);
> + ReplicationSlotsComputeRequiredXmin(true);
> +
> + LWLockRelease(ProcArrayLock);
> + }
>
> It seems that we should first get the safe decoding xid, then inform the
> slot
> machinery about the new limit, right? Otherwise the limit will be
> InvalidTransactionId and that seems inconsistent with the comment.
>

You're right. Moved that call before assigning xmin_horizon.

> 3. doc/src/sgml/protocol.sgml
> + is used in the currenct transaction. This command is currently
> only supported
> + for logical replication.
> + slots.
>
> We don't need to put "slots" in a new line.
>

Done.

> 0002 patch
> ============
> 1.
> In pg_subscription_rel.h, I think the type of "srrelslotname" can be
> changed to
> NameData, see "subslotname" in pg_subscription.h.
>
> 2.
> + * Find the logical replication sync
> worker if exists store
> + * the slot number for dropping associated
> replication slots
> + * later.
>
> Should we add comma after "if exists"?
>

Done.

3.
> + PG_FINALLY();
> + {
> + pfree(cmd.data);
> + }
> + PG_END_TRY();
> + \
> + return tablelist;
> +}
>
> Do we need the backslash?
>

Removed it.

> 4.
> + /*
> + * Advance to the LSN got from walrcv_create_slot. This is WAL
> + * logged for the purpose of recovery. Locks are to prevent the
> + * replication origin from vanishing while advancing.
>
> "walrcv_create_slot" should be changed to
> "walrcv_create_slot/walrcv_slot_snapshot" I think.

Right, done.

>
>
5.
> + /* Replication drop might still exist. Try to drop
> */
> + replorigin_drop_by_name(originname, true, false);
>
> Should "Replication drop" be "Replication origin"?
>

Done.

> 6.
> I saw an assertion failure in the following case, could you please look
> into it?
> The backtrace is attached.
>
> -- pub
> CREATE TABLE tbl1 (a int, b text);
> CREATE TABLE tbl2 (a int primary key, b text);
> create publication pub for table tbl1, tbl2;
> insert into tbl1 values (1, 'a');
> insert into tbl1 values (1, 'a');
>
> -- sub
> CREATE TABLE tbl1 (a int primary key, b text);
> CREATE TABLE tbl2 (a int primary key, b text);
> create subscription sub connection 'dbname=postgres port=5432' publication
> pub;
>
> Subscriber log:
> 2023-01-17 14:47:10.054 CST [1980841] LOG: logical replication apply
> worker for subscription "sub" has started
> 2023-01-17 14:47:10.060 CST [1980843] LOG: logical replication table
> synchronization worker for subscription "sub", table "tbl1" has started
> 2023-01-17 14:47:10.070 CST [1980845] LOG: logical replication table
> synchronization worker for subscription "sub", table "tbl2" has started
> 2023-01-17 14:47:10.073 CST [1980843] ERROR: duplicate key value violates
> unique constraint "tbl1_pkey"
> 2023-01-17 14:47:10.073 CST [1980843] DETAIL: Key (a)=(1) already exists.
> 2023-01-17 14:47:10.073 CST [1980843] CONTEXT: COPY tbl1, line 2
> 2023-01-17 14:47:10.074 CST [1980821] LOG: background worker "logical
> replication worker" (PID 1980843) exited with exit code 1
> 2023-01-17 14:47:10.083 CST [1980845] LOG: logical replication table
> synchronization worker for subscription "sub", table "tbl2" has finished
> 2023-01-17 14:47:10.083 CST [1980845] LOG: logical replication table
> synchronization worker for subscription "sub" has moved to sync table
> "tbl1".
> TRAP: failed Assert("node != InvalidRepOriginId"), File: "origin.c", Line:
> 892, PID: 1980845
>

I'm not able to reproduce this yet. Will look into it further.

Thanks,
--
Melih Mutlu
Microsoft

Attachment Content-Type Size
v10-0002-Reuse-Logical-Replication-Background-worker.patch application/octet-stream 71.4 KB
v7-0001-Add-replication-protocol-cmd-to-create-a-snapshot.patch application/octet-stream 20.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melih Mutlu 2023-02-01 12:12:19 Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Previous Message Antonin Houska 2023-02-01 12:02:07 Re: Privileges on PUBLICATION