RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: 'Shlok Kyal' <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, 'vignesh C' <vignesh21(at)gmail(dot)com>, 'Michael Paquier' <michael(at)paquier(dot)xyz>, 'Peter Eisentraut' <peter(at)eisentraut(dot)org>, 'Andres Freund' <andres(at)anarazel(dot)de>, 'Ashutosh Bapat' <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, 'Euler Taveira' <euler(at)eulerto(dot)com>
Subject: RE: speed up a logical replica setup
Date: 2024-01-22 07:06:50
Message-ID: TY3PR01MB9889C5D55206DDD978627D07F5752@TY3PR01MB9889.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear hackers,

>
> 15.
> I found that subscriptions cannot be started if tuples are inserted on publisher
> after creating temp_replslot. After starting a subscriber, I got below output on the
> log.
>
> ```
> ERROR: could not receive data from WAL stream: ERROR: publication
> "pg_subscriber_5" does not exist
> CONTEXT: slot "pg_subscriber_5_3632", output plugin "pgoutput", in the change
> callback, associated LSN 0/30008A8
> LOG: background worker "logical replication apply worker" (PID 3669) exited
> with exit code 1
> ```
>
> But this is strange. I confirmed that the specified publication surely exists.
> Do you know the reason?
>
> ```
> publisher=# SELECT pubname FROM pg_publication;
> pubname
> -----------------
> pg_subscriber_5
> (1 row)
> ```
>

I analyzed this and found the reason: the publication is invisible to some of the decoded transactions.

First, the following operations were executed in this case.
Tuples were inserted after getting consistent_lsn, but before starting the standby.
After running the workload, I confirmed again that the publication had been created.

1. On the primary, logical replication slots were created.
2. On the primary, another replication slot was created.
3. ===On the primary, some tuples were inserted.===
4. On the standby, a server process was started.
5. On the standby, the process waited until all changes had arrived.
6. On the primary, publications were created.
7. On the standby, subscriptions were created.
8. On the standby, the replication progress for each subscription was set to the given LSN (obtained at step 2).
=====pg_subscriber finished here=====
9. On the standby, a server process was started again.
10. On the standby, subscriptions were enabled. They referred to the slots created at step 1.
11. On the primary, decoding was started but an ERROR was raised.

In this case, the tuples were inserted *before the publication was created*.
So I think the decoded transactions could not see the publication, because
it was committed after the insertions.

One solution is to create the publication before creating the consistent slot.
Changes made before the slot is created are certainly replicated to the standby,
and every transaction decoded afterwards can see the publication. We are planning
to update the patch set to fix the issue with this approach.
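
To illustrate the two orderings, here is a toy Python model. It is only a sketch, not PostgreSQL code: real decoding uses historic catalog snapshots, which I approximate here by replaying events in WAL order. The names `decode_from_slot` and `PublicationMissing` and the integer LSNs are made up for the example.

```python
# Toy model only. Real PostgreSQL decides catalog visibility with historic
# MVCC snapshots; here it is approximated by replaying events in LSN order.
# All names and LSN values are hypothetical.

class PublicationMissing(Exception):
    """Raised when a decoded change cannot see the publication."""


def decode_from_slot(wal, slot_lsn, pubname):
    """Decode inserts at or after slot_lsn and return the rows that would be sent.

    wal is a list of (lsn, kind, payload) tuples. A "create_publication"
    event is visible only to events with a later LSN.
    """
    visible = set()
    sent = []
    for lsn, kind, payload in sorted(wal):
        if kind == "create_publication":
            visible.add(payload)
        elif kind == "insert" and lsn >= slot_lsn:
            # The change is decoded with the catalog as it was at this point.
            if pubname not in visible:
                raise PublicationMissing(f'publication "{pubname}" does not exist')
            sent.append(payload)
    return sent


PUB = "pg_subscriber_5"

# Current order: slot (LSN 10), insert (LSN 20), then publication (LSN 30).
current_order = [(20, "insert", "row1"), (30, "create_publication", PUB)]

# Proposed order: publication (LSN 5), slot (LSN 10), then insert (LSN 20).
proposed_order = [(5, "create_publication", PUB), (20, "insert", "row1")]

try:
    decode_from_slot(current_order, slot_lsn=10, pubname=PUB)
except PublicationMissing as err:
    print("current order fails:", err)

print("proposed order sends:", decode_from_slot(proposed_order, slot_lsn=10, pubname=PUB))
```

In the first ordering the insert is decoded at a point where the publication does not exist yet, which matches the ERROR above. In the second, the publication precedes everything the slot can decode.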

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
