Re: Logical replication missing information

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: bobocc(at)yahoo(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication missing information
Date: 2022-11-23 02:44:49
Message-ID: CAHut+Pui7LCgfzkjcqvdcPy+V9G8CRb6X5hL+MB7QqXFBhQMzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

On Fri, Nov 18, 2022 at 4:50 AM PG Doc comments form
<noreply(at)postgresql(dot)org> wrote:
>
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/15/logical-replication-row-filter.html
> Description:

Hi,

FYI - I have forwarded this post to the hacker's list, where I think
it will receive more attention.

I am not sure why that (above) page was cited -- the section "31.3 Row
Filters" is specifically about row filtering, whereas the items you
reported seem unrelated to row filters, but are generic for all
Logical Replication.

>
> There are several things missing here and some of them I found to be highly
> important:
> 1. How can I find why a logical replication failed. Currently I only can see
> it "does nothing" in pg_stat_subscriptions.

There should be logs reporting any replication conflicts etc. See [1]
for example logs. See also the answer for #2 below.

> 2. In case of copying the existing data: how can I find which tables or
> partitions were processed and which are on the processing queue (while
> monitoring I have observed no specific order or rule).

There is no predictable processing queue or order - The initial
tablesyncs might be happening in multiple asynchronous processes
according to the GUC max_sync_workers_per_subscription [2].

Below I show examples of replicating two tables (tab1 and tab2).

~~

From the logs you should see which table syncs have completed OK:

e.g. (the initial copy is all good)
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION pub1;
NOTICE: created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2022-11-23 12:23:18.501 AEDT [27961] LOG: logical
replication apply worker for subscription "sub1" has started
2022-11-23 12:23:18.513 AEDT [27963] LOG: logical replication table
synchronization worker for subscription "sub1", table "tab1" has
started
2022-11-23 12:23:18.524 AEDT [27965] LOG: logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:23:18.593 AEDT [27963] LOG: logical replication table
synchronization worker for subscription "sub1", table "tab1" has
finished
2022-11-23 12:23:18.611 AEDT [27965] LOG: logical replication table
synchronization worker for subscription "sub1", table "tab2" has
finished

e.g. (where there is conflict in table tab2)
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub' PUBLICATION pub1;
NOTICE: created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
test_sub=# 2022-11-23 12:40:56.794 AEDT [23401] LOG: logical
replication apply worker for subscription "sub1" has started
2022-11-23 12:40:56.808 AEDT [23403] LOG: logical replication table
synchronization worker for subscription "sub1", table "tab1" has
started
2022-11-23 12:40:56.819 AEDT [23405] LOG: logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:40:56.890 AEDT [23405] ERROR: duplicate key value
violates unique constraint "tab2_pkey"
2022-11-23 12:40:56.890 AEDT [23405] DETAIL: Key (id)=(1) already exists.
2022-11-23 12:40:56.890 AEDT [23405] CONTEXT: COPY tab2, line 1
2022-11-23 12:40:56.891 AEDT [3233] LOG: background worker "logical
replication worker" (PID 23405) exited with exit code 1
2022-11-23 12:40:56.902 AEDT [23403] LOG: logical replication table
synchronization worker for subscription "sub1", table "tab1" has
finished
...

~~

Alternatively, you can use some SQL query to discover which tables of
the subscription had attained a READY state. The READY state (denoted
by 'r') means that the initial COPY was completed ok. The table
replication state is found in the 'srsubstate' column. See [3]

e.g. (the initial copy is all good)
test_sub=# select
sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from
pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s
where ut.relid = sr.srrelid and s.oid=sr.srsubid;
srsubid | srrelid | subname | relname | srsubstate
---------+---------+---------+---------+------------
16418 | 16409 | sub1 | tab1 | r
16418 | 16402 | sub1 | tab2 | r
(2 rows)

e.g. (where it has a conflict in table tab2, so it did not get to READY state)
test_sub=# select
sr.srsubid,sr.srrelid,s.subname,ut.relname,sr.srsubstate from
pg_statio_user_tables ut, pg_subscription_rel sr, pg_subscription s
where ut.relid = sr.srrelid and s.oid=sr.srsubid;2022-11-23
12:41:37.686 AEDT [24501] LOG: logical replication table
synchronization worker for subscription "sub1", table "tab2" has
started
2022-11-23 12:41:37.774 AEDT [24501] ERROR: duplicate key value
violates unique constraint "tab2_pkey"
2022-11-23 12:41:37.774 AEDT [24501] DETAIL: Key (id)=(1) already exists.
2022-11-23 12:41:37.774 AEDT [24501] CONTEXT: COPY tab2, line 1
2022-11-23 12:41:37.775 AEDT [3233] LOG: background worker "logical
replication worker" (PID 24501) exited with exit code 1

srsubid | srrelid | subname | relname | srsubstate
---------+---------+---------+---------+------------
16423 | 16409 | sub1 | tab1 | r
16423 | 16402 | sub1 | tab2 | d

> 3. In case of copying the existing data there is no option to update the row
> based on the Primary Key if it already exists at the destination. The COPY
> only fails.

Yes, the conflicts section [1] describes this -- "A conflict will
produce an error and will stop the replication; it must be resolved
manually by the user."

> 4. Is it possible to restart an interrupted logical replication. If yes,
> then how? Taking in account the already existing data!!!

You can use the SUBSCRIPTION copy_data=false parameter to avoid
re-copying initial data. But this applies to all tables of the
subscription so if you have a situation where there are some tables
copied and some not copied then you might have to either truncate the
tables and start again, or you might want to create additional
temporary subscriptions with appropriate copy_data=true/false
parameter. I guess the best course of action depends if you had
conflicts with 1 or 2 tables or 10000 tables.

>
> IMHO there are some big functionality features still missing, but they can
> be added.
>

I am not sure if there is missing functionality, but perhaps there is
some information that is harder to find than it ought to be, so I
would like to help first address that part.

------
[1] conflicts. https://www.postgresql.org/docs/current/logical-replication-conflicts.html
[2] max_sync_workers_per_subscription.
https://www.postgresql.org/docs/current/runtime-config-replication.html
[3] srsubstate.
https://www.postgresql.org/docs/current/catalog-pg-subscription-rel.html

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Kirk Wolak 2022-11-23 03:41:57 Re: nextval parameter is not clear
Previous Message Tom Lane 2022-11-22 21:35:52 Re: crosstab documentation should add a note about use in materialized views

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2022-11-23 02:52:22 Re: ssl tests aren't concurrency safe due to get_free_port()
Previous Message Andres Freund 2022-11-23 01:42:24 odd buildfarm failure - "pg_ctl: control file appears to be corrupt"