RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'vignesh C' <vignesh21(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
Subject: RE: speed up a logical replica setup
Date: 2024-02-20 10:17:18
Message-ID: TYCPR01MB12077C74ACCF1F16777EE76F1F5502@TYCPR01MB12077.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Vignesh,

Thanks for giving comments!

> Few comments for v22-0001 patch:
> 1) The second "if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)"" should
> be if (strcmp(PQgetvalue(res, 0, 2), "t") != 0):
> + if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
> + {
> + pg_log_error("permission denied for database %s",
> dbinfo[0].dbname);
> + return false;
> + }
> + if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
> + {
> + pg_log_error("permission denied for function \"%s\"",
> +
> "pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
> + return false;
> + }

I have already pointed out as comment #8 [1] and fixed in v22-0005.

> 2) pg_createsubscriber fails if a table is parallely created in the
> primary node:
> 2024-02-20 14:38:49.005 IST [277261] LOG: database system is ready to
> accept connections
> 2024-02-20 14:38:54.346 IST [277270] ERROR: relation "public.tbl5"
> does not exist
> 2024-02-20 14:38:54.346 IST [277270] STATEMENT: CREATE SUBSCRIPTION
> pg_createsubscriber_5_277236 CONNECTION ' dbname=postgres' PUBLICATION
> pg_createsubscriber_5 WITH (create_slot = false, copy_data = false,
> enabled = false)
>
> If we are not planning to fix this, at least it should be documented

The error will be occurred when tables are created after the promotion, right?
I think it cannot be fixed until DDL logical replication would be implemented.
So, +1 to add descriptions.

> 3) Error conditions is verbose mode has invalid error message like
> "out of memory" messages like in below:
> pg_createsubscriber: waiting the postmaster to reach the consistent state
> pg_createsubscriber: postmaster reached the consistent state
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: creating subscription
> "pg_createsubscriber_5_278343" on database "postgres"
> pg_createsubscriber: error: could not create subscription
> "pg_createsubscriber_5_278343" on database "postgres": out of memory

Because some places use PQerrorMessage() wrongly. It should be
PQresultErrorMessage(). Fixed in v22-0005.

> 4) In error cases we try to drop this publication again resulting in error:
> + /*
> + * Since the publication was created before the
> consistent LSN, it is
> + * available on the subscriber when the physical
> replica is promoted.
> + * Remove publications from the subscriber because it
> has no use.
> + */
> + drop_publication(conn, &dbinfo[i]);
>
> Which throws these errors(because of drop publication multiple times):
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: error: could not drop publication
> "pg_createsubscriber_5" on database "postgres": ERROR: publication
> "pg_createsubscriber_5" does not exist
> pg_createsubscriber: dropping publication "pg_createsubscriber_5" on
> database "postgres"
> pg_createsubscriber: dropping the replication slot
> "pg_createsubscriber_5_278343" on database "postgres"

Right. One approach is to use DROP PUBLICATION IF EXISTS statement.
Thought?

> 5) In error cases, wait_for_end_recovery waits even though it has
> identified that the replication between primary and standby is
> stopped:
> +/*
> + * Is recovery still in progress?
> + * If the answer is yes, it returns 1, otherwise, returns 0. If an error occurs
> + * while executing the query, it returns -1.
> + */
> +static int
> +server_is_in_recovery(PGconn *conn)
> +{
> + PGresult *res;
> + int ret;
> +
> + res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");
> +
> + if (PQresultStatus(res) != PGRES_TUPLES_OK)
> + {
> + PQclear(res);
> + pg_log_error("could not obtain recovery progress");
> + return -1;
> + }
> +
>
> You can simulate this by stopping the primary just before
> wait_for_end_recovery and you will see these error messages, but
> pg_createsubscriber will continue to wait:
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress
> pg_createsubscriber: error: could not obtain recovery progress

Yeah, v22-0001 cannot detect the disconnection from primary and standby.
V22-0007 can detect the standby crash, but v22 set could not detect the
primary crash. Euler came up with an approach [2] for it but not implemented yet.

[1]: https://www.postgresql.org/message-id/TYCPR01MB12077756323B79042F29DDAEDF54C2%40TYCPR01MB12077.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/2231a04b-f2d4-4a4e-b5cd-56be8b002427%40app.fastmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-02-20 10:18:59 Re: Add bump memory context type and use it for tuplesorts
Previous Message Quan Zongliang 2024-02-20 10:11:55 Re: Change the bool member of the Query structure to bits