RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Euler Taveira' <euler(at)eulerto(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
Subject: RE: speed up a logical replica setup
Date: 2024-02-07 09:17:28
Message-ID: TYCPR01MB120777864BFAC2294819FC5B4F5452@TYCPR01MB12077.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Euler,

I have not finished reviewing, but I can reply to you first.

>
I didn't complete this task yet so I didn't include it in this patch.
>

Just to confirm - No need to forcibly include my refactoring patch: you can
modify based on your idea.

>
That's correct. We made a decision to use non physical replication connections
(besides the one used to connect primary <-> standby). Hence, my concern about
a HBA rule falls apart. I'm not convinced that using a modified
primary_conninfo is the only/right answer to fill the subscription connection
string. Physical vs logical replication has different requirements when we talk
about users. The physical replication requires only the REPLICATION privilege.
On the other hand, to create a subscription you must have the privileges of
pg_create_subscription role and also CREATE privilege on the specified
database. Unless, you are always recommending the superuser for this tool, I'm
afraid there will be cases that reusing primary_conninfo will be an issue. The
more I think about this UI, the more I think that, if it is not hundreds of
lines of code, it uses the primary_conninfo the -P is not specified.
>

Valid point. It is one of the best practice that users set minimal privileges
for each accounts. However...

>
Ugh. An error will occur the first time (get_primary_sysid) it tries to connect
to primary.
>

I'm not sure it is correct. I think there are several patterns.

a) -P option specified a ghost server, i.e., pg_createsubscriber cannot connect to
anything. In this case, get_primary_sysid() would be failed because
connect_database() would be failed.

b) -P option specified an existing server, but it is not my upstream.
get_primary_sysid() would be suceeded.
In this case, there are two furher branches:

b-1) nodes have different system_identifier. pg_createsubscriber would raise
an ERROR and stop.
b-2) nodes have the same system_identifier (e.g., nodes were generated by the
same master). In this case, current checkings would be passed but
pg_createsubscriber would stuck or reach timeout. PSA the reproducer.

Can pg_createsubscriber check the relation two nodes have been connected by
replication protocol? Or, can we assume that all the node surely have different
system_identifier?

>
It is. However, I keep the assignments for 2 reasons: (a) there might be
parameters whose default value is not zero, (b) the standard does not say that
a null pointer must be represented by zero and (c) there is no harm in being
paranoid during initial assignment.
>

Hmn, so, no need to use `= {0};`, but up to you. Also, according to C99 standard[1],
NULL seemed to be defined as 0x0:
```
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.
If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
```

>
> WriteRecoveryConfig() writes GUC parameters to postgresql.auto.conf, but not
> sure it is good. These settings would remain on new subscriber even after the
> pg_createsubscriber. Can we avoid it? I come up with passing these parameters
> via pg_ctl -o option, but it requires parsing output from GenerateRecoveryConfig()
> (all GUCs must be allign like "-c XXX -c XXX -c XXX...").

I applied a modified version of v16-0006.
>

Sorry for confusing, it is not a solution. This approach writes parameter
settings to postgresql.auto.conf, and they are never removed. Overwritten
settings would remain.

>
I don't understand why v16-0002 is required. In a previous version, this patch
was using connections in logical replication mode. After some discussion we
decided to change it to regular connections and use SQL functions (instead of
replication commands). Is it a requirement for v16-0003?
>

No, it is not required by 0003. I forgot the decision that we force DBAs to open
normal connection establishments for pg_createsubscriber. Please ignore...

[1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/

Attachment Content-Type Size
test_0207.sh application/octet-stream 2.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-02-07 09:27:28 Re: Printing backtrace of postgres processes
Previous Message Mats Kindahl 2024-02-07 09:09:58 Re: glibc qsort() vulnerability