RE: [PoC] pg_upgrade: allow to upgrade publisher node

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node
Date: 2023-08-07 10:53:19
Message-ID: TYAPR01MB5866455A1654D621571DBD24F50CA@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Amit, Julien,

> > > >
> > > > Unless I'm missing something I don't see what prevents something to
> connect
> > > > using the replication protocol and issue any query or even create new
> > > > replication slots?
> > > >
> > >
> > > I think the point is that if we have any slots where we have not
> > > consumed the pending WAL (other than the expected like
> > > SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
> > > won't proceed and we will request user to remove such slots or ensure
> > > that WAL is consumed by slots. So, I think in the case you mentioned,
> > > the upgrade won't succeed.
> >
> > What if new slots are added while the old instance is started in the middle of
> > pg_upgrade, *after* the various checks are done?
> >
>
> They won't be copied but I think that won't be any different than
> other objects like tables. Anyway, I have another idea which is to not
> allow creating slots during binary upgrade unless one specifically
> requests it by having an API like binary_upgrade_allow_slot_create()
> similar to existing APIs binary_upgrade_*.

I confirmed the part and confirmed that objects created after the dump
were not copied to new node. PSA scripts to emulate my test.

# tested steps

-1. applied v18 patch set
0. modified source to create objects during upgrade and install:

```
@@ -188,6 +188,9 @@ check_and_dump_old_cluster(bool live_check)
if (!user_opts.check)
generate_old_dump();

+ printf("XXX: start to sleep\n");
+ sleep(35);
+
```

1. prepared a node which had a replication slot
2. did pg_upgrade, the process will sleep 35 seconds during that
3. connected to the in-upgrading node by the command:

```
psql "host=`pwd` user=postgres port=50432 replication=database"
```

4. created a table and replication slot. Note that for binary upgrade, it was very
hard to create tables manually. For me, table "bar" and slot "test" were created.
5. waited until the upgrade and boot new node.
6. confirmed that created tables and slots were not found on new node.

```
new_publisher=# \d
Did not find any relations.

new_publisher=# SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'test';
slot_name
-----------
(0 rows)
```

You can execute test_01.sh first, and then execute test_02.sh while the first terminal is stuck.

Note that such creations are theoretically occurred, but it is very rare.
By followings line in start_postmaster(), the TCP/IP connections are refused and
only the superuser can connect to the server.

```
#if !defined(WIN32)
/* prevent TCP/IP connections, restrict socket access */
strcat(socket_string,
" -c listen_addresses='' -c unix_socket_permissions=0700");

/* Have a sockdir? Tell the postmaster. */
if (cluster->sockdir)
snprintf(socket_string + strlen(socket_string),
sizeof(socket_string) - strlen(socket_string),
" -c %s='%s'",
(GET_MAJOR_VERSION(cluster->major_version) <= 902) ?
"unix_socket_directory" : "unix_socket_directories",
cluster->sockdir);
#endif
```

Moreover, the socket directory is set to current dir of caller, and port number
is also different from setting written in postgresql.conf.
I think there are few chances that replication slots are accidentally created
during the replication slot.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
test_01.sh application/octet-stream 1.1 KB
test_02.sh application/octet-stream 744 bytes

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Christoph Berg 2023-08-07 10:57:40 Re: A failure in 031_recovery_conflict.pl on Debian/s390x
Previous Message Tomas Vondra 2023-08-07 10:51:24 Re: Configurable FP_LOCK_SLOTS_PER_BACKEND