RE: Exit walsender before confirming remote flush in logical replication

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>
Cc: "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Exit walsender before confirming remote flush in logical replication
Date: 2023-01-16 11:08:30
Message-ID: TYAPR01MB5866027B7A1FA4D07E6DC813F5C19@TYAPR01MB5866.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Horiguchi-san, Amit,

> At Fri, 13 Jan 2023 16:41:08 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote in
> > Okay, but what happens in the case of physical replication when
> > synchronous_commit = remote_apply? In that case, won't it ensure that
> > apply has also happened? If so, then shouldn't the time delay feature
> > also cause a similar problem for physical replication as well?
>
> As written in another mail, WalSndDone doesn't honor
> synchronous_commit. In other words, AFAIS walsender finishes without
> waiting for remote_apply. The unapplied records will be applied at the
> next startup.
>
> I haven't confirmed that behavior myself, though..

If Amit was referring to the case where data is still waiting to be sent in
physical replication, then the walsender indeed cannot stop. But this is not
related to synchronous_commit: it happens because the walsender must send all
pending data before shutting down. We can reproduce the situation as follows:

1. Build a streaming replication system
2. kill -STOP $walreceiver
3. Insert data into the primary server
4. Try to stop the primary server
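
Steps 2 to 4 above can be sketched as a small script. This is only an illustration under assumptions, not the attached test script: the variables PRIMARY_DATA, PRIMARY_PORT and STANDBY_PORT, the helper names run and repro_pending_data, and the DRY_RUN switch are all hypothetical, and a running primary/standby pair with a table tbl is assumed. By default it only prints the commands.

```shell
# Hypothetical settings (not from the original mail); adjust to the local setup.
PRIMARY_DATA="${PRIMARY_DATA:-data_primary}"
PRIMARY_PORT="${PRIMARY_PORT:-5432}"
STANDBY_PORT="${STANDBY_PORT:-5433}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_pending_data() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        wr_pid="WALRECEIVER_PID"   # placeholder shown in dry-run output
    else
        # pg_stat_wal_receiver on the standby exposes the walreceiver's PID.
        wr_pid=$(psql -U postgres -p "$STANDBY_PORT" -At \
            -c "SELECT pid FROM pg_stat_wal_receiver")
    fi

    # Step 2: freeze the walreceiver so it stops receiving and confirming WAL.
    run kill -STOP "$wr_pid"
    # Step 3: generate WAL on the primary (assumes no synchronous standby is
    # configured, otherwise this INSERT itself would wait).
    run psql -U postgres -p "$PRIMARY_PORT" -c "INSERT INTO tbl SELECT generate_series(1, 10)"
    # Step 4: try to stop the primary; -t 10 only bounds how long pg_ctl waits.
    # Per the description above, the stop is expected not to complete here.
    run pg_ctl stop -D "$PRIMARY_DATA" -m fast -t 10
    # Clean up: let the walreceiver continue.
    run kill -CONT "$wr_pid"
}
```

Calling repro_pending_data with DRY_RUN=0 would execute the commands for real; without it the four commands are just listed.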

If what you meant is not related to pending data, the walsender can be stopped
even if synchronous_commit = remote_apply. As Horiguchi-san said, no such
condition is written in WalSndDone() [1]. I think the synchronous_commit
parameter does not really affect the walsender process. It just defines when
the backend returns the result to the client.

I confirmed this with the following steps:

1. Built a streaming replication system. PSA the script for that.

Primary config.

```
synchronous_commit = 'remote_apply'
synchronous_standby_names = 'secondary'
```

Secondary config.

```
recovery_min_apply_delay = 1d
primary_conninfo = 'user=postgres port=$port_N1 application_name=secondary'
hot_standby = on
```

2. Inserted data into the primary. This waited for the remote apply.

psql -U postgres -p $port_primary -c "INSERT INTO tbl SELECT generate_series(1, 5000)"

3. Stopped the primary server from another terminal. This succeeded.
The terminal from step 2 printed:

```
WARNING: canceling the wait for synchronous replication and terminating connection due to administrator command
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
```
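
While the INSERT from step 2 is still waiting (that is, before step 3), the replication progress can be inspected on the primary from another terminal. The following is a minimal sketch: the helper name show_replication_progress and the variable REPL_PROGRESS_SQL are hypothetical, and port_primary is assumed to be set as in the psql call above. With recovery_min_apply_delay in effect, replay_lsn should be expected to stay behind flush_lsn.

```shell
# Hypothetical helper, not part of the attached script.
# pg_stat_replication reports, per walsender, how far the standby has
# received (sent_lsn), flushed (flush_lsn) and applied (replay_lsn) WAL.
REPL_PROGRESS_SQL="SELECT application_name, sync_state, sent_lsn, flush_lsn, replay_lsn FROM pg_stat_replication"

show_replication_progress() {
    # -x prints one column per line, which is easier to read for LSN columns.
    psql -U postgres -p "${port_primary:-5432}" -x -c "$REPL_PROGRESS_SQL"
}
```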

[1]: https://github.com/postgres/postgres/blob/master/src/backend/replication/walsender.c#L3121

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment: test_phy.sh (application/octet-stream, 1.2 KB)
