From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Subject: | RE: Exit walsender before confirming remote flush in logical replication |
Date: | 2022-12-23 12:54:15 |
Message-ID: | TYAPR01MB5866CCD2C21790FEBE944034F5E99@TYAPR01MB5866.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
Dear Horiguchi-san,
> Thus how about before entering an apply_delay, logrep worker sending a
> kind of crafted feedback, which reports commit_data.end_lsn as
> flushpos? A little tweak is needed in send_feedback() but seems to
> work..
Thanks for replying! I tested your suggestion, but it did not work well...
I made a PoC based on the latest time-delayed patches [1], for the non-streaming case.
An apply worker that is delaying application sends begin_data.final_lsn as both recvpos and flushpos in send_feedback().
The following is the feedback message I got; it shows that recv and flush were overwritten.
```
DEBUG: sending feedback (force 1) to recv 0/1553638, write 0/1553550, flush 0/1553638
CONTEXT: processing remote data for replication origin "pg_16390" during message type "BEGIN" in transaction 730, finished at 0/1553638
```
On the walsender side, however, sentPtr was still slightly larger than the flush position reported by the subscriber.
```
(gdb) p MyWalSnd->sentPtr
$2 = 22361760
(gdb) p MyWalSnd->flush
$3 = 22361656
(gdb) p *MyWalSnd
$4 = {pid = 28807, state = WALSNDSTATE_STREAMING, sentPtr = 22361760, needreload = false, write = 22361656,
flush = 22361656, apply = 22361424, writeLag = 20020343, flushLag = 20020343, applyLag = 20020343,
sync_standby_priority = 0, mutex = 0 '\000', latch = 0x7ff0350cbb94, replyTime = 725113263592095}
```
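To compare the gdb values with the feedback log, the decimal positions can be converted to the usual %X/%X notation. The following is only a minimal sketch for that arithmetic: lsn_t, format_lsn() and unflushed_bytes() are names made up here for illustration and are not part of PostgreSQL.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a 64-bit WAL position (hypothetical name, not PostgreSQL's type). */
typedef uint64_t lsn_t;

/*
 * Format a WAL position as "high/low" in hexadecimal, the notation used in
 * the feedback log above. buf must hold at least 18 bytes
 * ("FFFFFFFF/FFFFFFFF" plus the terminating NUL).
 */
static void
format_lsn(lsn_t lsn, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen >= 18);
    snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
             (uint32_t) (lsn >> 32), (uint32_t) lsn);
}

/* Number of bytes that were sent but not yet reported as flushed. */
static uint64_t
unflushed_bytes(lsn_t sent, lsn_t flush)
{
    return sent - flush;
}
```

With the values from the dump, format_lsn(22361656, ...) gives 0/1553638, which matches the flush position in the feedback message, and unflushed_bytes(22361760, 22361656) is 104, so sentPtr is 104 bytes ahead of the reported flush position.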
Therefore I could not shut down the publisher node while the apply worker was delaying application.
Do you have any opinions about this?
```
$ pg_ctl stop -D data_pub/
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
```
Best Regards,
Hayato Kuroda
FUJITSU LIMITED