Re: Stopping logical replication protocol

From: Vladimir Gordiychuk <folyga(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Stopping logical replication protocol
Date: 2016-09-25 18:16:01
Message-ID: CAFgjRd1z1s1vJAXUz8stx8uDKxcCEgnWN50AZOA=0=BDzfgFeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>It looks like you didn't import my updated patches, so I've rebased your
new patches on top of them.
Yes, i forgot it, sorry. Thanks for you fixes.

>I did't see you explain why this was removed:

- /* fast path */
- /* Try to flush pending output to the client */
- if (pq_flush_if_writable() != 0)
- WalSndShutdown();
-
- if (!pq_is_send_pending())
- return;

I remove it, because during decode long transaction code, we always exist
from function by fast path and not receive messages from client. Now we
always include in 'for' loop and executes
/* Check for input from the client */ ProcessRepliesIfAny();

>Some of the changes to pg_recvlogical look to be copied from
>receivelog.c, most notably the functions ProcessKeepalive and
>ProcessXLogData .

During refactoring huge function I also notice It, but decide not include
additional changes in already huge patch. I only split method that do
everything to few small functions.

>I was evaluating the tests and I don't think they actually demonstrate
>that the patch works. There's nothing done to check that
>pg_recvlogical exited because of client-initiated CopyDone. While
>looking into that I found that it also never actually hits
>ProcessCopyDone(...) because libpq handles the CopyDone reply from the
>server its self and treats it as end-of-stream. So the loop in
>StreamLogicalLog will just end and ProcessCopyDone() is dead code.

The main idea was about fast stop logical replication as it available,
because if stop replication gets more them 1 seconds it takes more pain.
That's why my tests not contained check pg_reclogincal output, only time
spend on stop replication(CopyDone exchange).
Increase this timeout to 3 or 5 second hide problems that this PR try fix,
that's why I think this lines should be restore to state from previous
patch.

```
ok($spendTime <= 5, # allow extra time for slow machines
"pg_recvlogical exited promptly on signal when decoding");

ok((time() - $cancelTime) <= 3, # allow extra time for slow machines
"pg_recvlogical exited promptly on sigint when idle"
);

```

Also I do not understand why we do

$proc->pump while $proc->pumpable;

after send signal to process, why we can not just wait stop replication
slot?

2016-09-23 8:01 GMT+03:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 19 September 2016 at 07:12, Vladimir Gordiychuk <folyga(at)gmail(dot)com>
> wrote:
> > New patch in attach. 0001 and 0002 without changes.
> > 0003 - patch contain improvements for pg_recvloginca, now pg_recvloginca
> > after receive SIGINT will send CopyDone package to postgresql and wait
> from
> > server CopyDone. For backward compatible after repeat send SIGINT
> > pg_recvloginca will continue immediately without wait CopyDone from
> server.
> > 0004 patch contain regression tests on scenarios that fix 0001 and 0002
> > patches.
>
> Great.
>
> Thanks for this.
>
>
> First observation (which I'll fix in updated patch):
>
>
>
> It looks like you didn't import my updated patches, so I've rebased
> your new patches on top of them.
>
> Whitespace issues:
>
> $ git am ~/Downloads/0003-Add-ability-for-pg_recvlogical-safe-stop-
> replication.patch
> Applying: Add ability for pg_recvlogical safe stop replication
> /home/craig/projects/2Q/postgres/.git/rebase-apply/patch:140: indent
> with spaces.
> msgBuf + hdr_len + bytes_written,
> /home/craig/projects/2Q/postgres/.git/rebase-apply/patch:550: indent
> with spaces.
> /* Backward compatible, allow force interrupt logical replication
> /home/craig/projects/2Q/postgres/.git/rebase-apply/patch:551: indent
> with spaces.
> * after second SIGINT without wait CopyDone from server
> /home/craig/projects/2Q/postgres/.git/rebase-apply/patch:552: indent
> with spaces.
> */
> warning: 4 lines add whitespace errors.
>
>
> Remember to use "git log --check" before sending patches.
>
> Also, comment style, please do
>
> /* this */
>
> or
>
> /*
> * this
> */
>
> not
>
> /* this
> */
>
>
>
>
> I did't see you explain why this was removed:
>
> - /* fast path */
> - /* Try to flush pending output to the client */
> - if (pq_flush_if_writable() != 0)
> - WalSndShutdown();
> -
> - if (!pq_is_send_pending())
> - return;
>
>
>
> I fixed a warning introduced here:
>
>
> pg_recvlogical.c: In function ‘ProcessXLogData’:
> pg_recvlogical.c:289:2: warning: ISO C90 forbids mixed declarations
> and code [-Wdeclaration-after-statement]
> int bytes_left = msgLength - hdr_len;
> ^
>
>
> Some of the changes to pg_recvlogical look to be copied from
> receivelog.c, most notably the functions ProcessKeepalive and
> ProcessXLogData . I thought that rather than copying them, shouldn't
> the existing ones be modified to meet your needs? But it looks like
> the issue is that a fair chunk of the rest of pg_recvlogical doesn't
> re-use code from receivelog.c either; for example, pg_recvlogical's
> sendFeedback differs from receivelog.c's sendFeedback. So
> pg_recvlogical doesn't share the globals that receivelog.c assumes are
> used. Also, what I thought were copied from receivelog.c were actually
> extracted from code previously inline in StreamLogicalLog(...) in
> pg_recvlogical.c .
>
> I'm reluctant to try to untangle this in the same patch for risk that
> it'll get stalled by issues with refactoring. The change you've
> already made is a useful cleanup without messing with unnecessary
> code.
>
> Your patch 3 does something ... odd:
>
> src/test/recovery/t/001_stream_rep.pl | 63
> ----------------------
> src/test/recovery/t/002_archiving.pl | 53
> -------------------
> src/test/recovery/t/003_recovery_targets.pl | 146
> ---------------------------------------------------
> src/test/recovery/t/004_timeline_switch.pl | 75
> --------------------------
> src/test/recovery/t/005_replay_delay.pl | 69
> ------------------------
> src/test/recovery/t/006_logical_decoding.pl | 40 --------------
> src/test/recovery/t/007_sync_rep.pl | 174
> ------------------------------------------------------------
> src/test/recovery/t/previous/001_stream_rep.pl | 63
> ++++++++++++++++++++++
> src/test/recovery/t/previous/002_archiving.pl | 53
> +++++++++++++++++++
> src/test/recovery/t/previous/003_recovery_targets.pl | 146
> +++++++++++++++++++++++++++++++++++++++++++++++++++
> src/test/recovery/t/previous/004_timeline_switch.pl | 75
> ++++++++++++++++++++++++++
> src/test/recovery/t/previous/005_replay_delay.pl | 69
> ++++++++++++++++++++++++
> src/test/recovery/t/previous/006_logical_decoding.pl | 40 ++++++++++++++
> src/test/recovery/t/previous/007_sync_rep.pl | 174
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> so I've revised it to remove that whole change, which I think you
> probably did not mean to commit.
>
> Reworded commit message.
>
> Ensured that we send feedback just before a client-initiated CopyDone
> so server knows what position we saved up to. We don't discard already
> buffered data
>
> I've modified patch 3 so that it also flushes to disk and sends
> feedback before sending CopyDone, then discards any xlog data received
> after it sends CopyDone. This is helpful in ensuring that we get
> predictable behaviour when cancelling pg_recvxlog and restarting it
> because the client and server agree on the point at which replay
> stopped.
>
> I was evaluating the tests and I don't think they actually demonstrate
> that the patch works. There's nothing done to check that
> pg_recvlogical exited because of client-initiated CopyDone. While
> looking into that I found that it also never actually hits
> ProcessCopyDone(...) because libpq handles the CopyDone reply from the
> server its self and treats it as end-of-stream. So the loop in
> StreamLogicalLog will just end and ProcessCopyDone() is dead code.
>
> Initialization of copyDoneSent and copyDoneReceived were also in the
> wrong place and would've caused issues with retry looping.
>
> I modified pg_recvlogical so that when in verbose mode it prints a
> message when it exits by client request. Then modified the tests to
> look for that message.
>
> An updated patch series is attached. Please re-test and review my
> changes. At that point I'll mark it ready to go unless someone else
> wants to take a look. I'm pretty much out of time for this anyway.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Bartunov 2016-09-25 18:33:07 Re: Better tracking of free space during SP-GiST index build
Previous Message David Steele 2016-09-25 17:15:25 Re: Password identifiers, protocol aging and SCRAM protocol