Re: Stopping logical replication protocol

From: Vladimir Gordiychuk <folyga(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Stopping logical replication protocol
Date: 2016-05-09 08:31:48
Message-ID: CAFgjRd1LgVbtH=9O9_xvKQjvUP7aRF-edxqwKfaNs9hMFW_4gw@mail.gmail.com
Lists: pgsql-hackers

>
> What's your PostgreSQL community username?

gordiychuk

> It seems like what you're also trying to allow interruption deeper than
> that, when we're in the middle of processing a reorder buffer commit record
> and streaming that to the client. You're introducing an is_active member
> (actually a callback, though name suggests it's a flag) in struct
> ReorderBuffer to check whether a CopyDone is received, and you're skipping
> ReorderBuffer commit processing when the callback returns false. The
> callback returns "!streamingDoneReceiving && !streamingDoneSending" i.e.
> it's false if either end has sent CopyDone. streamingDoneReceiving and
> streamingDoneSending are only set in ProcessRepliesIfAny, called by
> WalSndLoop and WalSndWaitForWal. So the idea is, presumably, that if we're
> waiting for WAL from XLogSendLogical we skip processing of any commit
> records and exit.
>
> That seems overcomplicated.
>
> When WalSndWaitForWal is called
> by logical_read_xlog_page, logical_read_xlog_page can just test
> streamingDoneReceiving and streamingDoneSending. If they're set it can skip
> the page read and return -1, which will cause the xlogreader to return a
> null record to XLogSendLogical. That'll skip the decoding calls and return
> to WalSndLoop, where we'll notice it's time to exit.
>
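
For reference, here is a minimal sketch of the simpler approach suggested in the quoted text above. It is a toy model, not code from walsender.c: toy_read_page and toy_logical_read_page are hypothetical names made up for illustration, and only the two flag names come from the discussion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the walsender flags named in the quoted text. */
static bool streamingDoneReceiving = false; /* client has sent CopyDone */
static bool streamingDoneSending = false;   /* server has sent CopyDone */

/* Hypothetical helper: pretend to wait for WAL and fill one page. */
static int
toy_read_page(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = 0;
    return (int) len;
}

/*
 * Sketch of a page-read callback. Once either side has sent CopyDone it
 * returns -1 without reading, so the caller gets no record and can go
 * back to the main loop, which notices that it is time to exit.
 */
static int
toy_logical_read_page(char *buf, size_t len)
{
    if (streamingDoneReceiving || streamingDoneSending)
        return -1;
    return toy_read_page(buf, len);
}
```

Note that this check only helps at a page read; it does nothing while a large transaction is already being sent.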

With my patch, ProcessRepliesIfAny is now also called from WalSndWriteData,
because while sending data we should also check for messages from the
client (the client can send CopyDone, KeepAlive, or Terminate).

@@ -1086,14 +1089,6 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
     memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
            tmpbuf.data, sizeof(int64));

-    /* fast path */
-    /* Try to flush pending output to the client */
-    if (pq_flush_if_writable() != 0)
-        WalSndShutdown();
-
-    if (!pq_is_send_pending())
-        return;
-

The main idea is that we can receive CopyDone from the client in any of the
following functions: WalSndLoop, WalSndWaitForWal, WalSndWriteData. All of
them can take a long time. WalSndWaitForWal may wait for a new transaction,
which can take quite long on an inactive database. WalSndWriteData may be
sending a big transaction, which also means that messages from the client
are ignored for a long time (in my example above, with 1 million object
changes, the walsender ignored messages for 13 seconds and did not allow
the connection to be reused). When the client sends CopyDone, it does not
want to receive any more messages for the current LSN. For example,
physical replication can be interrupted in the middle of a transaction
that affects more than one LSN.
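
To illustrate the point, here is a small self-contained toy model. It is not the patch and not walsender code: ToyClient, toy_process_replies and toy_send_transaction are hypothetical names, with toy_process_replies playing the role that ProcessRepliesIfAny plays in the walsender.

```c
#include <stdbool.h>

/* Toy model of a client whose CopyDone becomes visible after some polls. */
typedef struct ToyClient
{
    int copy_done_at;   /* poll number at which CopyDone arrives, -1 = never */
    int polls;          /* number of polls performed so far */
} ToyClient;

/* Hypothetical stand-in for ProcessRepliesIfAny: true once CopyDone is seen. */
static bool
toy_process_replies(ToyClient *c)
{
    c->polls++;
    return c->copy_done_at >= 0 && c->polls >= c->copy_done_at;
}

/*
 * Send up to nchanges changes of one transaction. With check_replies set,
 * the client is polled before every change and sending stops on CopyDone.
 * Without it, the whole transaction is sent regardless of what the client
 * has asked for. Returns the number of changes sent.
 */
static int
toy_send_transaction(ToyClient *c, int nchanges, bool check_replies)
{
    int sent = 0;

    for (int i = 0; i < nchanges; i++)
    {
        if (check_replies && toy_process_replies(c))
            break;
        sent++;
    }
    return sent;
}
```

In this model, a client whose CopyDone arrives at the third poll receives only two changes of a 1 million change transaction when replies are checked, and all 1 million when they are not.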

Maybe I do not understand the documentation correctly, but I want to reuse
the same connection without reopening it, because opening a new connection
takes too long. Is that a valid use case, or is CopyDone just a side effect
of the copy protocol, so that to finish replication one should always send
a Terminate packet and reopen the connection?
