Re: Logical Replication WIP

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Erik Rijkers <er(at)xs4all(dot)nl>, Steve Singer <steve(at)ssinger(dot)info>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical Replication WIP
Date: 2017-01-20 21:30:56
Message-ID: ab78e303-777d-61c8-0b43-160f2bdaf853@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20/01/17 17:23, Petr Jelinek wrote:
> On 20/01/17 15:08, Peter Eisentraut wrote:
>> On 1/19/17 5:01 PM, Petr Jelinek wrote:
>>> There were some conflicting changes committed today so I rebased the
>>> patch on top of them.
>>>
>>> Other than that nothing much has changed, I removed the separate sync
>>> commit patch, included the rename patch in the patchset and fixed the
>>> bug around pg_subscription catalog reported by Erik Rijkers.
>>
>> Committed. I haven't reviewed the rename patch yet, so I'll get back to
>> that later.
>>
>
> Hi,
>
> Thanks!
>
> Here is fix for the dependency mess.
>

Álvaro pointed out off list couple of issues with how we handle
interruption of commands that connect to walsender.

a) The libpqwalreceiver.c does blocking connect so it's impossible to
cancel CREATE SUBSCRIPTION which is stuck on connect. This is btw
preexisting problem and applies to walreceiver as well. I rewrote the
connect function to use asynchronous API (patch 0001).

b) We can cancel in middle of the command (when stuck in
libpqrcv_PQexec) but the connection to walsender stays open which in
case we are waiting for snapshot can mean that it will stay idle in
transaction. I added PG_TRY wrapper which disconnects on error around
this (patch 0002).

And finally, while testing these two I found bug in walsender StringInfo
initialization (or lack there of). There are 3 static StringInfo buffers
that are initialized in WalSndLoop. Problem with that is that they can
be in some rare scenarios used from CreateReplicationSlot (and IMHO
StartLogicalReplication) before WalSndLoop is called which causes
segfault of walsender. This is rare because it only happens when
downstream closes connection during logical decoding initialization.

Since it's not exactly straight forward to find when these need to be
initialized based on commands, I decided to move the initialization code
to exec_replication_command() since that's always called before anything
so that makes it much less error prone (patch 0003).

The 0003 should be backpatched all the way to 9.4 where multiple
commands started using those buffers.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch application/x-patch 3.9 KB
0002-Close-replication-connection-when-slot-creation-gets.patch application/x-patch 1.3 KB
0003-Always-initialize-stringinfo-buffers-in-walsender.patch application/x-patch 2.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-01-20 21:34:14 Re: Vacuum: allow usage of more than 1GB of work mem
Previous Message Tom Lane 2017-01-20 21:30:21 Re: Valgrind-detected bug in partitioning code