Re: walreceiver is uninterruptible on win32

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com>
Subject: Re: walreceiver is uninterruptible on win32
Date: 2010-04-14 14:15:43
Message-ID: r2n9837222c1004140715m9bf986f0w384c3631c18d11ed@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 14, 2010 at 10:02 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Apr 13, 2010 at 9:21 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Tue, Apr 13, 2010 at 1:56 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> If adding new shared library is too big change at this point, I think
>>>> that we should postpone the fix only for dblink to 9.1 or later. Since
>>>> no one has complained about this long-term problem of dblink, I'm not
>>>> sure it really should be fixed right now. Thought?
>>>
>>> +1. Let's fix walreceiver for now, and we can revisit dblink later.
>>> Since we haven't had any complaints so far...
>>
>> OK. I'll focus on walreceiver now.
>
> The attached patch changes walreceiver so that it uses new function
> libpqrcv_PQexec() instead of PQexec() for sending handshake message.
> libpqrcv_PQexec() sends a query and wait for the results by using
> the asynchronous libpq functions and the backend version of select().
> It's interruptible by signals. You would be able to shut the standby
> server down in the middle of handshake for replication connection.
>
> http://archives.postgresql.org/pgsql-hackers/2010-03/msg00625.php
>> Just replacing PQexec() with PQsendQuery() is pretty straightforward, we
>> could put that replacement in a file in port/win32. Replacing
>> PQconnectdb() is more complicated because you need to handle connection
>> timeout. I suggest that we only add the replacement for PQexec(), and
>> live with the situation for PQconnectdb(), that covers 99% of the
>> scenarios anyway.
>
> According to the suggestion, I replaced only the PQexec() and didn't
> add the replacement of PQconnect() for now.
>
> http://archives.postgresql.org/pgsql-hackers/2010-04/msg00077.php
>> As for the code itself, don't we need a CHECK_FOR_INTERRUPTS in there
>> for it to be actually useful?
>
> Since the backend version of select() receives any incoming signals
> on Win32, CHECK_FOR_INTERRUPTS seems not to be needed in the loop,
> at least in walreceiver. No? The patch doesn't put it in there, and
> I was able to interrupt walreceiver executing libpqrcv_PQexec() when
> I tested the patch on Win32.

It will call the signal handler, yes. Normally, the signal handler
just sets a flag somewhere, which needs to be checked with
CHECK_FOR_INTERRUPTS.

From how I read the walreceiver.c code (which I'm not that familiar
with), the signal handlers call ProcessWalRcvInterrupts() which in
turn has CHECK_FOR_INTERRUPTS in it, and this is where it ends up
being called.

>> The patch also seems confused about whether it's intending to be a
>> general-purpose solution or not.  You can maybe take some shortcuts
>> if it's only going to be for walreceiver, but if you're going to put
>> it in dblink it is *not* acceptable to take shortcuts like not
>> processing errors completely.
>
> The patch still takes some shortcuts since we decided to postpone
> the fix for dblink to 9.1 or later.

Given those shortcuts, can't we simplify it even further like
attached? (If nothing else, your code did PQclear() on an
uninitialized pointer - must've been pure luck that it worked)

Looking at the call-sites, there are bugs now - if PQexec() returns
NULL, we don't deal with it. It also doesn't always free the result
properly. I've added checks for that.

Finally, I've updated some of the comments.

Note: I've only tested this patch as far as that it compiles. I don't
have a SR env around right now, so I'll have to complete with that
later. But if you have a chance to test that it fixes your test case,
please do!

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Attachment Content-Type Size
libpqrcv_PQexec_v2.patch application/octet-stream 4.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jaime Casanova 2010-04-14 14:16:36 standbycheck was:(Re: [HACKERS] testing hot standby
Previous Message Rusty Conover 2010-04-14 14:15:35 Re: BUG #5412: test case produced, possible race condition.