From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
Cc: | Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Joe Conway <mail(at)joeconway(dot)com> |
Subject: | Re: walreceiver is uninterruptible on win32 |
Date: | 2010-04-15 14:26:05 |
Message-ID: | z2g603c8f071004150726rf33e1c27vf48503eaf9630a6d@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Apr 14, 2010 at 11:13 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Apr 14, 2010 at 11:15 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> http://archives.postgresql.org/pgsql-hackers/2010-04/msg00077.php
>>>> As for the code itself, don't we need a CHECK_FOR_INTERRUPTS in there
>>>> for it to be actually useful?
>>>
>>> Since the backend version of select() receives any incoming signals
>>> on Win32, CHECK_FOR_INTERRUPTS seems not to be needed in the loop,
>>> at least in walreceiver. No? The patch doesn't put it in there, and
>>> I was able to interrupt walreceiver executing libpqrcv_PQexec() when
>>> I tested the patch on Win32.
>>
>> It will call the signal handler, yes. Normally, the signal handler
>> just sets a flag somewhere, which needs to be checked with
>> CHECK_FOR_INTERRUPTS.
>>
>> From how I read the walreceiver.c code (which I'm not that familiar
>> with), the signal handlers call ProcessWalRcvInterrupts() which in
>> turn has CHECK_FOR_INTERRUPTS in it, and this is where it ends up
>> being called.
>
> Yes. While establishing replication connection (i.e., executing
> walrcv_connect function), the SIGTERM signal handler directly calls
> ProcessWalRcvInterrupts() which does CHECK_FOR_INTERRUPTS() and
> elog(FATAL).
>
>>>> The patch also seems confused about whether it's intending to be a
>>>> general-purpose solution or not. You can maybe take some shortcuts
>>>> if it's only going to be for walreceiver, but if you're going to put
>>>> it in dblink it is *not* acceptable to take shortcuts like not
>>>> processing errors completely.
>>>
>>> The patch still takes some shortcuts since we decided to postpone
>>> the fix for dblink to 9.1 or later.
>>
>> Given those shortcuts, can't we simplify it even further like
>> attached?
>
> No, we need to repeat PQgetResult() at least two times. The first call
> of it reads the RowDescription, DataRow and CommandComplete messages
> and switches the state to PGASYNC_READY. The second one reads the
> ReadyForQuery message and switches the state to PGASYNC_IDLE. So if we
> don't repeat it, libpqrcv_PQexec() would end in a half-finished state.
>
>> (If nothing else, your code did PQclear() on an
>> uninitialized pointer - must've been pure luck that it worked)
>
> PQclear(NULL) might be called in my patch, but this is not a problem
> since PQclear() does nothing if the specified PGresult argument is NULL.
>
>> Looking at the call-sites, there are bugs now - if PQexec() returns
>> NULL, we don't deal with it. It also doesn't always free the result
>> properly. I've added checks for that.
>
> As Tom pointed out in another post, we don't need to treat the
> "result is NULL" case as special. OTOH, as you pointed out, I
> forgot calling PQclear() when the second call of libpqrcv_PQexec()
> in libpqrcv_connect() fails. I added it to the patch. Thanks!
>
>> Finally, I've updated some of the comments.
>
> Thanks a lot! I applied that improvements to the patch.
>
> I attached the revised patch.
I have to admit to finding this confusing. According to the comments:
+ /*
+ * Don't emulate the PQexec()'s behavior of returning the last
+ * result when there are many, since walreceiver never sends a
+ * query returning multiple results.
+ */
...but it looks like the code actually is implementing some sort of
loop-that-returns-the-last result.
...Robert
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2010-04-15 14:39:24 | Re: Streaming replication and a disk full in primary |
Previous Message | Erik Rijkers | 2010-04-15 14:05:41 | Re: testing HS/SR - invalid magic number |