Re: Bug in walsender when calling out to do_pg_stop_backup (and others?)

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug in walsender when calling out to do_pg_stop_backup (and others?)
Date: 2011-10-10 19:25:31
Message-ID: CABUevEwU6v040JdmXg_j=iHrujqQMr1i9Ey_dQvjozNw2+4EYA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 6, 2011 at 23:46, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Oct6, 2011, at 21:48 , Magnus Hagander wrote:
>>> The question is, should we do more? To me, it'd make sense to terminate
>>> a backend once its connection is gone. We could, for example, make
>>> pq_flush() set a global flag, and make CHECK_FOR_INTERRUPTS handle a
>>> broken connection the same way as a SIGINT or SIGTERM.
>>
>> The problem here is that we're hanging at a place where we don't touch
>> the socket. So we won't notice the socket is gone. We'd have to do a
>> select() or something like that at regular intervals to make sure it's
>> there, no?
>
> We do emit NOTICEs saying "pg_stop_backup still waiting ... " repeatedly,
> so we should notice a dead connection sooner or later. When I tried, I even
> got a message in the log complaining about the "broken pipe".

Ah, good point, that should be doable. Forgot about that message...
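
A rough sketch of that idea, in case it helps the discussion: a failed
send records that the client is gone, and a later check can act on it.
This is not PostgreSQL code. send_to_client(), client_is_gone(),
simulate_client() and the client_connection_lost flag are all made-up
names, a Unix socketpair stands in for the client connection, and the
sketch assumes SIGPIPE is ignored so that the write fails with EPIPE
instead of killing the process.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flag: set once a send to the client fails because the peer is gone. */
static volatile sig_atomic_t client_connection_lost = 0;

/*
 * Hypothetical stand-in for a flush routine.
 * Returns 0 on success, -1 on failure. On EPIPE/ECONNRESET it also
 * records that the client has vanished.
 */
static int
send_to_client(int fd, const char *msg)
{
    size_t      len = strlen(msg);

    while (len > 0)
    {
        ssize_t     n = write(fd, msg, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, just retry */
            if (errno == EPIPE || errno == ECONNRESET)
                client_connection_lost = 1;
            return -1;
        }
        msg += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Hypothetical check, to be called wherever interrupts are checked. */
static bool
client_is_gone(void)
{
    return client_connection_lost != 0;
}

/*
 * Demo: send one NOTICE-like message to a peer that either stays
 * connected or has already closed its end. Returns 1 if the loss of
 * the client was detected, 0 otherwise.
 */
static int
simulate_client(bool vanish)
{
    int         sv[2];
    int         rc;

    client_connection_lost = 0;
    signal(SIGPIPE, SIG_IGN);   /* assumption: SIGPIPE is ignored */

    rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    (void) rc;

    if (vanish)
        close(sv[1]);           /* the "client" goes away */

    (void) send_to_client(sv[0], "NOTICE: still waiting\n");

    close(sv[0]);
    if (!vanish)
        close(sv[1]);

    return client_is_gone() ? 1 : 0;
}
```

Calling simulate_client(true) exercises the broken-pipe path and
simulate_client(false) the healthy one. The wait loop would only need
to test client_is_gone() after each message it sends.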

> As it stands, the interval between two NOTICEs grows exponentially - we
> send the first after waiting 5 seconds, the next after waiting 60 seconds,
> and then after waiting 120, 240, 480, ... seconds. This means that the
> backend would in the worst case linger the same amount of time *after*
> pg_basebackup was cancelled that pg_basebackup waited for *before* it was
> cancelled.
>
> It'd be nice to generally terminate a backend if the client vanishes, but so
> far I haven't had any bright ideas. Using FASYNC and F_SETOWN unfortunately
> sends a signal *every time* the fd becomes readable or writeable, not only on
> EOF. Doing select() in CHECK_FOR_INTERRUPTS seems far too expensive. We could
> make the postmaster keep the fds around even after forking a backend, and
> make it watch for broken connections using select(). But with a large max_backends
> setting, we'd risk running out of fds in the postmaster...

Ugh. Yeah. But at least catching it and terminating it when we *do*
notice it's down would certainly make sense...
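
How soon we get the chance to notice depends on the NOTICE schedule
quoted above. The sketch below only tabulates that schedule as described
in this thread (5 seconds, then 60, then doubling); it is not taken from
the source tree. The function names are made up, and it assumes that
the first NOTICE sent after the cancel is the one that fails.

```c
#include <assert.h>

/* Wait (in seconds) before the n-th NOTICE, n counted from 0: 5, 60, 120, 240, ... */
static long
wait_before_notice(int n)
{
    long        w = 60;
    int         i;

    if (n == 0)
        return 5;
    for (i = 1; i < n; i++)
        w *= 2;
    return w;
}

/* Seconds elapsed since the wait began when the n-th NOTICE is sent. */
static long
notice_time(int n)
{
    long        t = 0;
    int         i;

    for (i = 0; i <= n; i++)
        t += wait_before_notice(i);
    return t;
}

/*
 * If the client goes away cancelled_at seconds into the wait
 * (cancelled_at >= 0), how long until the next NOTICE is attempted?
 */
static long
seconds_until_next_notice(long cancelled_at)
{
    long        t = 0;
    int         n;

    for (n = 0;; n++)
    {
        t += wait_before_notice(n);
        if (t > cancelled_at)
            return t - cancelled_at;
    }
}
```

For example, notice_time(2) is 185, and a client that disappears right
after that NOTICE leaves the backend waiting another 240 seconds before
the next send attempt.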

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
