Re: beta3 & the open items list

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, jd(at)commandprompt(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: beta3 & the open items list
Date: 2010-06-21 03:54:21
Message-ID: AANLkTikhcwlko4QsPonDRiYbgxdscH-9dR2IBqqmgdAT@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 20, 2010 at 9:31 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Mon, Jun 21, 2010 at 12:42 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> I'd buy that if all timeouts and retry counts would default to +infinity. But they don't, and hence sufficiently long network outages *will* cause connection aborts anyway. That a particular connection might survive due to inactivity proves nothing, since whether the connection is active or inactive during an outage is usually outside of anyone's control.
>>
>> I really fail to see why anyone would prefer connections (and therefore transactions!) getting stuck forever over a few spurious disconnects. The former always require manual intervention and cause all sorts of performance and disk-space issues, while the latter won't even be an issue for well-written clients who just reconnect and retry.
>>
>
> So just as a data point I'm routinely annoyed by reopening my screen
> session and finding various sessions have died since the day
> before. Usually this is caused by broken firewalls but there are also
> a bunch of SSH options which some servers have enabled which cause my
> sessions to never survive very long if there are any network outages.
> Servers where those options are disabled work fine.
>
> I admit this is a very different use case though and since we have
> control over the behaviour when the connection breaks perhaps the
> analogy falls apart completely. I'm not sure we can guarantee that
> reconnecting is always so simple though. What if the user set up an
> SSH gateway or needs some extra authentication to make the connection?
> Are users expecting the slave to randomly disconnect and reconnect
> willy nilly or are they expecting that once it connects it'll keep
> using that connection forever?

I feel like we're getting off in the weeds, here. Obviously, the user
would ideally like the connection to the master to last forever, but
equally obviously, if the master unexpectedly reboots, they'd like the
slave to notice - ideally within some reasonable time period - that it
needs to reconnect. There's no perfect way to distinguish "the master
croaked" from "the network administrator unplugged the Ethernet cable
and is planning to plug it back in any hour now", so we'll just need
to pick some reasonable timeout and go with it. To my way of
thinking, if the master hasn't responded in a minute or two, that's a
sign that it's time to declare the connection dead. Retrying the
connection *should* be cheap. If the user has set things up so that a
TCP connection from slave to master is not straightforward, the user
has configured it incorrectly, and no matter what we do it's not going
to be reliable.

I still think there's a decent argument that we might want to have a
protocol-level heartbeat rather than a TCP-level heartbeat. But doing
the latter is, I think, good enough for 9.0. We're pretty much
speculating about what the problems with that approach might be, so
getting too worked up about fixing them at this point seems premature.
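
For illustration only, here is roughly what a TCP-level heartbeat with a
timeout in the "minute or two" range could look like from the client side.
This is a sketch in Python, not the code under discussion for 9.0: the
function name enable_keepalive and the 60/10/6 values are assumptions picked
for the example, and the per-socket tuning options are platform-specific, so
the sketch only sets the ones the platform exposes.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=6):
    """Turn on TCP keepalive for sock.

    Returns the nominal worst-case number of seconds before an
    unresponsive peer is declared dead, assuming every tuning option
    below is supported and applied by the platform.
    """
    # Ask the kernel to probe the peer when the connection is idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Hypothetical tuning values: first probe after `idle` seconds of
    # silence, then one probe every `interval` seconds, giving up after
    # `count` unanswered probes. Where an option is missing, the
    # operating system default stays in effect.
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

    return idle + interval * count


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("nominal detection time (s):", enable_keepalive(s))
    s.close()
```

With these example values the connection would be declared dead after about
two minutes of silence from the other end. A protocol-level heartbeat would
instead send and check its own messages, which is the alternative mentioned
above.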

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
