Re: Hot Standby and handling max_standby_delay

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby and handling max_standby_delay
Date: 2010-01-15 23:09:13
Message-ID: 1263596953.26654.40910.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2010-01-15 at 20:50 +0200, Heikki Linnakangas wrote:

> > So in either case, when we are waiting for new input we reset the timer
> > as soon as new WAL is received. The resolution/accuracy of standby_delay
> > will be no more than the time taken to replay a single file. This
> > shouldn't matter, since sane settings of max_standby_delay are either 0
> > or a number like 5-20 (seconds).
>
> That would change the meaning of max_standby_delay. Currently, it's the
> delay between *generating* and applying a WAL record, your proposal
> would change it to mean delay between receiving and applying it. That
> seems a lot less useful to me.

Remember that this proposal is about responding to your comments. You
showed that the time difference between generating and applying a WAL
record lacked useful meaning in cases where the generation was not
smooth and continuous. So, taking your earlier refutation as still
observing a problem, I definitely do redefine the meaning of
max_standby_delay. As you say "standby delay" means the difference
between receive and apply.

The bottom line here is: are you willing to dismiss your earlier
observation of difficulties? I don't think you can...

> With the current definition, I would feel pretty comfortable setting it
> to say 15 minutes, knowing that if the standby falls behind for any
> reason, as soon as the connection is re-established or
> archiving/restoring fixed, it will catch up quickly, blowing away any
> read-only queries if required. With your new definition, the standby
> would in the worst case pause for 15 minutes at every WAL file.

Yes, it does. And I know you're thinking along those lines because we
are concurrently discussing how to handle re-connection after updates.

The alternative is this: after being disconnected for 15 minutes we
reconnect. For the next X minutes the standby will be almost unusable
for queries while we catch up again.

---

So, I'm left with thinking that both of these ways are right, in
different circumstances and with different priorities.

If your priority is High Availability, then you are willing to give up
the capability for long-ish queries when that conflicts with the role of
HA server. (delay = apply - generate). If your priority is a Reporting
Server, then you are willing to give up HA capability in return for
relatively uninterrupted querying (delay = apply - receive).

Do we agree the two goals are mutually exclusive? If so, I think we need
another parameter to express those configuration goals.

Also, I think we need some ways to explicitly block recovery to allow
queries to run, and some ways to explicitly block queries so recovery
can run.

Perhaps we need a way to block new queries on a regular basis, so that
recovery gets a chance to run. Kind of time-slicing algorithm, like OS.
That way we could assign a relative priority to each.

Hmmm.

--
Simon Riggs www.2ndQuadrant.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2010-01-15 23:12:47 Re: review: More frame options in window functions
Previous Message Tom Lane 2010-01-15 22:39:31 Re: bug in integration SQL parser to plpgsq