Re: Keepalive for max_standby_delay

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Keepalive for max_standby_delay
Date: 2010-06-02 19:41:49
Message-ID: 4C06B3FD.4090709@enterprisedb.com
Lists: pgsql-hackers

On 02/06/10 20:14, Tom Lane wrote:
> For realistic values of max_standby_delay ...

Hang on right there. What do you consider a realistic value for
max_standby_delay? Because I'm not sure I have a grip on that myself. 5
seconds? 5 minutes? 5 hours? I can see use cases for all of those...

> What I think might be a realistic compromise is this:
>
> 1. Separate max_standby_delay into two GUCs, say "max_streaming_delay"
> and "max_archive_delay".
>
> 2. When applying WAL that came across SR, use max_streaming_delay and
> let the time measurement be current time minus time of receipt of the
> current WAL send chunk.
>
> 3. When applying WAL that came from archive, use max_archive_delay and
> let the time measurement be current time minus time of acquisition of
> the current WAL segment from the archive.
>
> The current code's behavior in the latter case could effectively be
> modeled by setting max_archive_delay to zero, but that isn't the only
> plausible setting. More likely DBAs would set max_archive_delay to
> something smaller than max_streaming_delay, but still positive so as to
> not kill conflicting queries instantly.
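A minimal sketch of the two-GUC rule quoted above. All names (`wal_source`, `stream_or_archive_may_wait`, and the two GUCs stored as plain ints in milliseconds) are hypothetical illustrations, not PostgreSQL code. The caller is assumed to pass the receipt time of the current send chunk for SR, or the acquisition time of the current segment for archive recovery. Setting the archive limit to 0 makes the archive case never wait, which models the current behavior described in the quote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, not PostgreSQL's real types or GUC variables. */
typedef int64_t timestamp_ms;               /* milliseconds on the standby's clock */

typedef enum { WAL_FROM_STREAM, WAL_FROM_ARCHIVE } wal_source;

static int max_streaming_delay_ms = 30000;  /* hypothetical GUC: 30 s */
static int max_archive_delay_ms = 5000;     /* hypothetical GUC: 5 s */

/*
 * source_time: receipt time of the current WAL send chunk (SR), or
 * acquisition time of the current segment from the archive.
 * Returns true while queries that conflict with replay may keep running;
 * a limit of 0 means never wait, i.e. cancel conflicting queries at once.
 */
static bool
stream_or_archive_may_wait(wal_source src, timestamp_ms source_time,
                           timestamp_ms now)
{
    int limit_ms = (src == WAL_FROM_STREAM) ? max_streaming_delay_ms
                                            : max_archive_delay_ms;

    assert(now >= source_time);
    return (now - source_time) < limit_ms;
}
```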

The problem with defining max_archive_delay that way is again that you
can fall behind indefinitely. If you set it to 5 minutes, it means that
you'll wait a maximum of 5 minutes *per WAL segment*, even if WAL is
being generated faster than one segment every 5 minutes.
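To make the "fall behind indefinitely" point concrete, here is a small worst-case model of the per-segment rule. It is a hypothetical illustration, not a measurement: it assumes the primary completes one segment every `gen_interval` seconds, the standby fetches a segment only once replay needs it and it exists, and every segment hits a conflict that holds replay for the full `max_delay`. The function name `lag_after_segments` is made up for this sketch.

```c
#include <assert.h>

/*
 * Worst-case replay lag, in seconds, after n archived WAL segments when the
 * delay clock restarts at each segment's acquisition (per-segment rule).
 * Hypothetical model: one segment every gen_interval seconds, fetched only
 * when replay needs it, and each one blocked for the full max_delay.
 */
static long
lag_after_segments(int n, long gen_interval, long max_delay)
{
    long replay_done = 0;   /* when replay of the previous segment finished */

    assert(n > 0);
    for (int i = 0; i < n; i++)
    {
        long generated = (long) i * gen_interval;   /* segment i appears */
        long acquired = replay_done > generated ? replay_done : generated;

        replay_done = acquired + max_delay;         /* blocked for max_delay */
    }
    /* lag = time from the last segment appearing to its replay finishing */
    return replay_done - (long) (n - 1) * gen_interval;
}
```

With a segment every 60 s and a 5-minute limit, `lag_after_segments(10, 60, 300)` gives 2460 s and `lag_after_segments(100, 60, 300)` gives 24060 s: the lag grows by `max_delay - gen_interval` per segment without bound. Only when segments arrive more slowly than the limit, as in `lag_after_segments(100, 600, 300)`, does it stay at 300 s.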

I don't understand why you want to use a different delay when you're
restoring from archive vs. when you're streaming (what about existing
WAL files found in pg_xlog, BTW?). The source of WAL shouldn't make a
difference. If it's because you assume that restoring from archive is a
sign that you've fallen behind a lot, surely you've exceeded
max_standby_delay then and I still don't see a need for a separate GUC.

I stand by my suggestion from yesterday: Let's define max_standby_delay
as the difference between a piece of WAL becoming available in the
standby, and applying it. To approximate "piece of WAL becoming
available" for SR, we can use the mechanism with send/applyChunks from
Simon's latest patch, or go with the simpler scheme of just resetting a
"last caughtup timestamp" to current time whenever we have to wait for
new WAL to arrive. When restoring from archive, likewise reset "last
caughtup timestamp" whenever restore_command returns non-zero, i.e. when we
have to wait for the next WAL file to arrive.
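A minimal sketch of the "last caughtup timestamp" scheme, assuming hypothetical names (`standby_state`, `note_caught_up`, `caughtup_may_wait`) and millisecond timestamps; it is an illustration of the idea, not PostgreSQL code. The timestamp is reset whenever replay has to wait for more WAL, whether because SR has nothing new or because restore_command reported that the next file is not there yet, and conflicting queries may keep running only while the standby has been continuously behind for less than the single limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t timestamp_ms;     /* hypothetical: ms on the standby's clock */

typedef struct
{
    timestamp_ms last_caughtup;   /* hypothetical "last caughtup timestamp" */
    int max_standby_delay_ms;     /* the one knob */
} standby_state;

/* Call whenever replay must wait for new WAL: SR has nothing more to apply,
 * or restore_command returned non-zero for the next file. */
static void
note_caught_up(standby_state *st, timestamp_ms now)
{
    st->last_caughtup = now;
}

/* True while queries that conflict with replay may keep running. */
static bool
caughtup_may_wait(const standby_state *st, timestamp_ms now)
{
    assert(now >= st->last_caughtup);
    return (now - st->last_caughtup) < st->max_standby_delay_ms;
}
```

Both timestamps in the comparison come from the standby's own clock, so the primary's clock never enters into it.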

That works the same for both SR and file-based log shipping, there's
only one knob to set, is simple to implement and doesn't require
synchronized clocks.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
