Re: Hot standby, slot ids and stuff

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, slot ids and stuff
Date: 2009-01-09 12:38:05
Message-ID: 4967452D.3050108@enterprisedb.com
Lists: pgsql-hackers

Simon Riggs wrote:
> On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
>> I mean the standby should stop trying to track the in progress
>> transactions in recovery procs, and apply the WAL records like it does
>> before the consistent state is reached.
>
> ...
>
> So, if we don't PANIC, how should we behave?
>
> Without full information on running-xacts we would be unable to take a
> snapshot, so should:
> * backends be forcibly disconnected?
> * backends hang waiting for snapshot info to be re-available again in X
> minutes worth of WAL time?
> * backends throw an ERROR: unable to provide snapshot at this time,
> DETAIL: retry your statement later.
> ...other alternatives
>
> and possibly prevent new connections.

All of those seem reasonable to me. The 2nd option seems nicest; the "X
minutes" should probably be controlled by max_standby_delay, after which
you can throw an error.
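
As a rough illustration of that 2nd option (a sketch only, not code from
the patch): a backend polls until snapshot information is available
again and gives up with an error once max_standby_delay has elapsed. The
names wait_for_snapshot, snapshot_ready and SnapshotUnavailable are
hypothetical, and the sketch measures wall-clock time, whereas the
proposal above talks about WAL time.

```python
import time


class SnapshotUnavailable(Exception):
    """Hypothetical error: no snapshot could be taken within the delay."""


def wait_for_snapshot(snapshot_ready, max_standby_delay, poll_interval=0.1,
                      clock=time.monotonic, sleep=time.sleep):
    """Block until snapshot_ready() returns True or the delay expires.

    snapshot_ready    -- callable returning True once running-xacts
                         information is complete again (assumed interface)
    max_standby_delay -- maximum time to wait, in seconds
    clock, sleep      -- injectable so the function can be tested
                         without real waiting
    """
    deadline = clock() + max_standby_delay
    while True:
        if snapshot_ready():
            return True
        if clock() >= deadline:
            raise SnapshotUnavailable(
                "unable to provide snapshot at this time; "
                "retry your statement later")
        sleep(poll_interval)
```

A caller would catch SnapshotUnavailable and report it to the client as
the ERROR from the 3rd option, so the two behaviours combine naturally.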

If we care enough, we could also keep tracking the transactions in
backend-private memory of the startup process, until there's enough room
in the proc array. That would make the outage shorter, because you
wouldn't have to wait until the next running-xacts record, but only until
enough transactions have finished that they all fit in the proc array
again.
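
To make the idea concrete, here is a toy model (a sketch under my own
assumptions, not the actual data structures): transactions that don't
fit in the proc array go to a private overflow set, and snapshots become
possible again as soon as the overflow set drains. RecoveryXactTracker
and its method names are made up for illustration.

```python
class RecoveryXactTracker:
    """Toy model of the startup process tracking in-progress xids."""

    def __init__(self, proc_array_size):
        self.proc_array_size = proc_array_size
        self.proc_array = set()   # stands in for shared recovery procs
        self.overflow = set()     # stands in for backend-private memory

    def xact_start(self, xid):
        # Use the proc array while there is room, otherwise spill over.
        if len(self.proc_array) < self.proc_array_size:
            self.proc_array.add(xid)
        else:
            self.overflow.add(xid)

    def xact_end(self, xid):
        self.proc_array.discard(xid)
        self.overflow.discard(xid)
        # Move spilled transactions back as soon as room appears.
        while self.overflow and len(self.proc_array) < self.proc_array_size:
            self.proc_array.add(self.overflow.pop())

    def snapshot_possible(self):
        # A snapshot needs every in-progress xid to be in the proc array.
        return not self.overflow


tracker = RecoveryXactTracker(proc_array_size=2)
for xid in (100, 101, 102):
    tracker.xact_start(xid)
print(tracker.snapshot_possible())  # overflow holds xid 102
tracker.xact_end(100)
print(tracker.snapshot_possible())  # everything fits again
```

The point is that availability returns when the third transaction fits,
without waiting for the next running-xacts record.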

But whatever is the simplest, really.

> If max_connections is higher on primary then the standby will *never* be
> available for querying. Should we have multiple ERRORs depending upon
> whether the situation is hopefully-temporary or looks-permanent?
>
> Don't assume I want the PANIC. That clearly needs to be revisited if we
> change slotids.

It needs to be revisited whether we change slotids or not, IMHO.

Note that with slotids, you have a problem as soon as any of the slots
that don't exist on the standby are used, regardless of how many
concurrent transactions there actually are. Without slots you only have a
problem if you really have more than the standby's max_connections
concurrent transactions. That makes a big difference in practice.
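
A small sketch of the two failure conditions, to show why they differ.
The function names and the zero-based slot numbering are assumptions
made for the example, not something taken from the patch.

```python
def overflows_with_slotids(slot_ids_in_use, standby_slots):
    """With slot ids: trouble if ANY used slot id has no counterpart
    on the standby (slots assumed to be numbered 0..standby_slots-1)."""
    return any(slot >= standby_slots for slot in slot_ids_in_use)


def overflows_without_slotids(concurrent_xacts, standby_max_connections):
    """Without slot ids: trouble only if the number of concurrent
    transactions exceeds what the standby can hold."""
    return concurrent_xacts > standby_max_connections


# Primary allows 200 connections, standby only 100, and a single
# transaction happens to run in slot 150 on the primary.
in_use = [150]
print(overflows_with_slotids(in_use, standby_slots=100))
print(overflows_without_slotids(len(in_use), standby_max_connections=100))
```

With one transaction in a high-numbered slot, the first check reports a
problem while the second does not.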

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
