Re: Hot standby, slot ids and stuff

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, slot ids and stuff
Date: 2009-01-09 13:08:32
Message-ID: 1231506512.18005.432.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Fri, 2009-01-09 at 14:38 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
> >> I mean the standby should stop trying to track the in-progress
> >> transactions in recovery procs, and apply the WAL records like it does
> >> before the consistent state is reached.
> >
> > ...
> >
> > So, if we don't PANIC, how should we behave?
> >
> > Without full information on running-xacts we would be unable to take a
> > snapshot, so should:
> > * backends be forcibly disconnected?
> > * backends hang waiting for snapshot info to be re-available again in X
> > minutes worth of WAL time?
> > * backends throw an ERROR: unable to provide snapshot at this time,
> > DETAIL: retry your statement later.
> > ...other alternatives
> >
> > and possibly prevent new connections.
>
> All of those seem reasonable to me. The 2nd option seems nicest, "X
> minutes" should probably be controlled by max_standby_delay, after which
> you can throw an error.
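Here is a minimal C sketch of the second option: wait for snapshot information to become available again, and give up with an error once max_standby_delay has elapsed. It is a sketch under stated assumptions. The names wait_for_snapshot, SnapshotReadyFn and ready_after are hypothetical and are not PostgreSQL functions. WAL time is simulated as a millisecond counter rather than real replay progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: is running-xacts information complete again
 * after elapsed_ms of (simulated) replayed WAL time? */
typedef bool (*SnapshotReadyFn)(long elapsed_ms, void *ctx);

typedef enum { SNAPSHOT_OK, SNAPSHOT_ERROR } SnapshotWait;

/*
 * Poll every poll_ms of simulated WAL time. Give up once
 * max_standby_delay_ms has elapsed without the info becoming available.
 */
SnapshotWait wait_for_snapshot(SnapshotReadyFn ready, void *ctx,
                               long max_standby_delay_ms, long poll_ms)
{
    assert(poll_ms > 0);
    for (long elapsed = 0; elapsed <= max_standby_delay_ms; elapsed += poll_ms) {
        if (ready(elapsed, ctx))
            return SNAPSHOT_OK;
        /* A real backend would sleep or wait on a latch here. */
    }
    /* Caller would report "unable to provide snapshot at this time". */
    return SNAPSHOT_ERROR;
}

/* Example predicate: info becomes available at *(long *)ctx ms. */
static bool ready_after(long elapsed_ms, void *ctx)
{
    return elapsed_ms >= *(const long *)ctx;
}
```

Hanging only up to a bounded delay keeps the "wait" option from turning into an indefinite stall. The ERROR path is still needed for the case where the information never comes back in time.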

Hmm, we use the recovery procs to track only those transactions that have
TransactionIds assigned. That means we will overflow only if we
approach 100% write transactions at any one time, or if we have more write
transactions in progress than we have max_connections on the standby.

So it sounds like the overflow situation would probably be both rare
and, if it did occur, unlikely to last long.
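The following hypothetical C sketch makes the overflow condition concrete. The recovery procs are modelled as a fixed array sized by the standby's max_connections. Once an xid cannot be recorded, snapshots are refused (the "ERROR, retry later" option). Resetting that state when a later complete running-xacts record arrives is not modelled, and neither are subtransactions. RecoveryProcs, track_xid, untrack_xid and take_snapshot are illustrative names only, and STANDBY_MAX_CONNECTIONS is an assumed small value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;      /* stand-in for PostgreSQL's type */

#define STANDBY_MAX_CONNECTIONS 3    /* hypothetical standby setting */

typedef struct {
    TransactionId xids[STANDBY_MAX_CONNECTIONS];
    int count;
    bool overflowed;                 /* set once an xid could not be recorded */
} RecoveryProcs;

/* WAL shows a transaction acquiring an xid on the primary. */
void track_xid(RecoveryProcs *rp, TransactionId xid)
{
    if (rp->count == STANDBY_MAX_CONNECTIONS)
        rp->overflowed = true;       /* running-xacts info is now incomplete */
    else
        rp->xids[rp->count++] = xid;
}

/* Commit or abort of a tracked transaction: swap-remove it. */
void untrack_xid(RecoveryProcs *rp, TransactionId xid)
{
    for (int i = 0; i < rp->count; i++) {
        if (rp->xids[i] == xid) {
            rp->xids[i] = rp->xids[--rp->count];
            return;
        }
    }
}

/* Returns the number of running xids copied, or -1 ("ERROR: retry later")
 * if tracking has overflowed or the output buffer is too small. */
int take_snapshot(const RecoveryProcs *rp, TransactionId *out, int outlen)
{
    if (rp->overflowed || outlen < rp->count)
        return -1;
    for (int i = 0; i < rp->count; i++)
        out[i] = rp->xids[i];
    return rp->count;
}
```

Note that finishing a transaction does not clear the flag in this model. Once an xid has been lost, the tracked set can no longer be trusted until fresh running-xacts information arrives.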

> If we care enough, we could also keep tracking the transactions in
> backend-private memory of the startup process, until there's enough room
> in proc array. That would make the outage shorter, because you wouldn't
> have to wait until the next running-xacts record, but only until enough
> transactions have finished that they all fit in proc array again.
>
> But whatever is the simplest, really.

The above does sound best, since it would let the snapshot hang for
only a short period. But at this stage of the game, it is also more complex.
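Heikki's alternative can be sketched in C under a simplified model. Xids that do not fit in the recovery procs go to a list in the startup process's private memory. As tracked transactions finish, spilled xids move into the freed slots, so snapshots become available again as soon as that private list is empty. All names and both fixed sizes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define PROC_SLOTS 2    /* hypothetical: recovery procs on the standby */
#define SPILL_MAX  64   /* hypothetical bound on the private overflow list */

typedef struct {
    TransactionId procs[PROC_SLOTS];  /* shared: what snapshots can see */
    int nprocs;
    TransactionId spill[SPILL_MAX];   /* startup process private memory */
    int nspill;
} XactTracker;

/* Swap-remove xid from arr[0..*n); returns true if it was found. */
static bool remove_xid(TransactionId *arr, int *n, TransactionId xid)
{
    for (int i = 0; i < *n; i++) {
        if (arr[i] == xid) {
            arr[i] = arr[--*n];
            return true;
        }
    }
    return false;
}

void xact_start(XactTracker *t, TransactionId xid)
{
    if (t->nprocs < PROC_SLOTS) {
        t->procs[t->nprocs++] = xid;
    } else {
        assert(t->nspill < SPILL_MAX);
        t->spill[t->nspill++] = xid;
    }
}

void xact_end(XactTracker *t, TransactionId xid)
{
    if (!remove_xid(t->procs, &t->nprocs, xid))
        remove_xid(t->spill, &t->nspill, xid);
    /* Refill freed proc slots from the private list. */
    while (t->nprocs < PROC_SLOTS && t->nspill > 0)
        t->procs[t->nprocs++] = t->spill[--t->nspill];
}

/* Snapshots are complete only when nothing is left in private memory. */
bool snapshot_available(const XactTracker *t)
{
    return t->nspill == 0;
}
```

The outage here ends when enough transactions finish, rather than when the next running-xacts record is replayed. That is the shorter window Heikki describes, at the cost of extra bookkeeping in the startup process.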

For now though, since it looks like it would happen fairly rarely, I'd
opt for the simplest: throw an ERROR.

> > If max_connections is higher on primary then the standby will *never* be
> > available for querying. Should we have multiple ERRORs depending upon
> > whether the situation is hopefully-temporary or looks-permanent?
> >
> > Don't assume I want the PANIC. That clearly needs to be revisited if we
> > change slotids.
>
> It needs to be revisited whether we change slotids or not, IMHO.
>
> Note that with slotids, you have a problem as soon as any of the slots
> that don't exist on standby are used, regardless of how many concurrent
> transactions there actually are. Without slots you only have a problem if
> you really have more than standby's max_connections concurrent
> transactions. That makes a big difference in practice.
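The difference Heikki describes can be expressed as two predicates. This is an illustrative C sketch. It assumes slot ids are numbered from 0 and that the standby can mirror only slots below its own max_connections. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * With slot ids, each running transaction is identified by the primary's
 * slot it occupies. Any slot at or above the standby's limit cannot be
 * mirrored, however few transactions are running.
 */
bool slotid_fits(const int *used_slots, int n, int standby_max_connections)
{
    assert(n >= 0 && standby_max_connections > 0);
    for (int i = 0; i < n; i++)
        if (used_slots[i] >= standby_max_connections)
            return false;
    return true;
}

/* Without slot ids, only the number of concurrent transactions matters. */
bool count_fits(int n, int standby_max_connections)
{
    return n <= standby_max_connections;
}
```

For example, with a standby max_connections of 5, two concurrent transactions in primary slots 0 and 7 fail the slot-id check but pass the count check.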

Sometimes, but mostly people set max_connections higher because they
intend to use those extra connections. So no real advantage there
over the slotid approach :-)

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
