Re: GetOldestXmin going backwards is dangerous after all

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GetOldestXmin going backwards is dangerous after all
Date: 2013-02-02 18:38:09
Message-ID: 20130202183809.GA28016@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-02-02 18:32:44 +0000, Simon Riggs wrote:
> On 2 February 2013 14:24, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
> > b) We don't assign the xmin early enough, we only set it when the first
> > feedback message arrives, but we should set it when walsender starts
> > streaming.
>
> That's easy to fix.

Not trivial, but not too hard, yes. When the standby initially connects
we don't yet know which xid will be required because consistency hasn't
yet been achieved.

> > c) After a disconnect the feedback message will rather likely ask for an
> > xmin horizon that's not valid anymore on the primary. If the disconnect
> > was short enough, often enough that doesn't matter because nothing has
> > been cleared out, but it doesn't really work all that well.
> > That's still better than setting it to the currently valid minimal xmin
> > horizon because it prevents cleanup from that moment on.
> > I don't see how this can be significantly improved without persistent
> > knowledge about standbys.
>
> We could delay startup of the standby until the xmin on the standby
> reaches the xmin on the master.
>
> So when the standby has hot_standby_feedback = on, at standby
> connection we set the xmin of the walsender to be the current value on
> the master, then we disallow connections on standby until we have
> replayed up to that xmin on the standby. That way the xmin of the
> walsender never goes backwards nor do we get cancelations on the
> standby.

That's easy enough when the standby is initially started, but it doesn't
work that well if only the connection between the two failed (or the master
was restarted) and a reconnect succeeded. To make it work similarly in that
case we would have to throw everyone out, which would kind of counteract
the whole idea.
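
To make the reconnect problem concrete, here is a minimal sketch of the
proposed rule as a toy model. It is not PostgreSQL code: the names
Walsender, Standby and reconnect are hypothetical, xids are plain
integers (wraparound is ignored), and "replayed_xmin" is an assumed
stand-in for how far replay has progressed on the standby.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Walsender:
    """Hypothetical primary-side state kept for one standby."""
    xmin: Optional[int] = None  # horizon held back on behalf of the standby

    def on_standby_connect(self, primary_oldest_xmin: int) -> int:
        # Proposal: pin the horizon to the primary's current value at connect.
        self.xmin = primary_oldest_xmin
        return self.xmin

    def on_feedback(self, requested_xmin: int) -> int:
        # A feedback message may only move the horizon forwards, never back.
        if self.xmin is None or requested_xmin > self.xmin:
            self.xmin = requested_xmin
        return self.xmin


@dataclass
class Standby:
    """Hypothetical standby-side state."""
    replayed_xmin: int                   # assumed measure of replay progress
    required_xmin: Optional[int] = None  # horizon pinned by the walsender
    open_sessions: int = 0

    def accepts_connections(self) -> bool:
        # Sessions are allowed only once replay has reached the pinned horizon.
        return self.required_xmin is None or self.replayed_xmin >= self.required_xmin


def reconnect(standby: Standby, walsender: Walsender, primary_oldest_xmin: int) -> int:
    """Apply the connect-time rule on a reconnect; return sessions dropped."""
    standby.required_xmin = walsender.on_standby_connect(primary_oldest_xmin)
    dropped = 0
    if not standby.accepts_connections():
        # Same rule as at initial startup: nobody may be connected yet,
        # so every session that was already running has to go.
        dropped = standby.open_sessions
        standby.open_sessions = 0
    return dropped
```

In this model a standby with three open sessions that reconnects while
its replay is behind the primary's current xmin loses all three, which
is the "throw everyone out" case. At initial startup open_sessions is
zero, so the same rule costs nothing.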

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
