Re: Disabled features on Hot Standby

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Disabled features on Hot Standby
Date: 2012-01-13 17:08:05
Message-ID: CA+Tgmobq22kF9LkGLsm2QqeRVGirr1sFWkXGqb=7BVjQ7Bm9ew@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 13, 2012 at 11:13 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I think it should be you that comes up with a fix, not for me to
> respond to your concerns about how hard it is. Many things that don't
> fully work are rejected for that reason.

Well, I disagree. The fact that all-visible info can't be trusted in
standby mode is a problem that has existed since Hot Standby was
committed, and I don't feel obliged to fix it just because I was
involved in developing a new feature that happens to rely on
all-visible info. I'm sorry to butt heads with you on this one, but
this limitation has long been known and has been discussed many times
on pgsql-hackers, and I'm not going to drop everything and start
working on this just because you seem to think that I should.

> Having said that, I have input that seems to solve the problem.
>
> Many WAL records have latestRemovedXid on them. We can use the same
> idea with XLOG_HEAP2_VISIBLE records, so we add a field to send the
> latest vacrelstats->latestRemovedXid. That then creates a recovery
> snapshot conflict that would cancel any query that might then see a
> page of the vis map that was written when the xmin was later than on
> the standby. If replication disconnects briefly and a vis map bit is
> updated that would cause a problem, just as the same situation would
> cause a problem because of other record types.

That could create a lot of recovery conflicts when
hot_standby_feedback=off, I think, but it might work when
hot_standby_feedback=on. I don't fully understand the
latestRemovedXid machinery, but I guess the idea would be to kill any
standby transaction whose proc->xmin precedes the oldest committed
xmin or xmax on the page. If hot_standby_feedback=on then there
shouldn't be any, except in the case where it's just been enabled or
the SR connection is bouncing.
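
As a rough illustration of the rule guessed at above, here is a toy
model in Python. It is a sketch under stated assumptions, not
PostgreSQL source: StandbyBackend, xid_precedes and backends_to_cancel
are hypothetical names, the cutoff XID is simply passed in (how it
would be derived from the page and carried in the WAL record is
exactly the open question), and the only behaviour borrowed from
PostgreSQL is that normal transaction IDs are compared modulo 2^32.

```python
from dataclasses import dataclass
from typing import List, Optional

XID_SPACE = 2 ** 32       # transaction IDs are 32-bit and wrap around
FIRST_NORMAL_XID = 3      # smaller values are reserved special XIDs


def xid_precedes(a: int, b: int) -> bool:
    """Return True if XID a is logically older than XID b.

    Normal XIDs are compared circularly: a precedes b when the signed
    32-bit difference (a - b) is negative. Special XIDs are compared
    as plain integers.
    """
    if a < FIRST_NORMAL_XID or b < FIRST_NORMAL_XID:
        return a < b
    return (a - b) % XID_SPACE >= 2 ** 31


@dataclass
class StandbyBackend:
    """Hypothetical stand-in for a standby backend's proc entry."""
    pid: int
    xmin: Optional[int]   # None when the backend holds no snapshot


def backends_to_cancel(backends: List[StandbyBackend],
                       cutoff_xid: int) -> List[int]:
    """Return the pids whose snapshot xmin precedes cutoff_xid.

    cutoff_xid is assumed to arrive with the WAL record that sets the
    all-visible bit; this sketch does not model how it is computed.
    """
    return [b.pid for b in backends
            if b.xmin is not None and xid_precedes(b.xmin, cutoff_xid)]
```

With backends holding xmin 100, xmin 200 and no snapshot, a cutoff of
150 would select only the first one. With hot_standby_feedback=on the
cutoff should normally not get ahead of any standby xmin, so the list
would be empty except in the transient cases mentioned above.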

Also, what happens if an all-visible bit gets set on the standby
through some other mechanism - e.g. restored from an FPI or
XLOG_HEAP_NEWPAGE? I'm not sure whether we ever do an FPI of the
visibility map page itself, but we certainly do it for the heap pages.
So it might be that this infrastructure would (somewhat bizarrely)
trust the visibility map bits but not the PD_ALL_VISIBLE bits. I'm
hoping Heikki or Tom will comment on this thread, because I think
there are a bunch of subtle issues here and that we could easily screw
it up by trying to plow through the problem too hastily.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
