Re: Hot standby and b-tree killed items

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby and b-tree killed items
Date: 2008-12-30 15:17:38
Message-ID: 1230650258.4793.1320.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Fri, 2008-12-19 at 09:22 -0500, Greg Stark wrote:

> I'm confused shouldn't read-only transactions on the slave just be
> hacked to not set any hint bits including lp_delete?

It seems there are multiple issues involved and I saw only the first of
these initially. I want to explicitly separate these issues so we can
discuss them more easily.

1. When we replay an XLOG_BTREE_DELETE record, we may have to wait for
other sessions and then, if they do not finish, cancel them.

Possibly a pain, but these records are not very common now that we have
HOT, except on certain kinds of queue table.

2. Should we ignore the LP_DEAD flag on btree rows when we are using the
index during recovery? As Heikki points out, this hint bit is not WAL
logged, but can appear in the standby as a result of full page writes.
The LP_DEAD flags will have been set using the primary's xmin, which can
differ from the one on the standby, so they could cause index rows to be
ignored that should have been included in a correct MVCC answer.

So we need to either

(a) always ignore LP_DEAD flags we see when reading an index during
recovery.

(b) include an additional step to clean the full page writes to remove
LP_DEAD hints from the incoming pages.

(b) is feasible, but would need to be repeated each time a new full page
arrived, so a page may need to be re-cleaned many times. Sounds like a
bad plan, so we should choose (a).
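
To make the two options concrete, here is a minimal sketch in C. It is not
PostgreSQL source: ToyIndexItem, TOY_LP_DEAD and the toy_* functions are all
hypothetical stand-ins for the real line pointer and scan code, and the
in_recovery argument is assumed to be supplied by the caller. Option (a) is
a check made on every read; option (b) is a pass over the page that would
have to be rerun for each full page image that arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for the LP_DEAD hint. */
#define TOY_LP_DEAD 0x01u

/* Hypothetical, simplified index item: a key plus hint flags. */
typedef struct ToyIndexItem {
    uint32_t key;
    uint8_t  flags;
} ToyIndexItem;

/*
 * Option (a): decide whether a scan should consider an item.
 * During recovery the hint is not trusted, so every item is a candidate
 * and the normal MVCC check has to decide visibility.
 */
static bool toy_item_is_candidate(const ToyIndexItem *item, bool in_recovery)
{
    assert(item != NULL);
    if (in_recovery)
        return true;
    return (item->flags & TOY_LP_DEAD) == 0;
}

/* Count the items a scan would look at under option (a). */
static size_t toy_count_candidates(const ToyIndexItem *items, size_t n,
                                   bool in_recovery)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (toy_item_is_candidate(&items[i], in_recovery))
            count++;
    }
    return count;
}

/*
 * Option (b): strip the hint from every item on an incoming page image.
 * Returns the number of hints cleared. This work would be repeated for
 * each new full page image of the same page.
 */
static size_t toy_clear_dead_hints(ToyIndexItem *items, size_t n)
{
    size_t cleared = 0;
    for (size_t i = 0; i < n; i++) {
        if (items[i].flags & TOY_LP_DEAD) {
            items[i].flags &= (uint8_t)~TOY_LP_DEAD;
            cleared++;
        }
    }
    return cleared;
}
```

In this sketch option (a) costs one extra branch per item read and never
modifies the page, which is the property that makes it preferable to
re-cleaning pages under (b).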

3. Should we set the LP_DEAD flag on btree rows when we are using the
index during recovery? There is not much point if we are ignoring them.

There is no space for an additional flag to distinguish between primary
and standby hint bits.

Issues (2) and (3) would go away entirely if both standby and primary
always had the same xmin value as a system-wide setting. i.e. the
standby and primary are locked together at their xmins. Perhaps that was
Heikki's intention in recent suggestions?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
