Re: Hot Standby dev build (v8)

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby dev build (v8)
Date: 2009-01-19 10:08:48
Message-ID: 1232359728.2327.6.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Mon, 2009-01-19 at 09:16 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-16 at 22:09 +0200, Heikki Linnakangas wrote:
> >
> >>>> RecentGlobalXmin is just a hint, it lags behind the real oldest xmin
> >>>> that GetOldestXmin() would return. If another backend has a more recent
> >>>> RecentGlobalXmin value, and has killed more recent tuples on the page,
> >>>> the latestRemovedXid written here is too old.
> >>> What do you think we should do instead?
> >> Dunno. Maybe call GetOldestXmin().
> >
> > We are discussing btree deletes, not btree vacuums.
>
> Pardon my ignorance, but what's the difference?

In current HEAD, not much. For Hot Standby the difference is
significant: the two actions have been split into separate WAL records,
rather than continuing to share the same one.

XLOG_BTREE_VACUUM removes index tuples as a result of a vacuum. The
initial scan of the heap has already generated an
XLOG_HEAP2_CLEANUP_INFO record, which gives the latestRemovedXid for
that vacuum. So we don't need to worry about putting a latestRemovedXid
on XLOG_BTREE_VACUUM. The WAL records also differ because
XLOG_BTREE_VACUUM contains details of blocks that need to be pinned but
not otherwise touched.

XLOG_BTREE_DELETE is different in 3 ways. It isn't part of a vacuum, so:
* we don't need to take a cleanup lock
* it doesn't contain info about other blocks we need to scan beforehand
  for correctness purposes
* it wasn't preceded by an XLOG_HEAP2_CLEANUP_INFO record, so it must
  have a *correct* (even if too conservative) value for latestRemovedXid
  set.

So the only time we need to set latestRemovedXid correctly is during a
normal transaction, not during a vacuum.
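
To make the split above concrete, here is a rough sketch in C. It is
only a toy model of the distinction described in this mail, not code
from the patch: the ToyRecKind enum and both helper functions are names
invented for illustration, and the real WAL record layouts are not
shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two btree WAL record kinds discussed
 * above; these are NOT the real PostgreSQL definitions. */
typedef enum
{
    TOY_BTREE_VACUUM,   /* index tuples removed as part of a vacuum */
    TOY_BTREE_DELETE    /* index tuples removed by a normal transaction */
} ToyRecKind;

/* Does the record itself have to carry a correct latestRemovedXid?
 * A vacuum record does not, because the preceding heap cleanup-info
 * record already supplied one; a delete record has no such
 * predecessor, so it must carry its own. */
bool
must_carry_latest_removed_xid(ToyRecKind kind)
{
    switch (kind)
    {
        case TOY_BTREE_VACUUM:
            return false;
        case TOY_BTREE_DELETE:
            return true;
    }
    return false;               /* not reached for valid kinds */
}

/* Is a cleanup lock needed for this record kind? Per the list above,
 * the delete case does not need one; the vacuum case is assumed here
 * to need one. */
bool
needs_cleanup_lock(ToyRecKind kind)
{
    return kind == TOY_BTREE_VACUUM;
}

/* Small self-check of the two rules modelled above. */
void
toy_self_check(void)
{
    assert(must_carry_latest_removed_xid(TOY_BTREE_DELETE));
    assert(!must_carry_latest_removed_xid(TOY_BTREE_VACUUM));
    assert(needs_cleanup_lock(TOY_BTREE_VACUUM));
    assert(!needs_cleanup_lock(TOY_BTREE_DELETE));
}
```

The point of keeping the two predicates separate is that the lock
requirement and the latestRemovedXid requirement are independent
properties of each record kind.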

> > If we are doing
> > btree delete then we have an unreleased snapshot therefore we also have
> > a non-zero xmin. How can another backend have a later RecentGlobalXmin
> > or result from GetOldestXmin() than we do?
>
> Sure it can, for example:
>
> 1. Transaction 1 begins in backend A
> 2. Transaction 2 begins in backend B, xmin = 1
> 3. Transaction 1 ends
> 4. Transaction 3 begins in backend C, xmin = 2
> 5. Backend C gets snapshot, TransactionXmin = 2, RecentGlobalXmin = 1
> 6. Transaction 2 ends.
> 7. Transaction 4 begins in backend A, gets snapshot TransactionXmin = 2,
> RecentGlobalXmin = 2
> 8. Transaction 4 kills tuple, using its RecentGlobalXmin of 1
> 9. Transaction 3 splits the page, emits a delete xlog record, setting
> latestRemovedXid to its RecentGlobalXmin of 2

Well, steps 7 and 8 don't make sense.

Your earlier comment was that it was possible for a WAL record to be
written with a RecentGlobalXmin that was lower than other backends'
values. In step 9 the RecentGlobalXmin is *not* lower than any other
backend's, it is the same.

So if there is a proof, this isn't it.

But I can't see how there can be one: Two concurrent vacuums can have
different OldestXmin values, but two concurrent transactions cannot.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
