From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Hot Standby dev build (v8) |
Date: | 2009-01-19 10:08:48 |
Message-ID: | 1232359728.2327.6.camel@ebony.2ndQuadrant |
Lists: | pgsql-hackers |
On Mon, 2009-01-19 at 09:16 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-16 at 22:09 +0200, Heikki Linnakangas wrote:
> >
> >>>> RecentGlobalXmin is just a hint, it lags behind the real oldest xmin
> >>>> that GetOldestXmin() would return. If another backend has a more recent
> >>>> RecentGlobalXmin value, and has killed more recent tuples on the page,
> >>>> the latestRemovedXid written here is too old.
> >>> What do you think we should do instead?
> >> Dunno. Maybe call GetOldestXmin().
> >
> > We are discussing btree deletes, not btree vacuums.
>
> Pardon my ignorance, but what's the difference?
In current HEAD, not much. For Hot Standby the difference is
significant: the two actions have been split, rather than continuing
to share the same WAL record.
XLOG_BTREE_VACUUM removes index tuples as a result of a vacuum. The
initial scan of the heap already generated an XLOG_HEAP2_CLEANUP_INFO
which gives the latestRemovedXid for that vacuum. So we don't need to
worry about putting a latestRemovedXid on XLOG_BTREE_VACUUM. The WAL
records also differ because the XLOG_BTREE_VACUUM contains details of
blocks that need to be pinned but not otherwise touched.
XLOG_BTREE_DELETE is different in 3 ways. It isn't part of a vacuum, so:
* we don't need to take a cleanup lock
* it doesn't contain info about other blocks we need to scan beforehand
for correctness purposes
* it wasn't preceded by an XLOG_HEAP2_CLEANUP_INFO record, so it must
have a *correct* (even if too conservative) value for latestRemovedXid
set.
So the only time we need to set latestRemovedXid correctly is during a
normal transaction, not during a vacuum.
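As a rough sketch of that split, under stated assumptions: the names
SketchRecord, sketch_conflict_xid and sketch_needs_cleanup_lock are
hypothetical and are not the structs or functions in the patch. It only
models which record supplies the latestRemovedXid and which btree record
wants a cleanup lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the real typedefs; assumptions for this sketch only. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical record kinds, not the real WAL info codes. */
typedef enum
{
    SKETCH_HEAP2_CLEANUP_INFO,  /* written by the heap scan of a vacuum */
    SKETCH_BTREE_VACUUM,        /* index tuples removed by a vacuum */
    SKETCH_BTREE_DELETE         /* index tuples removed by a normal transaction */
} SketchRecordKind;

typedef struct SketchRecord
{
    SketchRecordKind kind;
    TransactionId latestRemovedXid; /* unused for SKETCH_BTREE_VACUUM */
} SketchRecord;

/*
 * Xid that standby queries would be checked against before replaying rec,
 * or InvalidTransactionId when the record carries no such value itself.
 */
TransactionId
sketch_conflict_xid(const SketchRecord *rec)
{
    assert(rec != NULL);
    switch (rec->kind)
    {
        case SKETCH_HEAP2_CLEANUP_INFO:
        case SKETCH_BTREE_DELETE:
            return rec->latestRemovedXid;
        case SKETCH_BTREE_VACUUM:
            /* covered by the earlier cleanup-info record of the same vacuum */
            return InvalidTransactionId;
    }
    return InvalidTransactionId;
}

/* Of the two btree records, only the vacuum one wants a cleanup lock. */
bool
sketch_needs_cleanup_lock(const SketchRecord *rec)
{
    assert(rec != NULL);
    return rec->kind == SKETCH_BTREE_VACUUM;
}
```

In this model a delete record is the only btree record whose own
latestRemovedXid matters, which is why its value has to be correct.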
> > If we are doing
> > btree delete then we have an unreleased snapshot therefore we also have
> > a non-zero xmin. How can another backend have a later RecentGlobalXmin
> > or result from GetOldestXmin() than we do?
>
> Sure it can, for example:
>
> 1. Transaction 1 begins in backend A
> 2. Transaction 2 begins in backend B, xmin = 1
> 3. Transaction 1 ends
> 4. Transaction 3 begins in backend C, xmin = 2
> 5. Backend C gets snapshot, TransactionXmin = 2, RecentGlobalXmin = 1
> 6. Transaction 2 ends.
> 7. Transaction 4 begins in backend A, gets snapshot TransactionXmin = 2,
> RecentGlobalXmin = 2
> 8. Transaction 4 kills tuple, using its RecentGlobalXmin of 1
> 9. Transaction 3 splits the page, emits a delete xlog record, setting
> latestRemovedXid to its RecentGlobalXmin of 2
Well, steps 7 and 8 don't make sense.
Your earlier comment was that it was possible for a WAL record to be
written with a RecentGlobalXmin that was lower than other backends'
values. In step 9 the RecentGlobalXmin is *not* lower than any other
backend's; it is the same.
So if there is a proof, this isn't it.
But I can't see how there can be one: two concurrent vacuums can have
different OldestXmin values, but two concurrent transactions cannot.
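For reference, the quantity being argued about can be modelled roughly
as the minimum xmin advertised by any running backend. The sketch below
is a simplified model and not GetOldestXmin() itself: the name
sketch_oldest_xmin and its parameters are hypothetical, and it ignores
xid wraparound, per-database filtering and locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the real typedefs; assumptions for this sketch only. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/*
 * Return the smallest valid xmin among the given backends, or next_xid
 * when no backend advertises one. Plain integer comparison is used, so
 * xid wraparound is deliberately not handled.
 */
TransactionId
sketch_oldest_xmin(const TransactionId *backend_xmins, size_t nbackends,
                   TransactionId next_xid)
{
    TransactionId result = next_xid;
    size_t      i;

    assert(backend_xmins != NULL || nbackends == 0);
    for (i = 0; i < nbackends; i++)
    {
        TransactionId xmin = backend_xmins[i];

        /* a backend with no snapshot advertises no xmin; skip it */
        if (xmin != InvalidTransactionId && xmin < result)
            result = xmin;
    }
    return result;
}
```

With backends advertising xmins 2 and 3, this model returns 2; whether
two transactions can hold different cached copies of that value is the
point in dispute above.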
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support