On Wed, Feb 1, 2012 at 11:33 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> So freezing multixacts is not all that easy. I mean, you just scan the
> page looking for multis less than the cutoff; for those that are dead,
> they can just be removed completely, but what about ones that still have
> members running? This is pretty unlikely but not impossible.
> If there's only one remaining member, the problem is easy: replace it
> with that transaction's xid, and set the appropriate hint bits. But if
> there's more than one, the only way out is to create a new multi. This
> increases multixactid consumption, but I don't see any other option.

Why do we need to freeze anything if the transactions are still
running? We certainly don't freeze regular transaction IDs while the
transactions are still running; it would give wrong answers. It's
probably possible to do it for mxids, but why would you need to?

Suppose you have a tuple A which is locked by a series of transactions
T0, T1, T2, ...; AIUI, each new locker is going to have to create a
new mxid with all the existing entries plus a new one for itself.
But, unless I'm confused, as it's doing so, it can discard any entries
for locks taken by transactions which are no longer running. So given
an mxid with living members, any dead member in that mxid must have
been living at the time the newest member was added. Surely we can't
be consuming mxids anywhere near fast enough for that to be a problem.
There could be an updating transaction involved as well, but if
that's not running any more then it has either committed (in which
case the tuple will be dead once the global-xmin advances past it) or
aborted (in which case we can forget about it).
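
To make the argument above concrete, here is a rough sketch of how I
understand locker accumulation to work. It is a toy model, not the
actual multixact code: the MultiXactTable class, its add_locker method
and the "running" set are all made-up names for illustration, and the
assumption that dead members are pruned whenever a new mxid is built is
exactly the "unless I'm confused" part.

```python
class MultiXactTable:
    """Toy model of mxid allocation (hypothetical, not PostgreSQL's API)."""

    def __init__(self):
        self.next_mxid = 1
        self.multis = {}  # mxid -> frozenset of member xids

    def create(self, members):
        # Every new member set consumes a fresh mxid.
        mxid = self.next_mxid
        self.next_mxid += 1
        self.multis[mxid] = frozenset(members)
        return mxid

    def add_locker(self, old_mxid, new_xid, running):
        # Assumption: a new locker copies only the members that are
        # still running, then adds itself.
        old = self.multis.get(old_mxid, frozenset())
        live = {xid for xid in old if xid in running}
        return self.create(live | {new_xid})


table = MultiXactTable()
m1 = table.add_locker(None, 10, running={10})
m2 = table.add_locker(m1, 11, running={10, 11})
# Transaction 10 has finished by the time 12 takes its lock.
m3 = table.add_locker(m2, 12, running={11, 12})
print(sorted(table.multis[m3]))
```

Under that assumption, a dead member can only appear in a multi that
was built while it was still running, which is why I don't see old
multis with live members piling up.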
> However, there are cases where not even that is possible -- consider
> tuple freezing during WAL recovery. Recovery is going to need to
> replace those multis with other multis, but it cannot create new multis
> itself. The only solution here appears to be that when multis are
> frozen in the master, replacement multis have to be logged too. So the
> heap_freeze_tuple Xlog record will have a map of old multi to new. That
> way, recovery can just determine the new multi to use for any particular
> old multi; since multixact creation is also logged, we're certain that
> the replacement value has already been defined.

This doesn't sound right. Why would recovery need to create a multi
that didn't exist on the master? Any multi it applies to a record
should be one that it was told to apply by the master; and the master
should have already WAL-logged the creation of that multi. I don't
think that "replacement" mxids have to be logged; I think that *all*
mxids have to be logged. Am I all wet?
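
For what it's worth, this is how I picture the replay side, again as a
toy sketch under my own assumptions. The record shapes ("create_multi",
"set_xmax") and the replay function are invented for illustration and
do not correspond to real WAL record types. The point is only that
recovery never allocates an mxid; it applies ones whose creation it has
already replayed.

```python
def replay(records):
    """Replay a toy WAL stream (hypothetical record format).

    Recovery never invents a multi: it only records the ones the master
    logged, and refuses to apply one it has not seen created.
    """
    known_multis = {}  # mxid -> frozenset of member xids
    tuple_xmax = {}    # tuple id -> mxid applied to that tuple

    for rec in records:
        kind = rec[0]
        if kind == "create_multi":
            _, mxid, members = rec
            known_multis[mxid] = frozenset(members)
        elif kind == "set_xmax":
            _, tid, mxid = rec
            if mxid not in known_multis:
                raise ValueError(f"mxid {mxid} used before it was logged")
            tuple_xmax[tid] = mxid
        else:
            raise ValueError(f"unknown record kind: {kind}")

    return known_multis, tuple_xmax


wal = [
    ("create_multi", 7, [100, 101]),
    ("set_xmax", "A", 7),
    ("create_multi", 8, [101, 102]),  # the "replacement" is just another multi
    ("set_xmax", "A", 8),
]
multis, xmax = replay(wal)
print(xmax["A"], sorted(multis[8]))
```

If all multi creation is logged ahead of use like this, the freeze
record would not need its own old-to-new map.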