Re: Bug in VACUUM FULL ?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Pavan Deolasee" <pavan(dot)deolasee(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in VACUUM FULL ?
Date: 2007-03-09 21:40:46
Message-ID: 3413.1173476446@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Pavan Deolasee" <pavan(dot)deolasee(at)enterprisedb(dot)com> writes:
> The problem mentioned before is hard to reproduce with the
> suggested change, but its not completely gone away. I have
> seen that again on CVS HEAD with the patch applied.
> I am facing another issue with VACUUM FULL. This
> problem gets reproduced with HOT very easily, but takes
> few attempts to reproduce with CVS HEAD, but it
> certainly exists.

I've been banging away on this since yesterday, and I think I've
achieved a full understanding of what's going on. There are three or
four different-looking pathologies but they all seem to arise from
the same problem: the update-chain-moving code assumes that
RECENTLY_DEAD tuples will never have update successors that are entirely
DEAD (according to HeapTupleSatisfiesVacuum). When faced with an
update chain in which that does happen, it can move the chain multiple
times, neglect to remove index entries for tuples that get truncated
away, crash on an Assert, or other interesting stuff. Here's an example
from some debug printouts I inserted into repair_frag:

chain forward branches 17/174 to 17/183 (x 1993046 1993057) RECENTLY_DEAD
chain forward branches 17/183 to 15/109 (x 1993057 1993055) RECENTLY_DEAD
chain forward branches 15/109 to 15/111 (x 1993055 1993045) DEAD
chain forward branches 15/111 to 15/114 (x 1993045 1993025) DEAD
chain forward branches 15/114 to 15/116 (x 1993025 1993096) RECENTLY_DEAD
chain forward branches 15/116 to 15/119 (x 1993096 1993107) RECENTLY_DEAD
chain forward branches 15/119 to 15/121 (x 1993107 1993120) RECENTLY_DEAD
chain forward branches 15/121 to 15/125 (x 1993120 1993121) RECENTLY_DEAD
chain forward branches 15/125 to 15/128 (x 1993121 1993122) RECENTLY_DEAD
chain forward branches 15/128 to 15/131 (x 1993122 1993092) RECENTLY_DEAD
chain forward branches 15/131 to 15/133 (x 1993092 1993145) RECENTLY_DEAD
chain forward branches 15/133 to 15/139 (x 1993145 1993182) RECENTLY_DEAD
chain forward branches 15/139 to 15/141 (x 1993182 1993183) RECENTLY_DEAD
chain forward branches 15/141 to 15/147 (x 1993183 1993155) RECENTLY_DEAD
chain forward branches 15/147 to 15/150 (x 1993155 1993167) LIVE
chain back stops at branches 15/114: xmin 1993025 < 1993050
moved branches 15/150 to 0/69; next 0/69
moved branches 15/147 to 0/70; next 0/69
moved branches 15/141 to 0/71; next 0/70
moved branches 15/139 to 0/72; next 0/71
moved branches 15/133 to 0/73; next 0/72
moved branches 15/131 to 0/74; next 0/73
moved branches 15/128 to 0/75; next 0/74
moved branches 15/125 to 0/76; next 0/75
moved branches 15/121 to 0/77; next 0/76
moved branches 15/119 to 0/78; next 0/77
moved branches 15/116 to 0/79; next 0/78
moved branches 15/114 to 0/80; next 0/79

Since TIDs 17/174 and 17/183 didn't get moved, when the repair_frag
search arrives at 17/183 it will copy this chain again, leading to
duplicate copies of the LIVE tuple at the chain end, leading to trouble.

It's not surprising that tuples could have xmax less than xmin, since
transactions can commit in orders different than they start; when using
READ COMMITTED updates it's not at all surprising that a transaction
might update rows after a later-numbered transaction does. However, in
looking at this code previously I'd assumed that the OldestXmin cutoff
could never fall between two such transactions, and so the above
scenario wouldn't happen. I'm not real sure why I thought that.
For the cases that VACUUM FULL is interested in, both XIDs mentioned
in a DEAD tuple must have committed before OldestXmin was computed, but
there doesn't seem to be a compelling reason why OldestXmin might not
have been determined by an unrelated third transaction with a number
between those two.

I believe it's the case that any update chain members appearing before
a DEAD entry must in fact also be dead (ie, not listed as live in any
active snapshot) but a test based on OldestXmin hasn't got enough
resolution to prove that they are dead.

Does anyone want to argue that this is an error in the calculation of
OldestXmin (and if so, how would you propose to fix it)? If not, I'll
set to work on fixing the chain-moving logic. I think the correct
behavior is that we shouldn't consider DEAD tuples part of a movable
chain, and so in the above example there are two separate chains to move
not one. Alternatively we could try to recognize that the older part of
the chain is really dead and removable, but that seems complicated and
likely to introduce new bugs.

I wonder whether this has any implications for HOT ...

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2007-03-09 21:41:10 Re: Auto creation of Partitions
Previous Message Jim Nasby 2007-03-09 21:38:10 Re: Log levels for checkpoint/bgwriter monitoring