Re: Vacuum dead tuples that are "between" transactions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Paul Tillotson <spam1011(at)adelphia(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum dead tuples that are "between" transactions
Date: 2006-02-28 06:32:19
Message-ID: 10856.1141108339@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Paul Tillotson <spam1011(at)adelphia(dot)net> writes:
> The topic of improving vacuum for use in heavy-update environments seems
> to come up frequently on the list. Has anyone weighed the costs of
> allowing VACUUM to reclaim tuples that are not older than the oldest
> transaction but are nonetheless invisible to all running transactions?
> It seems that it's not that hard....

It's not that easy either --- you are assuming that every process
advertises far more of its internal state than it actually does.

> Suppose that in the system, we have a serializable transaction with xid
> 1000 and a read committed transaction with xid 1001. Other than these
> two, the oldest running xid is 2000.

> Suppose we consider a tuple with xmin 1200 and xmax 1201. We will
> assume that xid 1201 committed before xid 2000 began to run.

> So:

> (A) This tuple is invisible to the serializable transaction, since its
> snapshot can't ever advance.

Wrong --- you can't assume that simply from the transaction numbering,
even assuming that you know that 1000 is serializable. 1000 might not
have set its snapshot until quite some time after it started. (This is
even pretty likely, if it waited for some locks before setting the
snapshot.) You'd need access to the snapshot 1000 is actually using to
be sure which "later" transactions are invisible to it.

While advertising whole snapshots (rather than just xmin) in shared
memory is at least theoretically possible, the costs of doing that seem
nontrivial to me ... and they'd have to be paid whether any savings
ensued or not.

> (B) The read committed transaction might be able to see it. However, if
> its current command started AFTER xid 1201 committed, it can't.

Another issue is that there's not just "one single snapshot" to worry
about per backend. Cursors for instance capture their own snaps.
So a backend would have to somehow keep track of the oldest live
snapshot it has internally.

> The read committed ones are a more difficult matter, but I think you can
> treat a tuple as dead if it was inserted after the read committed
> transaction started to run AND the tuple was deleted before the
> transaction's currently running command started to run.

To do that requires not just that you have access to a backend's oldest
snapshot, but that you have access to *all* its active snapshots;
because such a transient tuple might be visible in some newer snap even
though it's too new for the oldest snap. Doing that will create very
significant problems of shared memory management, as well as performance
and locking issues.

There's been some talk of distinguishing "global" and "within database"
xmin values, so that a long-lived transaction wouldn't interfere with
vacuuming tables in other databases that that xact couldn't possibly
access. That seems doable to me, but I think any finer-grained analysis
is probably going to be a net loss because of the distributed overhead
it'd impose on every transaction.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-02-28 06:38:03 Re: character encoding in StartupMessage
Previous Message Greg Stark 2006-02-28 06:18:14 Re: Dead Space Map