Re: Vacuum Daemon

From: "J(dot) R(dot) Nield" <jrnield(at)usol(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, PostgreSQL Hacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum Daemon
Date: 2002-06-30 04:01:51
Message-ID: 1025409717.11048.63.camel@localhost.localdomain
Lists: pgsql-hackers

On Sat, 2002-06-29 at 21:55, Tom Lane wrote:
> "J. R. Nield" <jrnield(at)usol(dot)com> writes:
> >> I do not think that is the case; and anyway we've pretty much rejected
> >> Vadim's notion of going to an Oracle-style UNDO buffer.
>
> > Could someone point me to this discussion, or summarize what the problem
> > was?
>
> I'm too lazy to dig through the archives at the moment, but the main
> points were (a) a finite-size UNDO buffer chokes on large transactions
>

Yes, this is a good point. Oracle was always lame with its "ROLLBACK
SEGMENTS". SolidDB (SolidWorks? It's been a while...) used a btree-like
structure for this that was not of fixed size. Oracle supposedly moved
to the same method in its 9i release, but I don't know the details.

I could never figure out how they did this, until I realized that UNDO
doesn't need to be in the WAL log. You can use any access method you
like for the UNDO store, as long as that method is itself protected by
REDO: emit REDO entries for each insert into the UNDO AM, and REDO-log
the deletes as well. That makes it easy to keep the recovery code
idempotent, which catches the case of a system crash during recovery.
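A toy sketch of that scheme (my own naming; this assumes nothing about the real WAL format): every change to the UNDO access method first emits a REDO record, and replay applies a record only if it is newer than the page it targets, so recovery stays idempotent even if it crashes partway through and restarts.

```python
# Toy model (my sketch, not PostgreSQL code): the UNDO access method is
# itself protected by REDO records. Each UNDO page carries an LSN; a redo
# record is applied only if its LSN is newer than the page's, so replay
# is idempotent and a crash *during* recovery is harmless.

class UndoPage:
    def __init__(self):
        self.lsn = 0        # LSN of the last change applied to this page
        self.entries = []   # UNDO entries stored on the page

redo_log = []               # (lsn, page_id, op, payload) records
pages = {}                  # page_id -> UndoPage

def log_and_apply(page_id, op, payload):
    """Emit a REDO record for a change to the UNDO AM, then apply it."""
    lsn = len(redo_log) + 1
    redo_log.append((lsn, page_id, op, payload))
    apply_record(lsn, page_id, op, payload)

def apply_record(lsn, page_id, op, payload):
    page = pages.setdefault(page_id, UndoPage())
    if lsn <= page.lsn:     # already applied: this makes replay idempotent
        return
    if op == "insert":
        page.entries.append(payload)
    elif op == "delete":
        page.entries.remove(payload)
    page.lsn = lsn

def recover():
    """Replay the whole REDO log; safe to run any number of times."""
    for rec in redo_log:
        apply_record(*rec)
```

The LSN check is what lets recovery itself crash and be rerun from the top without corrupting the UNDO store.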

> and (b) the Oracle approach requires live transaction processing to
> do the cleanup work that our approach can push off to hopefully-not-
> time-critical vacuum processing.

I'm not sure which way I'm leaning on this. On the one hand, it requires
extra work to clean up while the system is live, on top of writing the
undo records, though the cleanup need not be done by the same
transaction that committed the work (the cleanup has to be deferred
until the change is out of every active snapshot anyway).

On the other hand, you can clean up without a full table scan, because
you know exactly which tuples need to be changed. That can be a big
advantage on gigantic tables. It also lets you remove deleted tuples
quickly, so their space can be reused, and it eliminates the xid
wraparound problem.
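As an illustration of the targeted cleanup (hypothetical names; the real addresses would be RelFileNode plus ItemPointer, as mentioned below), here is a toy heap where vacuum follows the recorded addresses instead of scanning the table:

```python
# Toy model of address-only UNDO (my naming; not PostgreSQL structures).
# The UNDO record for a delete stores just the tuple's address, so
# cleanup visits exactly the tuples a transaction obsoleted and never
# touches the rest of the table.

heap = {}   # (relation, block, offset) -> tuple value
undo = {}   # xid -> list of addresses that xid deleted

def delete_tuple(xid, addr):
    """Logically delete a tuple, recording only its address in UNDO."""
    undo.setdefault(xid, []).append(addr)

def targeted_vacuum(xid):
    """Physically reclaim the deleter's tuples without a table scan."""
    for addr in undo.pop(xid, []):
        heap.pop(addr, None)
```

On a table with millions of pages, a transaction that deleted two tuples costs two page visits to clean up, not a scan.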

Of course, any kind of undo hurts performance for bulk inserts/updates,
so you either end up committing every few thousand inserts, or you use
some special extension to disable undo logging for a bulk load (or, if
you really want to be able to roll it back, you live with it :-)

How slow is it to vacuum a >1 TB database with postgres? Do we even have
any users who could test this?

Also, I would never advocate that we do what I'm pretty sure Oracle
does, and keep old values in the "Rollback Segment". Only (RelFileNode,
ItemDataPointer) addresses would need to be kept in the UNDO AM, if we
went this route.

>
> UNDO per se doesn't eliminate VACUUM anyhow; it only reclaims space
> from tuples written by aborted transactions. If you want to get rid
> of VACUUM then you need another way to get rid of the old versions of
> perfectly good committed tuples that are obsoleted by updates from
> later transactions. That essentially means you need an overwriting
> storage manager, which is a concept that doesn't mix well with MVCC.

Well, you can keep the UNDO records after commit to do a fast
incremental vacuum as soon as the transaction that deleted the tuples
becomes older than the oldest snapshot. If this is always done whenever
an XID becomes that old, then you never need to vacuum, and you never
need a full table scan.
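A sketch of that trigger condition (toy integer xids, my own naming): the UNDO address lists survive commit, and each one is reclaimed the moment its xid falls behind the oldest active snapshot, so no table ever waits for a scan-based vacuum.

```python
# Sketch (assumed names): incremental vacuum driven by the oldest
# snapshot. UNDO address lists are kept after commit; once a committed
# xid is older than every active snapshot, its dead tuples are
# reclaimed immediately.

committed_undo = {}   # xid -> addresses of tuples that xid obsoleted
reclaimed = []        # addresses whose space is now reusable

def on_commit(xid, addrs):
    """Keep the transaction's UNDO addresses around after commit."""
    committed_undo[xid] = list(addrs)

def advance_oldest_snapshot(oldest_xid):
    """Vacuum every committed xid no active snapshot can still see."""
    for xid in sorted(x for x in committed_undo if x < oldest_xid):
        reclaimed.extend(committed_undo.pop(xid))
```

The point is that `advance_oldest_snapshot` does a constant amount of work per obsoleted tuple, independent of table size.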

Because postgres never overwrites (except at vacuum), I think that
actually makes us a BETTER candidate for implementing this cleanly than
an overwriting storage manager would be. We would never need to keep
tuple values in UNDO!

>
> Oracle found a solution to that conundrum, but it's really not obvious
> to me that their solution is better than ours.

Their approach was worse, because they had an overwriting storage
manager before they tried to implement it (I'm guessing). :-)

> Also, they have
> patents that we'd probably run afoul of if we try to imitate their
> approach too closely.
>

Given the current state of affairs here in the US, PostgreSQL probably
violates hundreds or even thousands of software patents. It probably
violates tens of patents that have been upheld in court. The only thing
keeping companies from shutting down postgres, linux, OpenOffice, and a
hundred other projects is fear of adverse publicity, and the fact that
development would move overseas and continue to be a thorn in their
side.

We'll see how long this lasts, given the fear some vendors have of
certain maturing open-source/GPL projects, but I don't think PostgreSQL
will be first, since anyone can take this code and become an instant
proprietary database vendor! (No, I'm not complaining. Please, nobody
start a license fight because of this)

--
J. R. Nield
jrnield(at)usol(dot)com
