From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Minimally avoiding Transaction Wraparound in VLDBs
Date: 2005-08-31 22:17:44
Message-ID: 1125526664.3956.62.camel@localhost.localdomain
Lists: pgsql-hackers


For a while now, I've been seeking a method of reducing the impact of
VACUUM when run against a database where 80-95% of the data is
effectively read only and will not be deleted or updated again. This is
the situation in most Data Warehouses. When you get to very large
databases (VLDBs), the execution time of VACUUM becomes prohibitive.

I understand the need to run some form of VACUUM to avoid transaction id
wraparound, but I see that VACUUM does a lot of other things too.

A new command is proposed - VACUUM MINIMAL. The *sole* purpose of this
command is to do the absolute minimum required to avoid transaction id
wraparound. (Better names welcome....)

This does the same thing as VACUUM except it:

1. does not VACUUM any table that is less than 4 billion XIDs old
RATIONALE: If a table was created less than 4 billion XIDs ago, it
clearly can't contain any tuple with an XID more than 4 billion XIDs
old, so we don't need to VACUUM it to avoid XID wraparound. (A sketch of
how such tables might be picked out follows point 2.)
(Current VACUUM will scan everything, even a table created only a few
transactions ago.)

2. does not VACUUM indexes
RATIONALE: Indexes don't contain XIDs so don't *need* to be vacuumed.
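
To make point 1 concrete, here is a rough sketch of the test the command
would apply when deciding which tables to touch. The relminxid column is
hypothetical - today the catalogs only track a frozen XID per database,
in pg_database.datfrozenxid - so treat this purely as an illustration:

  -- relminxid is a hypothetical per-table "oldest XID" column
  SELECT relname, age(relminxid) AS xids_since_oldest_tuple
  FROM pg_class
  WHERE relkind = 'r'
  ORDER BY xids_since_oldest_tuple DESC;
  -- VACUUM MINIMAL would visit only the tables at the top of this list,
  -- i.e. those whose oldest tuple XID is approaching the wraparound
  -- horizon; everything younger would be skipped entirely.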

By taking those two steps, VACUUM MINIMAL will execute fairly quickly
even on large Data Warehouses. Those steps are fairly easy to implement
without change to the basic structure of VACUUM.

This command only makes sense when you *know* that tables don't need
vacuuming. That is the case when:
- you're using autovacuum, since dead tuples are then removed from every
table whenever that is required - and so the indexes will have been
covered too
- you are using table partitioning and your data retention period is
less than 4 billion transactions. At 100 write transactions/sec that is
well over a year of data - and if you are using COPY to load it, that
comes to hundreds of billions of rows, or Terabytes of data.
(Which is also the rationale, since you really *don't* want to VACUUM a
Terabyte of tables with indexes on them, ever.)
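
The arithmetic behind that claim, worked through roughly (the rates are
of course only illustrative):

  -- 4 billion XIDs consumed at 100 write transactions per second:
  SELECT 4000000000.0 / (100 * 86400) AS days_to_use_4_billion_xids;  -- about 463 days
  -- and if each of those transactions is a COPY loading even ~100 rows,
  -- that is on the order of 400 billion rows before any partition could
  -- need freezing - hence "Terabytes of data".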

The limit is set at 4 billion because with this command we are trying to
put off the work for as long as possible. That keeps the command fast,
which in turn allows it to be run more regularly, probably daily. Of
course, a run will take somewhat longer once tables do reach the point
where their XIDs must be frozen, but that is for the user to understand
and avoid, if it causes them problems.

Thus, the user has a choice of two ways of avoiding XID wraparound:
- VACUUM
- VACUUM MINIMAL
Each with its own strengths and weaknesses.

We've discussed in the past the idea that VACUUM can be sped up by
using a bitmap to track which blocks need cleaning. That's a good idea,
and I see it as somewhat orthogonal to the reason for this proposal. To
help in the circumstances I'm trying to optimise for, the vacuum bitmaps
would need to be non-lossy, persistent and recoverable for XID-wraparound
purposes (I know the clog code could be used for that), and they would
need to be maintained only for tables bigger than a certain threshold,
say 128 heap blocks or more - to avoid having too many bitmaps when
there are 1000s of tables. They also still don't help much with
VACUUMing big indexes in a DW context, since VACUUM still makes two
passes over each index even when there are no dead rows to remove from
it. That could be tuned somewhat, for which I also have a design - but
why bother tuning VACUUM when you can just skip it?

Comments?

Best Regards, Simon Riggs
