Re: Thoughts on maintaining 7.3

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Thoughts on maintaining 7.3
Date: 2003-10-05 03:53:49
Message-ID: 27612.1065326029@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> On Sat, Oct 04, 2003 at 11:41:17AM -0400, Tom Lane wrote:
>> No. You'd be better off using REINDEX for that, I think. IIRC we have
>> speculated about making VAC FULL fix the indexes via REINDEX rather than
>> indexbulkdelete.

> I can't agree with that idea.

Why not? There is plenty of anecdotal evidence in the archives saying
that it's faster to drop indexes, VACUUM FULL, recreate indexes than
to VACUUM FULL with indexes in place. Most of those reports date from
before we had the lazy-vacuum alternative, but I don't think that
renders them less relevant.

> Imagine having to VACUUM FULL a huge
> table. Not only it will take the lot required to do the VACUUM in the
> heap itself, it will also have to rebuild all indexes from scratch.

A very large chunk of VACUUM FULL's runtime is spent fooling with the
indexes. Have you looked at the code in any detail? It goes like this:

1. Scan heap looking for dead tuples and free space.

2. Make a pass over the indexes to delete index entries for dead tuples.

3. Copy remaining live tuples to lower-numbered pages to compact heap.
3a. Every time we copy a tuple, make new index entries pointing to its
new location. (The old index entries still remain, though.)

4. Commit transaction so that new copies of moved tuples are good and
old ones are not.

5. Make a pass over the indexes to delete index entries for old copies
of moved tuples.

When there are only a few tuples being moved, this isn't too bad of a
strategy. But when there are lots, steps 2, 3a, and 5 represent a huge
amount of work. What's worse, step 3a swells the index well beyond its
final size. This used to mean permanent index bloat. Nowadays step 5
will be able to recover some of that space --- but not at zero cost.

I think it's entirely plausible that dropping steps 2, 3a, and 5 in
favor of an index rebuild at the end could be a winner.

> I think there are scenarios where the REINDEX will be much worse, say when
> there are not too many deleted tuples (but in that case, why is the user
> doing VACUUM FULL in the first place?).

Yeah, I think that's exactly the important point. These days there's
not a lot of reason to do VACUUM FULL unless you have a major amount of
restructuring to do. I would once have favored maintaining two code
paths with two strategies, but now I doubt it's worth the trouble.
(Or I should say, we have two code paths, the other being lazy VACUUM
--- do we need three?)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2003-10-05 03:57:33 Re: Thoughts on maintaining 7.3
Previous Message Alvaro Herrera 2003-10-05 03:26:56 Re: Thoughts on maintaining 7.3