Re: Avoiding second heap scan in VACUUM

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Postgres - Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding second heap scan in VACUUM
Date: 2008-05-30 08:52:36
Message-ID: 2e78013d0805300152x705e3dfbm3c2b353481dd33bd@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 30, 2008 at 1:56 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Been thinking some more about this. You're right that the second scan
> could re-dirty many pages and is probably something to avoid.

Right. IMHO it would help us a lot.

> The main
> issue I see is that you don't really know how much work will happen in
> the first phase and how much would happen in the second.

With HOT, I see very little work left for the second pass. The dead
space is already collected in the first pass. The second pass only
cleans up the DEAD line pointers. Also, if we can update the FSM early
(at the end of the first pass), we can avoid further irreversible bloat
of the heap.
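
To make that division of work concrete, here is a minimal sketch in Python. It is a toy model, not PostgreSQL code: the Page class, the slot layout, the fsm dictionary and both function names are assumptions made up for illustration. It only shows that the first pass reclaims the tuple storage and can publish free space right away, while the second pass merely flips the DEAD stubs to UNUSED.

```python
from dataclasses import dataclass

# Line-pointer states in this toy model (names mirror the LP_* terminology).
LP_NORMAL, LP_DEAD, LP_UNUSED = "NORMAL", "DEAD", "UNUSED"


@dataclass
class Page:
    # Hypothetical layout: each slot is [state, tuple_bytes, tuple_is_dead].
    slots: list
    free_bytes: int = 0


def first_pass(pages, fsm):
    """Prune dead tuples, leave DEAD stubs, and publish free space early."""
    dead_tids = []
    for page_no, page in enumerate(pages):
        for slot_no, slot in enumerate(page.slots):
            state, tuple_bytes, is_dead = slot
            if state == LP_NORMAL and is_dead:
                page.free_bytes += tuple_bytes   # tuple storage reclaimed now
                slot[0], slot[1] = LP_DEAD, 0    # only the stub remains
                dead_tids.append((page_no, slot_no))
        fsm[page_no] = page.free_bytes           # assumed early FSM update
    return dead_tids


def second_pass(pages, dead_tids):
    """After index cleanup, make each DEAD stub reusable; return the count."""
    for page_no, slot_no in dead_tids:
        pages[page_no].slots[slot_no][0] = LP_UNUSED
    return len(dead_tids)


pages = [
    Page(slots=[[LP_NORMAL, 100, True], [LP_NORMAL, 80, False]]),
    Page(slots=[[LP_NORMAL, 60, True]]),
]
fsm = {}
dead_tids = first_pass(pages, fsm)
stubs_cleared = second_pass(pages, dead_tids)
```

In this model all of the reclaimed bytes are already visible in fsm after first_pass; second_pass changes only line-pointer state.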

> no problem at all. I'd rather keep it as it is than have sometimes
> better, sometimes worse behaviour.
>

For large tables, two heap scans along with several additional page
writes don't seem to be a cost we can afford, especially in IO-bound
applications. IMHO a timed wait is not such a bad thing. Note that it's
all about VACUUM, which is a background maintenance activity, and it
won't hurt to delay it by a few seconds or even minutes. Also, as I said
earlier, "waiting" is a minor detail; maybe there is a better way to
do things.

Unless there are some strong objections, I would like to give it a
shot and see if there are any real benefits. We can then fix any
regression cases. Let me know if somebody thinks there are certain
show stoppers, or that the benefits of avoiding a second scan on a
large table are not worth it. I personally have a strong feeling that
it's worth the effort.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
