Re: [PATCHES] VACUUM Improvements - WIP Patch

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
Cc: "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] VACUUM Improvements - WIP Patch
Date: 2008-07-14 15:23:45
Message-ID: 8741.1216049025@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

"Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> (taking the discussions to -hackers)
> On Sat, Jul 12, 2008 at 11:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> (2) It achieves speedup of VACUUM by pushing work onto subsequent
>> regular accesses of the page, which is exactly the wrong thing.
>> Worse, once you count the disk writes those accesses will induce it's
>> not even clear that there's any genuine savings.

> Well in the worst case that is true. But in most other cases, the
> second pass work will be combined with other normal activities and the
> overhead will be shared, at least there is a chance for that. I think
> there is a chance for delaying the work until there is any real need
> for that e.g. INSERT or UPDATE on the page which would require a free
> line pointer.

That's just arm-waving: right now, pruning will be done by the next
*reader* of the page, whether or not he has any intention of *writing*
it. With no proposal on the table for improving that situation,
I don't see any credibility in arguing for over-complicating VACUUM
on the grounds that it might happen someday. In any case, the work
that is supposed to be done by VACUUM is being pushed to a foreground
query, which I find to be completely against our design principles.

>> It strikes me that what you are trying to do here is compensate for
>> a bad decision in the HOT patch, which was to have VACUUM's first
>> pass prune/defrag a page even when we know we are going to have to
>> come back to that page later. What about trying to fix things so
>> that if the page contains line pointers that need to be removed,
>> the first pass doesn't dirty it at all, but leaves all the work
>> to be done at the second visit?

> I am not against this idea. It's just that it still requires a double
> scan of the main table, and that's exactly what we are trying to avoid
> with this patch.

The part of the argument that I found convincing was trying to reduce
the write traffic (especially WAL log output), not avoiding a second
read. And the fundamental point still remains: the work should be done
in background, not foreground.
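
To make the write-traffic point concrete, here is a toy model, not
PostgreSQL code. It assumes each heap page is described by two
hypothetical flags (needs_prune, needs_lp_removal) and only counts how
often each VACUUM pass would dirty a page. With defer_to_second_pass
set, the first pass leaves alone any page it already knows it must
revisit, as suggested above.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Both fields are illustrative assumptions, not real page-header bits.
    needs_prune: bool       # pruning/defrag would clean something here
    needs_lp_removal: bool  # VACUUM must remove line pointers on its second visit


def count_page_dirtyings(pages, defer_to_second_pass):
    """Return (first_pass, second_pass) counts of pages dirtied by VACUUM."""
    first_pass = 0
    second_pass = 0
    for page in pages:
        # When deferring, the first pass skips pages it will revisit anyway.
        deferred = defer_to_second_pass and page.needs_lp_removal
        if page.needs_prune and not deferred:
            first_pass += 1
        # The second visit always dirties pages whose line pointers must go.
        if page.needs_lp_removal:
            second_pass += 1
    return first_pass, second_pass


pages = [
    Page(needs_prune=True, needs_lp_removal=True),
    Page(needs_prune=True, needs_lp_removal=False),
    Page(needs_prune=True, needs_lp_removal=True),
    Page(needs_prune=False, needs_lp_removal=False),
]

eager = count_page_dirtyings(pages, defer_to_second_pass=False)
deferred = count_page_dirtyings(pages, defer_to_second_pass=True)
print("prune in first pass:", eager, "total", sum(eager))
print("defer to second pass:", deferred, "total", sum(deferred))
```

In this model the saving equals the number of pages that need both
pruning and line pointer removal: each of those is dirtied once instead
of twice. It says nothing about read cost, and real WAL volume depends
on details the model ignores.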

regards, tom lane
