Re: [PATCHES] VACUUM Improvements - WIP Patch

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] VACUUM Improvements - WIP Patch
Date: 2008-07-14 03:50:02
Message-ID: 2e78013d0807132050j16cdc558s4dc3f889371a937d@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

(taking the discussion to -hackers)

On Sat, Jul 12, 2008 at 11:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>
> (2) It achieves speedup of VACUUM by pushing work onto subsequent
> regular accesses of the page, which is exactly the wrong thing.
> Worse, once you count the disk writes those accesses will induce it's
> not even clear that there's any genuine savings.
>

Well, in the worst case that is true. But in most other cases the
second-pass work will be combined with other normal activity on the
page, so its overhead is shared; at least there is a chance of that. I
also think we could delay the work until there is a real need for it,
e.g. an INSERT or UPDATE on the page that requires a free line
pointer.
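A toy model of the "delay until a free line pointer is needed" idea, in Python. This is a sketch under stated assumptions, not the patch: `ToyPage`, `get_free_lp()`, `insert()` and the `index_cleaned` flag are hypothetical, and the `Lp` states only loosely mirror PostgreSQL's LP_* line pointer flags. It assumes VACUUM's first pass has already removed the index entries for the dead tuples and left their line pointers DEAD; the page is touched again only when an INSERT or UPDATE needs a slot and is going to dirty the page anyway.

```python
from dataclasses import dataclass
from enum import Enum


class Lp(Enum):
    """Toy line pointer states; names loosely mirror PostgreSQL's LP_* flags."""
    UNUSED = 0
    NORMAL = 1
    DEAD = 3


@dataclass
class ToyPage:
    lps: list
    max_lps: int = 8
    # Hypothetical flag: a completed VACUUM has already removed the index
    # entries pointing at this page's DEAD line pointers.
    index_cleaned: bool = False
    dirty: bool = False

    def get_free_lp(self) -> int:
        """Return a free offset, reclaiming DEAD line pointers only on demand."""
        for off, state in enumerate(self.lps):
            if state is Lp.UNUSED:
                return off
        if self.index_cleaned:
            # Deferred "second pass" work: done only because a caller needs
            # a slot and will dirty the page anyway.
            dead = [off for off, s in enumerate(self.lps) if s is Lp.DEAD]
            for off in dead:
                self.lps[off] = Lp.UNUSED
            if dead:
                return dead[0]
        if len(self.lps) < self.max_lps:
            self.lps.append(Lp.UNUSED)  # extend the line pointer array
            return len(self.lps) - 1
        return -1  # page is full

    def insert(self) -> int:
        """Toy INSERT: take a free line pointer and mark the page dirty."""
        off = self.get_free_lp()
        if off >= 0:
            self.lps[off] = Lp.NORMAL
            self.dirty = True
        return off


page = ToyPage([Lp.NORMAL, Lp.DEAD, Lp.DEAD, Lp.NORMAL], index_cleaned=True)
print(page.insert())  # 1: offsets 1 and 2 are reclaimed, offset 1 is reused
```

The cost argument is visible here: the reclaim loop runs inside a write that already dirties the page, instead of in a separate VACUUM visit that dirties it once more.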

> (3) The fact that it doesn't work until concurrent transactions have
> gone away makes it of extremely dubious value in real-world scenarios,
> as already noted by Simon.
>

If there are indeed long-running concurrent transactions, we won't get
any benefit from this optimization. But the more common case is one
with only very short concurrent transactions. In those cases, and
especially for very large tables, reducing the vacuum time is a
significant win: the FSM will be written early and a significant part
of the VACUUM's work can be finished quickly.
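Why long-running transactions defeat the optimization while short ones don't can be shown with a hypothetical gate. This is an assumption for illustration, not the patch's actual rule: here DEAD line pointers left by a VACUUM may be reused only once every running transaction's xmin is newer than the VACUUM's transaction id. `can_reclaim()` and its parameters are made up, and XIDs are compared as plain integers, ignoring wraparound.

```python
from typing import Sequence


def can_reclaim(vacuum_xid: int, running_xmins: Sequence[int], next_xid: int) -> bool:
    """Hypothetical gate for reusing DEAD line pointers left by a VACUUM.

    Assumption: reuse is safe only once every running transaction's xmin is
    newer than the VACUUM's transaction id. XIDs are compared as plain
    integers; real PostgreSQL XIDs wrap around and need modular comparison.
    """
    # With no transactions running, the horizon is the next XID to be assigned.
    oldest_xmin = min(running_xmins, default=next_xid)
    return vacuum_xid < oldest_xmin


# Only short transactions, all started after the VACUUM (xid 1000): reusable.
print(can_reclaim(1000, [1005, 1010], next_xid=1012))  # True
# A long-running report with xmin 900 holds the horizon back: not reusable.
print(can_reclaim(1000, [900, 1005], next_xid=1012))   # False
```

With short transactions the horizon moves past the VACUUM almost immediately, so the deferred work becomes possible soon after; a single old snapshot blocks it for as long as that snapshot lives.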

> It strikes me that what you are trying to do here is compensate for
> a bad decision in the HOT patch, which was to have VACUUM's first
> pass prune/defrag a page even when we know we are going to have to
> come back to that page later. What about trying to fix things so
> that if the page contains line pointers that need to be removed,
> the first pass doesn't dirty it at all, but leaves all the work
> to be done at the second visit? I think that since heap_page_prune
> has been refactored into a "scan" followed by an "apply", it'd be
> possible to decide before the "apply" step whether this is the case
> or not.
>

I am not against this idea. It's just that it still requires us to scan
the main table twice, and that's exactly what we are trying to avoid
with this patch.
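Tom's suggestion, and why it still needs a second heap scan, can be sketched with a toy model in Python. This is an illustration under assumptions, not PostgreSQL code: `prune_scan()`, `prune_apply()`, `first_pass()` and the string tuple states are hypothetical stand-ins for the scan/apply split inside heap_page_prune. A page that would be left with DEAD line pointers is not dirtied in the first pass, but it still lands on the list the second heap pass must revisit.

```python
from dataclasses import dataclass, field

# Toy tuple states (hypothetical): "live", "dead_hot" (dead heap-only tuple,
# no index entries point at it) and "dead" (dead tuple with index entries).
Page = dict  # offset -> state


@dataclass
class PrunePlan:
    now_dead: list = field(default_factory=list)    # must wait for index cleanup
    now_unused: list = field(default_factory=list)  # can be freed immediately


def prune_scan(page: Page) -> PrunePlan:
    """'Scan' step: decide what pruning would do, without touching the page."""
    plan = PrunePlan()
    for off, state in sorted(page.items()):
        if state == "dead":
            plan.now_dead.append(off)
        elif state == "dead_hot":
            plan.now_unused.append(off)
    return plan


def prune_apply(page: Page, plan: PrunePlan) -> None:
    """'Apply' step: actually modify (and therefore dirty) the page."""
    for off in plan.now_unused:
        del page[off]


def first_pass(heap: dict) -> tuple:
    """Return (pages dirtied now, pages the second heap pass must revisit)."""
    dirtied, revisit = [], []
    for blkno, page in sorted(heap.items()):
        plan = prune_scan(page)
        if plan.now_dead:
            revisit.append(blkno)  # leave it clean; do all the work in pass two
        elif plan.now_unused:
            prune_apply(page, plan)
            dirtied.append(blkno)
    return dirtied, revisit


heap = {0: {1: "live", 2: "dead_hot"}, 1: {1: "dead", 2: "dead_hot"}, 2: {1: "live"}}
print(first_pass(heap))  # ([0], [1])
```

Block 1 is written only once, which addresses the double write, but it still has to be read again after index cleanup; that repeated heap visit is what the patch tries to eliminate.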

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
