Re: Single pass vacuum - take 1

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Single pass vacuum - take 1
Date: 2011-07-21 16:51:12
Message-ID: CABOikdM80mzK4e9gXZa+30dmEF1Xu6UpH6Jr=7sN9xhoze5D_Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 21, 2011 at 12:17 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Jul 14, 2011 at 12:43 PM, Heikki Linnakangas
> > I think you can sidestep that
> > if you check that the page's vacuum LSN <= vacuum LSN in pg_class, instead
> > of equality.
>
> I don't think that works, because the point of storing the LSN in
> pg_class is to verify that the vacuum completed the index cleanup
> without error. The fact that a newer vacuum accomplished that goal
> does not mean that all older ones did.
>
>
Because we force the subsequent vacuum to also look at the pages scanned and
pruned by a previous failed vacuum, all the pages that have dead-vacuum line
pointers will carry a new stamp once the vacuum finishes successfully, and
pg_class will have the same stamp.
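
To make the stamp comparison concrete, here is a minimal sketch of the check described above. The names SketchPage, VacStamp, sketch_vacuum_stamp and sketch_can_reclaim are hypothetical and exist only for illustration; they are not the patch's actual structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the patch's structures. */
typedef uint64_t VacStamp;

typedef struct SketchPage
{
    VacStamp vac_stamp;      /* stamp left by the vacuum that last pruned the page */
    int      n_dead_vacuum;  /* number of dead-vacuum line pointers on the page */
} SketchPage;

/* Vacuum's heap scan: (re)stamp any page that has dead-vacuum line pointers. */
static void
sketch_vacuum_stamp(SketchPage *page, VacStamp this_vacuum)
{
    assert(page != NULL);
    if (page->n_dead_vacuum > 0)
        page->vac_stamp = this_vacuum;
}

/*
 * Dead-vacuum line pointers may be reclaimed only if the page's stamp equals
 * the stamp that the last successful vacuum recorded in pg_class.
 */
static bool
sketch_can_reclaim(const SketchPage *page, VacStamp pg_class_stamp)
{
    assert(page != NULL);
    return page->n_dead_vacuum > 0 && page->vac_stamp == pg_class_stamp;
}
```

A page stamped by a failed vacuum never matches pg_class, because that vacuum never got to update pg_class. Once the next vacuum restamps the page and completes, both sides carry the new value.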

> > Ignoring the issue stated in previous paragraph, I think you wouldn't
> > actually need a 64-bit LSN. A smaller counter is enough, as wrap-around
> > doesn't matter. In fact, a single bit would be enough. After a successful
> > vacuum, the counter on each heap page (with dead line pointers) is N, and
> > the value in pg_class is N. There are no other values on the heap, because
> > vacuum will have cleaned them up. When you begin the next vacuum, it will
> > stamp pages with N+1. So at any stage, there is only one of two values on
> > any page, so a single bit is enough. (But as I said, that doesn't hold if
> > vacuum skips some pages thanks to the visibility map)
>
> If this can be made to work, it's a very appealing idea.

I thought more about it and for a moment believed that we could do this with
just a bit, since we rescan the pages with dead and dead-vacuum line
pointers after an aborted vacuum. But I concluded that a bit or a small
counter is not good enough: other backends might be running with a
stale value and would be fooled into believing that they can collect the
dead-vacuum line pointers before the index pointers are actually removed. We
can still use a 32-bit counter, though, since its wrap-around period is far
too long in practice for any backend to still be running with such a stale
counter (you would need more than 1 billion vacuums on the same table in
between to hit this).
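
A small model of the stale-value problem, as a sketch only. It assumes each vacuum stamps pages with the previous value plus one, as in Heikki's description, and that a backend cached the pg_class value before some number of later vacuums started. The functions next_stamp and stale_backend_fooled are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Advance a counter that is `bits` wide (1..32), wrapping at 2^bits. */
static uint32_t
next_stamp(uint32_t cur, unsigned bits)
{
    uint32_t mask = (bits >= 32) ? UINT32_MAX : ((UINT32_C(1) << bits) - 1);

    return (cur + 1) & mask;
}

/*
 * A backend cached the pg_class stamp, then `vacuums_started` vacuums began
 * on the table, the last of which is still in progress (its index cleanup is
 * not done). Returns true if the page stamp written by that in-progress
 * vacuum happens to equal the backend's stale cached value, i.e. the backend
 * would wrongly believe the dead-vacuum line pointers are collectable.
 */
static bool
stale_backend_fooled(unsigned bits, unsigned vacuums_started)
{
    uint32_t cached = 0;            /* value the backend read from pg_class */
    uint32_t page_stamp = cached;

    for (unsigned i = 0; i < vacuums_started; i++)
        page_stamp = next_stamp(page_stamp, bits);

    return page_stamp == cached;
}
```

With a single bit, the second vacuum after the backend cached its value already reuses that value, so stale_backend_fooled(1, 2) is true. With 32 bits the same scenario yields a mismatch, and a match needs a full wrap of the counter.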

> The patch as
> submitted uses lp_off to store a single bit, to distinguish between
> vacuum and dead-vacuumed, but we could actually have (for greater
> safety and debuggability) a 15-byte counter that just wraps around
> from 32,767 to 1. (Maybe it would be wise to reserve a few counter
> values, or a few bits, or both, for future projects.) That would
> eliminate the need to touch PageRepairFragmentation() or use the
> special space, since all the information would be in the line pointer
> itself. Not having to rearrange the page to reclaim dead line
> pointers is appealing, too.
>
>
Not sure I follow you here. We need a mechanism to distinguish between dead
and dead-vacuum line pointers. How would the counter (which I assume you
mean to be 15-bit, not 15-byte) help solve that? Or are you just suggesting
replacing the LSN with the counter in the page header?
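
For reference, a rough sketch of what distinguishing the two states inside the line pointer could look like. The struct mirrors the 15/2/15 bit layout of PostgreSQL's ItemIdData, and SK_LP_DEAD has the same value as LP_DEAD. Everything else (the SK_DEAD_VACUUM_BIT flag and the sk_* helpers) is a hypothetical illustration of storing one bit in lp_off, not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same bit widths as ItemIdData; the type name is hypothetical. */
typedef struct SketchItemId
{
    unsigned lp_off   : 15;  /* unused for a dead pointer, so free to carry a flag */
    unsigned lp_flags : 2;   /* state of the line pointer */
    unsigned lp_len   : 15;  /* 0 for a dead pointer without storage */
} SketchItemId;

#define SK_LP_DEAD          3   /* same value as LP_DEAD */
#define SK_DEAD_VACUUM_BIT  1   /* hypothetical flag kept in lp_off */

/* Mark a line pointer dead (heap tuple gone, index pointers may remain). */
static void
sk_set_dead(SketchItemId *lp)
{
    assert(lp != NULL);
    lp->lp_flags = SK_LP_DEAD;
    lp->lp_off = 0;
    lp->lp_len = 0;
}

/* Vacuum has collected this pointer for index cleanup: mark dead-vacuum. */
static void
sk_set_dead_vacuum(SketchItemId *lp)
{
    sk_set_dead(lp);
    lp->lp_off = SK_DEAD_VACUUM_BIT;
}

static bool
sk_is_dead(const SketchItemId *lp)
{
    return lp->lp_flags == SK_LP_DEAD;
}

static bool
sk_is_dead_vacuum(const SketchItemId *lp)
{
    return sk_is_dead(lp) && (lp->lp_off & SK_DEAD_VACUUM_BIT) != 0;
}
```

Since lp_off is 15 bits wide, the same field has room for a counter of up to 15 bits rather than a single flag, which seems to be the space the quoted suggestion has in mind.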

> > Is there something in place to make sure that pruning uses an up-to-date
> > relindxvacxlogid/off value? I guess it doesn't matter if it's out-of-date,
> > you'll just miss the opportunity to remove some dead tuples.
>
> This seems like a tricky problem, because it could cause us to
> repeatedly fail to remove the same dead line pointers, which would be
> poor. We could do something like this: after updating pg_class,
> vacuum sends an interrupt to any backend which holds RowExclusiveLock
> or higher on that relation. The interrupt handler just sets a flag.
> If that backend does heap_page_prune() and sees the flag set, it knows
> that it needs to recheck pg_class. This is a bit grotty and doesn't
> completely close the race condition (the signal might not arrive in
> time), but it ought to make it narrow enough not to matter in
> practice.
>
>
I am not too excited about adding that complexity to the code. Even if a
backend does not have an up-to-date value, the only consequence is that it
fails to collect the dead-vacuum pointers; soon either it will catch up,
some other backend will remove them, or the next vacuum will take care of it.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
