Re: Single pass vacuum - take 1

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Single pass vacuum - take 1
Date: 2011-07-21 20:01:46
Message-ID: CA+TgmobTpAD9H=-THqW9zMUp+1hJfcmmHWhjFXKv3Svui+_Lsg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 21, 2011 at 12:51 PM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> Since we force the subsequent vacuum to also look at the pages scanned and
> pruned by a previously failed vacuum, all the pages that have dead-vacuum line
> pointers would carry a new stamp once the vacuum finishes successfully, and
> pg_class would have the same stamp.

That seems a bit fragile. One of the things we've talked about doing
is skipping pages that are pinned by some other backend. Now maybe
that would be infrequent enough not to matter... but...

Also, I'm not sure that's the only potential change that would break this.

I think we are better off doing only equality comparisons and dodging
this problem altogether.

> I thought more about it and for a moment believed that we could do this with
> just a bit, since we rescan the pages with dead and dead-vacuum line
> pointers after an aborted vacuum. But I concluded that a bit or a small
> counter is not good enough, since other backends might be running with a
> stale value and would be fooled into believing that they can collect the
> dead-vacuum line pointers before the index pointers are actually removed. We
> can still use a 32-bit counter, though, since its wrap-around period is far
> too long for any backend to still be running with such a stale counter (you
> would need more than 1 billion vacuums on the same table in between to hit
> this).

I think that's a safe assumption.

>> The patch as
>> submitted uses lp_off to store a single bit, to distinguish between
>> vacuum and dead-vacuumed, but we could actually have (for greater
>> safety and debuggability) a 15-byte counter that just wraps around
>> from 32,767 to 1.  (Maybe it would be wise to reserve a few counter
>> values, or a few bits, or both, for future projects.)  That would
>> eliminate the need to touch PageRepairFragmentation() or use the
>> special space, since all the information would be in the line pointer
>> itself.  Not having to rearrange the page to reclaim dead line
>> pointers is appealing, too.
>
> Not sure if I get you here. We need a mechanism to distinguish between dead
> and dead-vacuum line pointers. How would the counter (by which I assume you
> mean 15-bit, not 15-byte) help solve that? Or are you just suggesting
> replacing the LSN with the counter in the page header?

Just-plain-dead line pointers would have lp_off = 0. Dead-vacuumed
line pointers would have lp_off != 0. The first vacuum would use
lp_off = 1, the next one lp_off = 2, etc.

Actually, come to think of it, we could fit a 30-bit counter into the
line pointer. There are 15 unused bits in lp_off and 15 unused bits
in lp_len.

>> > Is there something in place to make sure that pruning uses an up-to-date
>> > relindxvacxlogid/off value? I guess it doesn't matter if it's
>> > out-of-date,
>> > you'll just miss the opportunity to remove some dead tuples.
>>
>> This seems like a tricky problem, because it could cause us to
>> repeatedly fail to remove the same dead line pointers, which would be
>> poor.  We could do something like this: after updating pg_class,
>> vacuum sends an interrupt to any backend which holds RowExclusiveLock
>> or higher on that relation.  The interrupt handler just sets a flag.
>> If that backend does heap_page_prune() and sees the flag set, it knows
>> that it needs to recheck pg_class.  This is a bit grotty and doesn't
>> completely close the race condition (the signal might not arrive in
>> time), but it ought to make it narrow enough not to matter in
>> practice.
>
> I am not too excited about adding that complexity to the code. Even if a
> backend does not have an up-to-date value, it will fail to collect the
> dead-vacuum pointers, but soon either it will catch up, some other backend
> will remove them, or the next vacuum will take care of it.

If we use a counter that is large enough that we don't have to worry
about wrap-around, I guess that's OK, though it seems a little weird
to think about having different backends running with different ideas
about the correct counter value.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
