Re: Proposal: Another attempt at vacuum improvements

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Another attempt at vacuum improvements
Date: 2011-05-26 04:10:44
Message-ID: BANLkTinzsJC8kZ1js7KzSqimXU3bqNA7Mw@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 25, 2011 at 11:51 PM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
>> Agreed.  The only thing I'm trying to do further is to avoid the need
>> for a reshuffle when the special LSN storage is reclaimed.
>
> Ah, OK. That was never clear from your initial emails, or maybe I
> misread.

Sorry, I must not have explained it very well. :-(

> So what you are saying is that by storing the LSN after the line
> pointer array, we might be able to reclaim the LSN storage without
> shuffling. That makes sense. Having said that, it doesn't excite me too
> much, because I think we should do the dead line pointer reclaim
> operation during page pruning; we are already holding a cleanup lock at
> that time and will most likely do a reshuffle anyway.

I'll give that a firm maybe. If there is no reshuffle, then you can
do this with just an exclusive content lock. Maybe that's worthless,
but I'm not certain of it. I guess we might need to see how the code
shakes out.

Also, reshuffling might be more expensive. I agree that if there are
new dead tuples on the page, then you're going to be paying that price
anyway; but if not, it might be avoidable.
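
To make the no-reshuffle case concrete, here is a toy sketch. It is not
PostgreSQL code: ToyPageHeader, TOY_LSN_SIZE and toy_reclaim_special_lsn
are made-up names, and the 8-byte LSN size is an assumption. It supposes
the special LSN sits directly after the line pointer array and is covered
by the page's lower free-space boundary, so reclaiming it only pulls that
boundary back and never touches tuple data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed size of the special LSN slot; illustrative only. */
#define TOY_LSN_SIZE 8

/* Hypothetical, simplified page header (not PostgreSQL's PageHeaderData). */
typedef struct
{
    uint16_t lower;           /* end of line pointer array plus LSN slot, if any */
    uint16_t upper;           /* start of tuple data */
    bool     has_special_lsn; /* is an LSN slot stored just below 'lower'? */
} ToyPageHeader;

/*
 * Give the LSN slot back to free space.  Only 'lower' moves, so no tuple
 * has to be relocated.  Returns false if there was nothing to reclaim.
 */
static bool
toy_reclaim_special_lsn(ToyPageHeader *hdr)
{
    if (!hdr->has_special_lsn)
        return false;

    hdr->lower = (uint16_t) (hdr->lower - TOY_LSN_SIZE);
    hdr->has_special_lsn = false;
    return true;
}
```

Whether the real page format could do this under just an exclusive
content lock is exactly the open question above.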

> Also a downside of storing LSN after line pointer array is that you
> may waste space because of alignment issues.

We could possibly store it unaligned and read it back two bytes at a
time. Granted, that's not free.
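
A minimal sketch of what "two bytes at a time" could look like, treating
the LSN as an opaque 64-bit value purely for illustration. The helper
names store_lsn_unaligned and load_lsn_unaligned are hypothetical, not
existing functions.

```c
#include <stdint.h>
#include <string.h>

/*
 * Write a 64-bit value to a possibly unaligned address as four 16-bit
 * chunks, lowest-order chunk first.  memcpy avoids unaligned accesses.
 */
static void
store_lsn_unaligned(unsigned char *dst, uint64_t lsn)
{
    for (int i = 0; i < 4; i++)
    {
        uint16_t chunk = (uint16_t) (lsn >> (16 * i));

        memcpy(dst + 2 * i, &chunk, sizeof(chunk));
    }
}

/* Read the value back, again two bytes at a time. */
static uint64_t
load_lsn_unaligned(const unsigned char *src)
{
    uint64_t lsn = 0;

    for (int i = 0; i < 4; i++)
    {
        uint16_t chunk;

        memcpy(&chunk, src + 2 * i, sizeof(chunk));
        lsn |= ((uint64_t) chunk) << (16 * i);
    }
    return lsn;
}
```

The four small copies per access are the "not free" part; in exchange,
no padding is needed after the line pointer array.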

> I also thought that the
> LSN might get in the way of extending the line pointer array, but
> that's probably not a big deal, since if there is free space in the
> page (and there should be if we are adding a new tuple), it should be
> available immediately after the LSN.

Yeah. I'm not sure how icky that is, though.

> There are some other issues that we should think about too, like
> recording free space and managing the visibility map. The free space is
> recorded in the second pass today, but I don't see any reason why
> that can't be moved to the first pass. It's not clear, though, whether
> we should also record free space after a retail page vacuum or leave it
> as it is.

Not sure. Any idea why it's like that, or why we might want to change it?

> For visibility maps, we should not update them while there are still
> LP_DEAD line pointers on the page. Now that's not good, because all
> tuples in the page may be visible, so we may lose some advantage, at
> least for a while. But if we mark the page all-visible, the vacuum scan
> would not find the dead line pointers in it, and that would leave
> dangling index pointers after an index vacuum.

Also, an index-only scan might return index tuples that are pointing
to dead line pointers.

Currently, I believe the only way a page can get marked all-visible is
by vacuum. But if we make this change, then it would be possible for
a HOT cleanup to encounter a situation where all-visible could be set.
We probably want to make that work.
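
A rough sketch of the check a HOT cleanup would need before setting
all-visible, under the rule discussed above: no LP_DEAD line pointers may
remain, and every live tuple must be visible to everyone. The types and
names here (ToyLpState, ToyLinePointer, toy_page_can_be_all_visible) are
a made-up toy model, not PostgreSQL's actual structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical line pointer states, loosely modeled on LP_* flags. */
typedef enum
{
    TOY_LP_UNUSED,
    TOY_LP_NORMAL,
    TOY_LP_REDIRECT,
    TOY_LP_DEAD
} ToyLpState;

typedef struct
{
    ToyLpState state;
    bool       visible_to_all; /* meaningful only for TOY_LP_NORMAL */
} ToyLinePointer;

/*
 * A page may be marked all-visible only if it has no dead line pointers
 * (index entries may still point at them) and every normal tuple is
 * visible to all transactions.
 */
static bool
toy_page_can_be_all_visible(const ToyLinePointer *lps, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (lps[i].state == TOY_LP_DEAD)
            return false;
        if (lps[i].state == TOY_LP_NORMAL && !lps[i].visible_to_all)
            return false;
    }
    return true;
}
```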

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
