Re: Proposal: Another attempt at vacuum improvements

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Another attempt at vacuum improvements
Date: 2011-05-25 11:07:55
Message-ID: BANLkTimKSzkUPck6ghm-Er3YTU8jE86JCA@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 24, 2011 at 10:59 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> So, first of all, thanks for putting some effort and thought into
> this. Despite the large number of improvements in this area in 8.3
> and 8.4, this is still a pain point, and it would be really nice to
> find a way to make some further improvements.
>
>
Thanks for bringing up the idea during PGCon. That helped me get
interested in this again. I hope we can take this to a logical
conclusion this time and do something to alleviate the pain.

> On Tue, May 24, 2011 at 2:58 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> > So the idea is to separate the index vacuum (removing index pointers to
> > dead tuples) from the heap vacuum. When we do heap vacuum (either by
> > HOT-pruning or using regular vacuum), we can spool the dead line pointers
> > somewhere. To avoid any hot-spots during normal processing, the spooling
> > can be done periodically like the stats collection.
>
> What happens if the system crashes after a line pointer becomes dead
> but before the record of its death is safely on disk? The fact that a
> previous index vacuum has committed is only sufficient justification
> for reclaiming the dead line pointers if you're positive that the
> index vacuum killed the index pointers for *every* dead line pointer.
> I'm not sure we want to go there; any operation that wants to make a
> line pointer dead will need to be XLOG'd. Instead, I think we should
> stick with your original idea and just try to avoid the second heap
> pass.
>
>
I would not mind keeping the design simple for the first release. Even if
we only avoid the second heap scan in vacuum, that would be a big win. In the
long term, though, I think it will pay off to keep track of dead line
pointers as they are generated. The only way they are generated is while
cleaning up a page under the cleanup lock, and that operation is WAL-logged,
so spooling dead line pointers during WAL replay should be possible.

Anyway, I would rather not pursue that idea for now. I am OK with starting
from a simplified version where every heap vacuum is followed by an index
vacuum, collecting and holding the dead line pointers in maintenance memory.

> So to do that, as you say, we can have every operation that creates a
> dead line pointer note the LSN of the operation in the page.

Yes.

> But instead of allocating permanent space in the page header, which would
> both reduce (admittedly only by 8 bytes) the amount of space available
> for tuples, and more significantly have the effect of breaking on-disk
> compatibility, I'm wondering if we could get by with making space for
> that extra LSN only when it's actually present. In other words, when
> it's present, we set a bit PD_HAS_DEAD_LINE_PTR_LSN or somesuch,
> increment pd_upper, and use the extra space to store the LSN. There
> is an alignment problem to worry about there but that shouldn't be a
> huge issue.
>
>
That might work, but it would require us to move tuples around when the first
dead line pointer gets generated in the page. You may argue that we should
be holding a cleanup lock when that happens, and that dead line pointer
creation is always followed by a call to PageRepairFragmentation(), so it
should be easier to make space for the LSN.

Instead of storing the LSN after the page header, would it be easier to set
pd_special and store the LSN at the end of the page?
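
To illustrate the pd_special alternative, here is a minimal sketch in C on a
toy page. Everything named Toy* or toy_* is hypothetical and is not a
PostgreSQL API; the layout is simplified (line pointers are kept as a plain
array of tuple offsets). It assumes, as on ordinary heap pages, that
pd_special initially equals the page size, so tuple data still has to shift
down once to open 8 bytes at the end of the page.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 8192
#define TOY_MAX_ITEMS 16

typedef uint64_t ToyLsn;               /* stand-in for an LSN */

/* Hypothetical, simplified page; not the real PageHeaderData layout. */
typedef struct ToyPage
{
    uint16_t pd_lower;                 /* end of line pointer array */
    uint16_t pd_upper;                 /* start of tuple data */
    uint16_t pd_special;               /* start of special space */
    int      nitems;                   /* number of toy line pointers */
    uint16_t item_off[TOY_MAX_ITEMS];  /* tuple offsets into data[] */
    unsigned char data[TOY_PAGE_SIZE];
} ToyPage;

/*
 * Open sizeof(ToyLsn) bytes of special space at the end of the page.
 * Returns false if the page has too little free space.
 */
bool toy_reserve_lsn_space(ToyPage *page)
{
    const uint16_t need = (uint16_t) sizeof(ToyLsn);

    if (page->pd_special != TOY_PAGE_SIZE)
        return true;                   /* space was reserved earlier */
    if (page->pd_upper - page->pd_lower < need)
        return false;                  /* not enough free space */

    /* Shift all tuple data down by 'need' bytes and fix the offsets. */
    memmove(page->data + page->pd_upper - need,
            page->data + page->pd_upper,
            (size_t) (page->pd_special - page->pd_upper));
    for (int i = 0; i < page->nitems; i++)
        page->item_off[i] -= need;
    page->pd_upper -= need;
    page->pd_special -= need;
    return true;
}

/* Both helpers assume toy_reserve_lsn_space() has already succeeded. */
void toy_set_dead_lp_lsn(ToyPage *page, ToyLsn lsn)
{
    memcpy(page->data + page->pd_special, &lsn, sizeof(lsn));
}

ToyLsn toy_get_dead_lp_lsn(const ToyPage *page)
{
    ToyLsn lsn;

    memcpy(&lsn, page->data + page->pd_special, sizeof(lsn));
    return lsn;
}
```

In this sketch the line pointer array can keep growing from pd_lower without
running into the stored LSN, which is the main attraction over placing the
LSN right after the page header.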

> When we vacuum, we remember the LSN before we start. When we finish,
> if we scanned the indexes and everything completed without error, then
> we bump the heap's notion (wherever we store it) of the last
> successful index vacuum. When we vacuum or do HOT cleanup on a page,
> if the page has a most-recent-dead-line pointer LSN and it precedes
> the start-of-last-successful-index-vacuum LSN, then we mark all the
> LP_DEAD tuples as LP_UNUSED and throw away the
> most-recent-dead-line-pointer LSN.
>
>
Right. And if the cleanup generates new dead line pointers, the LSN will be
replaced with the LSN of the current operation.
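
The rule discussed above can be sketched in C roughly as follows. This is
only an illustration on a toy page: the Toy* types and toy_* functions are
hypothetical names, not PostgreSQL code, and WAL logging and locking are
left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ToyLsn;               /* stand-in for an LSN */
#define TOY_INVALID_LSN ((ToyLsn) 0)
#define TOY_MAX_ITEMS 8

typedef enum { TOY_LP_UNUSED, TOY_LP_NORMAL, TOY_LP_DEAD } ToyLpState;

/* Hypothetical toy page: line pointer states plus the extra per-page LSN. */
typedef struct ToyPage
{
    ToyLpState items[TOY_MAX_ITEMS];
    ToyLsn     dead_lp_lsn;  /* most recent op that created a dead line
                              * pointer; TOY_INVALID_LSN if none */
} ToyPage;

void toy_page_init(ToyPage *page)
{
    for (size_t i = 0; i < TOY_MAX_ITEMS; i++)
        page->items[i] = TOY_LP_UNUSED;
    page->dead_lp_lsn = TOY_INVALID_LSN;
}

/* A cleanup at op_lsn turns item idx dead; the page LSN is replaced. */
void toy_mark_dead(ToyPage *page, size_t idx, ToyLsn op_lsn)
{
    page->items[idx] = TOY_LP_DEAD;
    page->dead_lp_lsn = op_lsn;
}

/*
 * If the page's dead-line-pointer LSN precedes the start LSN of the last
 * successful index vacuum, every LP_DEAD item is known to have no index
 * pointers left, so it can become LP_UNUSED. Returns the number reclaimed.
 */
int toy_reclaim_dead(ToyPage *page, ToyLsn last_index_vacuum_start_lsn)
{
    int reclaimed = 0;

    if (page->dead_lp_lsn == TOY_INVALID_LSN ||
        last_index_vacuum_start_lsn == TOY_INVALID_LSN ||
        page->dead_lp_lsn >= last_index_vacuum_start_lsn)
        return 0;

    for (size_t i = 0; i < TOY_MAX_ITEMS; i++)
    {
        if (page->items[i] == TOY_LP_DEAD)
        {
            page->items[i] = TOY_LP_UNUSED;
            reclaimed++;
        }
    }
    page->dead_lp_lsn = TOY_INVALID_LSN;   /* throw the LSN away */
    return reclaimed;
}
```

Note that a single new dead line pointer moves the page LSN forward, so all
older dead line pointers on that page also wait for the next index vacuum.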

> One downside of this approach is that, if we do something like this,
> it'll become slightly more complicated to figure out where the item
> pointer array ends. Another issue is that we might find ourselves
> wanting to extend the item pointer array to add a new item, and unable
> to do so easily because this most-recent-dead-line-pointer LSN is in
> the way.

I think that should not be too difficult to handle. Handling this through
the special space mechanism might be less complicated.

> If the LSN stored in the page precedes the
> start-of-last-successful-index-vacuum LSN, and if, further, we can get
> a buffer cleanup lock on the page, then we can do a HOT cleanup and
> life is good. Otherwise, we can either (1) just forget about the
> most-recent-dead-line-pointer LSN - not ideal but not catastrophic
> either - or (2) if the start-of-last-successful-vacuum-LSN is old
> enough, we could overwrite an LP_DEAD line pointer in place.
>
>
I don't think we need the cleanup lock to turn LP_DEAD line pointers into
LP_UNUSED, since that does not involve moving tuples around. A simple
EXCLUSIVE lock should be enough. But we would need to WAL-log the operation
of turning DEAD to UNUSED, so it would be simpler to consolidate this in HOT
pruning. There could be exceptions, say when a large number of DEAD line
pointers have accumulated towards the end and reclaiming them would free up
substantial space in the page. But maybe we can use those conditions to
invoke HOT prune instead of handling them separately.

> Another issue is that this causes problems for temporary and unlogged
> tables, because no WAL records are generated and, therefore, the LSN
> does not advance. This is also a problem for GIST indexes; Heikki
> fixed temporary GIST indexes by generating fake LSNs off of a
> backend-local counter. Unlogged GIST indexes are currently not
> supported. I think what we need to do is create an API to which you
> can pass a relation and get an LSN. If it's a permanent relation, you
> get a regular LSN. If it's a temporary relation, you get a fake LSN
> based on a backend-local counter. If it's an unlogged relation, you
> get a fake LSN based on a shared-memory counter that is reset on
> restart. If we can encapsulate that properly, it should provide both
> what we need to make this idea work and allow a somewhat graceful fix
> for GIST-vs-unlogged problem.
>
>
Can you explain more about how things would work for unlogged tables? Do we
use the same shared memory counter for tracking the last successful index
vacuum? If so, how do we handle the case where, after a restart, a page may
get an LSN less than the index vacuum LSN because the index vacuum happened
before the crash/stop? We might be fooled into believing that the index
pointers are all removed, even for dead line pointers generated after the
restart. We can possibly handle that by resetting the index vacuum LSN so
that nothing gets removed until one cycle of heap and index vacuum is done.
But I am not sure how easy it would be to reset the index vacuum LSNs for
all unlogged relations at the end of recovery.
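
The hazard and the possible fix can be shown with a small C sketch. It
assumes a fake-LSN counter in shared memory that restarts from 1 after every
restart and a per-relation "last successful index vacuum" LSN that survives
the restart unless it is explicitly reset. All Toy* and toy_* names are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ToyLsn;
#define TOY_INVALID_LSN ((ToyLsn) 0)

/* Hypothetical shared-memory counter handing out fake LSNs. */
typedef struct ToySharedState
{
    ToyLsn next_fake_lsn;
} ToySharedState;

/* Hypothetical per-relation state for an unlogged table. */
typedef struct ToyUnloggedRel
{
    ToyLsn last_index_vacuum_lsn;
} ToyUnloggedRel;

ToyLsn toy_get_fake_lsn(ToySharedState *shared)
{
    return shared->next_fake_lsn++;
}

/*
 * Simulate a (re)start: the counter starts over. If reset_vacuum_lsns is
 * true, also forget every relation's last index vacuum LSN, so nothing is
 * reclaimed until a full heap and index vacuum cycle has run again.
 */
void toy_restart(ToySharedState *shared, ToyUnloggedRel *rels, size_t nrels,
                 bool reset_vacuum_lsns)
{
    shared->next_fake_lsn = 1;
    if (reset_vacuum_lsns)
    {
        for (size_t i = 0; i < nrels; i++)
            rels[i].last_index_vacuum_lsn = TOY_INVALID_LSN;
    }
}

/* Dead line pointers are reclaimable only if they predate the vacuum. */
bool toy_can_reclaim(ToyLsn page_dead_lp_lsn, const ToyUnloggedRel *rel)
{
    return page_dead_lp_lsn != TOY_INVALID_LSN &&
           rel->last_index_vacuum_lsn != TOY_INVALID_LSN &&
           page_dead_lp_lsn < rel->last_index_vacuum_lsn;
}
```

Without the reset, a dead line pointer created right after the restart gets
a small fake LSN and compares as older than the stale index vacuum LSN, which
is exactly the false positive described above.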

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
