Re: Proposal: Another attempt at vacuum improvements

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Another attempt at vacuum improvements
Date: 2011-05-24 17:29:59
Message-ID: BANLkTiknzuwNaRwoHxFVgp1M2T=4Rbm6hA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

So, first of all, thanks for putting some effort and thought into
this. Despite the large number of improvements in this area in 8.3
and 8.4, this is still a pain point, and it would be really nice to
find a way to make some further improvements.

On Tue, May 24, 2011 at 2:58 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> So the idea is to separate the index vacuum (removing index pointers to dead
> tuples) from the heap vacuum. When we do heap vacuum (either by HOT-pruning
> or using regular vacuum), we can spool the dead line pointers somewhere. To
> avoid any hot-spots during normal processing, the spooling can be done
> periodically like the stats collection.

What happens if the system crashes after a line pointer becomes dead
but before the record of its death is safely on disk? The fact that a
previous index vacuum has committed is only sufficient justification
for reclaiming the dead line pointers if you're positive that the
index vacuum killed the index pointers for *every* dead line pointer.
I'm not sure we want to go there; any operation that wants to make a
line pointer dead will need to be XLOG'd. Instead, I think we should
stick with your original idea and just try to avoid the second heap
pass.

So to do that, as you say, we can have every operation that creates a
dead line pointer note the LSN of the operation in the page. But
instead of allocating permanent space in the page header, which would
both reduce (admittedly only by 8 bytes) the amount of space available
for tuples, and more significantly have the effect of breaking on-disk
compatibility, I'm wondering if we could get by with making space for
that extra LSN only when it's actually present. In other words, when
it's present, we set a bit PD_HAS_DEAD_LINE_PTR_LSN or somesuch,
increment pd_upper, and use the extra space to store the LSN. There
is an alignment problem to worry about there but that shouldn't be a
huge issue.

When we vacuum, we remember the LSN before we start. When we finish,
if we scanned the indexes and everything completed without error, then
we bump the heap's notion (wherever we store it) of the last
successful index vacuum. When we vacuum or do HOT cleanup on a page,
if the page has a most-recent-dead-line pointer LSN and it precedes
the start-of-last-successful-index-vacuum LSN, then we mark all the
LP_DEAD tuples as LP_UNUSED and throw away the
most-recent-dead-line-pointer LSN.

One downside of this approach is that, if we do something like this,
it'll become slightly more complicated to figure out where the item
pointer array ends. Another issue is that we might find ourselves
wanting to extend the item pointer array to add a new item, and unable
to do so easily because this most-recent-dead-line-pointer LSN is in
the way. If the LSN stored in the page precedes the
start-of-last-successful-index-vacuum LSN, and if, further, we can get
a buffer cleanup lock on the page, then we can do a HOT cleanup and
life is good. Otherwise, we can either (1) just forget about the
most-recent-dead-line-pointer LSN - not ideal but not catastrophic
either - or (2) if the start-of-last-successful-vacuum-LSN is old
enough, we could overwrite an LP_DEAD line pointer in place.

Another issue is that this causes problems for temporary and unlogged
tables, because no WAL records are generated and, therefore, the LSN
does not advance. This is also a problem for GIST indexes; Heikki
fixed temporary GIST indexes by generating fake LSNs off of a
backend-local counter. Unlogged GIST indexes are currently not
supported. I think what we need to do is create an API to which you
can pass a relation and get an LSN. If it's a permanent relation, you
get a regular LSN. If it's a temporary relation, you get a fake LSN
based on a backend-local counter. If it's an unlogged relation, you
get a fake LSN based on a shared-memory counter that is reset on
restart. If we can encapsulate that properly, it should provide both
what we need to make this idea work and allow a somewhat graceful fix
for GIST-vs-unlogged problem.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kohei Kaigai 2011-05-24 17:30:32 Re: Error compiling sepgsql in PG9.1
Previous Message Tom Lane 2011-05-24 17:28:38 Re: Domains versus polymorphic functions, redux