Re: HOT for PostgreSQL 8.3

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Pavan Deolasee <pavan(dot)deolasee(at)enterprisedb(dot)com>, Nikhil S <nikhil(dot)sontakke(at)enterprisedb(dot)com>
Subject: Re: HOT for PostgreSQL 8.3
Date: 2007-02-08 14:47:11
Message-ID: 45CB37EF.4060500@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
>> The basic idea is that when a tuple is UPDATEd we can, in certain
>> circumstances, avoid inserting index tuples for a tuple. Such tuples are
>> marked HEAP_ONLY_TUPLE, but their storage is otherwise identical to
>> other tuples.
>
> What is VACUUM FULL going to do when it wants to move one of these things?

I suppose it could move the whole chain to the same target page, if
there is one with enough space to accommodate the whole chain. Or we
could just cop out and not move tuples marked with HEAP_ONLY_TUPLE. I
think that would be acceptable; after the last transaction that can see
the old tuple is finished, the old tuple is dead. After that, VACUUM
FULL can remove the old tuple and move the remaining tuple as usual.

>> CREATE INDEX requires some careful work to allow it to identify and
>> correctly index HEAP_ONLY_TUPLEs that need to become ~HEAP_ONLY_TUPLE as
>> a result of the new index.
>
> I think you've glossed over the CREATE INDEX problem much too easily.
> The difficulty with that is going to be that de-HOT-ifying a tuple
> is going to require multiple updates that can't possibly be put into
> a single WAL record, and I don't think that WAL replay can clean up
> after an incomplete update (since it can't run user-defined functions
> and hence cannot be expected to compute index entries for itself).
> So I don't think you can do that while preserving crash safety.

Yeah, chilling tuples from HOT state to normal tuples is not easy.

One solution I thought of is to add another flag to heap tuples,
CHILL_IN_PROGRESS. To chill a tuple, you would:

1. Mark heap tuple with CHILL_IN_PROGRESS
2. Insert missing index entries
3. Clear CHILL_IN_PROGRESS and HEAP_ONLY_TUPLE flags

Index scans would ignore tuples with CHILL_IN_PROGRESS and directly
pointed to from the index. That way if we crash in the middle of step 2,
scans and updates would work normally after replay, as if the index
entries weren't there. CREATE INDEX would have to fail if there's any
CHILL_IN_PROGRESS tuples, because we wouldn't know which index entries
need to be inserted; some might already be there. To clear the
CHILL_IN_PROGRESS flag, a vacuum would be needed. Vacuum would remove
all index entries for those tuples, but instead of removing the heap
tuple in the 2nd scan it would just clear the CHILL_IN_PROGRESS flag,
bringing us back to where we started.

However, the easiest solution would be to make CREATE INDEX wait until
the old tuple is dead. That should be ok at least for concurrent CREATE
INDEX, because it already has that kind of a wait between 1st and 2nd
phase.

>> Removing the root tuple will require a VACUUM *FULL*.
>
> That seems unacceptable ... it won't take too long for your table to
> fill up with stubs, and we don't want to return to the bad old days
> when periodic VACUUM FULL was unavoidable.

Agreed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2007-02-08 15:09:51 Re: [pgsql-patches] pg_standby
Previous Message Magnus Hagander 2007-02-08 14:26:35 Re: How can I use 2GB of shared buffers on Windows?