Re: Maintaining cluster order on insert

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jaime Casanova <systemguards(at)gmail(dot)com>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Maintaining cluster order on insert
Date: 2007-06-18 05:52:55
Message-ID: 46761DB7.7070505@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> The implementation has changed a bit since August. I thought I had
>> submitted an updated version in the winter but couldn't find it. Anyway,
>> I updated and dusted off the source tree, tidied up the comments a
>> little bit, and fixed some inconsistencies in pg_proc entries that made
>> opr_sanity to fail.
>
> I started looking at this patch. My first reaction is that I liked last
> August's API (an independent "suggestblock" call) a lot better. I think
> trying to hold an index page lock across the heap insert is an extremely
> bad idea; it will hurt concurrency and possibly cause deadlocks
> (especially for unique indexes).

The index page is not kept locked across the calls. Just pinned.

The reason for switching to the new API instead of the amsuggestblock
API is CPU overhead. It avoids constructing the IndexTuple twice and
descending the tree twice.

Clustering is mainly useful when the table doesn't fit in cache, so one
could argue that if you care about clustering you're most likely I/O
bound and don't care about the CPU overhead that much. Nevertheless,
avoiding it seems like a good idea to me.

The amsuggestblock API is simpler, though. We might choose it on those
grounds alone.

> The other question is why is execMain involved in this? That makes the
> design nonfunctional for tuples inserted in any other way than through
> the main executor --- COPY for instance. Also, if this is successful,
> I could see using it on system catalogs eventually. I'm inclined to
> think that the right design is for heap_insert to call amsuggestblock
> for itself, or maybe even push that down to RelationGetBufferForTuple.
> (Note: having heap_insert contain logic that duplicates
> RelationGetBufferForTuple's is another bit of bad design here, but
> that's at least correctable locally.) Now the difficulty in that is
> that the heapam.c routines don't have hold of any data structure
> containing index knowledge ... but they do have hold of the Relation
> structure for the heap. I suggest making RelationGetIndexList() cache
> the OID of the clustered index (if any) in relcache entries, and add
> RelationGetClusterIndex much like RelationGetOidIndex, and then
> heap_insert can use that.

Hmm. My first reaction on that is that having heap_insert reach into an
index is a modularity violation. It wouldn't be too hard to make similar
changes to COPY that I did to the executor.

I doubt it's very useful for system catalogs; they should never grow
very large, and should stay mostly cached anyway.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-06-18 05:58:11 Re: Bugtraq: Having Fun With PostgreSQL
Previous Message ITAGAKI Takahiro 2007-06-18 05:50:57 Re: Load Distributed Checkpoints, revised patch

Browse pgsql-patches by date

  From Date Subject
Next Message Gregory Stark 2007-06-18 08:49:28 Re: Maintaining cluster order on insert
Previous Message ITAGAKI Takahiro 2007-06-18 05:50:57 Re: Load Distributed Checkpoints, revised patch