Re: Maintaining cluster order on insert

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: Maintaining cluster order on insert
Date: 2008-04-11 19:38:30
Message-ID: 200804111938.m3BJcUj02170@momjian.us
Lists: pgsql-hackers pgsql-patches


This idea has been rejected due to the poor performance results reported
later in the thread.

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> While thinking about index-organized-tables and similar ideas, it
> occurred to me that there's some low-hanging-fruit: maintaining cluster
> order on inserts by trying to place new heap tuples close to other
> similar tuples. That involves asking the index am where on the heap the
> new tuple should go, and trying to insert it there before using the FSM.
> Using the new fillfactor parameter makes it more likely that there's
> room on the page. We don't worry about the order within the page.
>
> The API I'm thinking of introduces a new optional index am function,
> amsuggestblock (suggestions for a better name are welcome). It gets the
> same parameters as aminsert, and returns the heap block number that
> would be the optimal place to put the new tuple. It'd be called from
> ExecInsert before inserting the heap tuple, and the suggestion is passed
> on to heap_insert and RelationGetBufferForTuple.
>
> I wrote a little patch to implement this for btree, attached.
>
> This could be optimized by changing the existing aminsert API, because
> as it is, an insert will have to descend the btree twice: once in
> amsuggestblock and then in aminsert. amsuggestblock could keep the right
> index page pinned so aminsert could locate it quicker. But I wanted to
> keep this simple for now. Another improvement might be to allow
> amsuggestblock to return a list of suggestions, but that makes it more
> expensive to insert if there isn't room in the suggested pages, since
> heap_insert will have to try them all before giving up.
>
> Comments regarding the general idea or the patch? There should probably
> be an index option to turn the feature on and off. You'll want to turn it
> off when you first load a table, and turn it on after CLUSTER to keep it
> clustered.
>
> Since there's been discussion on keeping the TODO list more up-to-date,
> I hereby officially claim the "Automatically maintain clustering on a
> table" TODO item :). Feel free to bombard me with requests for status
> reports. And just to be clear, I'm not trying to sneak this into 8.2
> anymore, this is 8.3 stuff.
>
> I won't be implementing the background daemon described in the TODO item,
> since that would essentially be an online version of CLUSTER. Which sure
> would be nice, but that's a different story.
>
> - Heikki
>
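For readers skimming the thread, the placement order described above can be
sketched as a toy model in plain C. This is not the patch itself, only an
illustration under stated assumptions: the names (ToyHeap, ToyIndexEntry,
toy_suggest_block, toy_choose_block), the fixed page size, and the linear
scan standing in for the FSM are all hypothetical. It models two things from
the proposal: the index lookup that returns the heap block of the nearest
index entry, and the page choice that tries the suggestion first (ignoring
the fillfactor reserve), then the cached target page, then the FSM, and
finally extends the relation.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_INVALID_BLOCK ((unsigned) -1)  /* stands in for InvalidBlockNumber */
#define TOY_PAGE_SIZE     100u             /* hypothetical usable bytes per page */
#define TOY_MAX_BLOCKS    16u              /* hypothetical cap on heap size */

/* One index entry: a key and the heap block its tuple lives on. */
typedef struct { int key; unsigned heap_block; } ToyIndexEntry;

typedef struct {
    unsigned nblocks;                     /* pages currently in the heap */
    unsigned free_space[TOY_MAX_BLOCKS];  /* free bytes on each page */
    unsigned target_block;                /* cached last-insert page */
    unsigned reserve;                     /* bytes fillfactor keeps free */
} ToyHeap;

/*
 * Return the heap block of the index entry at the position where `key`
 * would be inserted, or TOY_INVALID_BLOCK if the index is empty.
 * `entries` must be sorted by key.
 */
static unsigned
toy_suggest_block(const ToyIndexEntry *entries, size_t n, int key)
{
    size_t lo = 0, hi = n;

    if (n == 0)
        return TOY_INVALID_BLOCK;
    /* binary search for the first entry whose key is >= the new key */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (entries[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* past the end: step back to the last entry */
    if (lo == n)
        lo = n - 1;
    return entries[lo].heap_block;
}

/* Pick a page for a tuple of `len` bytes and deduct the space used. */
static unsigned
toy_choose_block(ToyHeap *heap, unsigned len, unsigned suggested)
{
    unsigned blk;

    assert(len <= TOY_PAGE_SIZE);

    /* 1. Suggested page: allowed to dip into the fillfactor reserve. */
    if (suggested != TOY_INVALID_BLOCK && suggested < heap->nblocks &&
        heap->free_space[suggested] >= len)
    {
        heap->free_space[suggested] -= len;
        return suggested;
    }

    /* 2. Cached target page: must leave the reserve untouched. */
    blk = heap->target_block;
    if (blk != TOY_INVALID_BLOCK && blk < heap->nblocks &&
        heap->free_space[blk] >= len + heap->reserve)
    {
        heap->free_space[blk] -= len;
        return blk;
    }

    /* 3. Stand-in for the FSM: first page with room above the reserve. */
    for (blk = 0; blk < heap->nblocks; blk++)
    {
        if (heap->free_space[blk] >= len + heap->reserve)
        {
            heap->free_space[blk] -= len;
            heap->target_block = blk;
            return blk;
        }
    }

    /* 4. Nothing fits: extend the heap by one page. */
    assert(heap->nblocks < TOY_MAX_BLOCKS);
    blk = heap->nblocks++;
    heap->free_space[blk] = TOY_PAGE_SIZE - len;
    heap->target_block = blk;
    return blk;
}
```

A caller would first ask toy_suggest_block for a hint and then pass it to
toy_choose_block, passing TOY_INVALID_BLOCK when there is no clustered
index. As in the proposal, a failed suggestion costs only one extra page
check before the usual path is taken.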

[ text/x-patch is unsupported, treating like TEXT/PLAIN ]

> Index: doc/src/sgml/catalogs.sgml
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/catalogs.sgml,v
> retrieving revision 2.129
> diff -c -r2.129 catalogs.sgml
> *** doc/src/sgml/catalogs.sgml 31 Jul 2006 20:08:55 -0000 2.129
> --- doc/src/sgml/catalogs.sgml 8 Aug 2006 16:17:21 -0000
> ***************
> *** 499,504 ****
> --- 499,511 ----
> <entry>Function to parse and validate reloptions for an index</entry>
> </row>
>
> + <row>
> + <entry><structfield>amsuggestblock</structfield></entry>
> + <entry><type>regproc</type></entry>
> + <entry><literal><link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.oid</literal></entry>
> + <entry>Get the best place in the heap to put a new tuple</entry>
> + </row>
> +
> </tbody>
> </tgroup>
> </table>
> Index: doc/src/sgml/indexam.sgml
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/doc/src/sgml/indexam.sgml,v
> retrieving revision 2.16
> diff -c -r2.16 indexam.sgml
> *** doc/src/sgml/indexam.sgml 31 Jul 2006 20:08:59 -0000 2.16
> --- doc/src/sgml/indexam.sgml 8 Aug 2006 17:15:25 -0000
> ***************
> *** 391,396 ****
> --- 391,414 ----
> <function>amoptions</> to test validity of options settings.
> </para>
>
> + <para>
> + <programlisting>
> + BlockNumber
> + amsuggestblock (Relation indexRelation,
> + Datum *values,
> + bool *isnull,
> + Relation heapRelation);
> + </programlisting>
> + Gets the optimal place in the heap for a new tuple. The parameters
> + correspond to the parameters for <literal>aminsert</literal>.
> + This function is called on the clustered index before a new tuple
> + is inserted to the heap, and it should choose the optimal insertion
> + target page on the heap in such manner that the heap stays as close
> + as possible to the index order.
> + <literal>amsuggestblock</literal> can return InvalidBlockNumber if
> + the index am doesn't have a suggestion.
> + </para>
> +
> </sect1>
>
> <sect1 id="index-scanning">
> Index: src/backend/access/heap/heapam.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/heapam.c,v
> retrieving revision 1.218
> diff -c -r1.218 heapam.c
> *** src/backend/access/heap/heapam.c 31 Jul 2006 20:08:59 -0000 1.218
> --- src/backend/access/heap/heapam.c 8 Aug 2006 16:17:21 -0000
> ***************
> *** 1325,1330 ****
> --- 1325,1335 ----
> * use_fsm is passed directly to RelationGetBufferForTuple, which see for
> * more info.
> *
> + * suggested_blk can be set by the caller to hint heap_insert which
> + * block would be the best place to put the new tuple in. heap_insert can
> + * ignore the suggestion, if there's not enough room on that block.
> + * InvalidBlockNumber means no preference.
> + *
> * The return value is the OID assigned to the tuple (either here or by the
> * caller), or InvalidOid if no OID. The header fields of *tup are updated
> * to match the stored tuple; in particular tup->t_self receives the actual
> ***************
> *** 1333,1339 ****
> */
> Oid
> heap_insert(Relation relation, HeapTuple tup, CommandId cid,
> ! bool use_wal, bool use_fsm)
> {
> TransactionId xid = GetCurrentTransactionId();
> HeapTuple heaptup;
> --- 1338,1344 ----
> */
> Oid
> heap_insert(Relation relation, HeapTuple tup, CommandId cid,
> ! bool use_wal, bool use_fsm, BlockNumber suggested_blk)
> {
> TransactionId xid = GetCurrentTransactionId();
> HeapTuple heaptup;
> ***************
> *** 1386,1392 ****
>
> /* Find buffer to insert this tuple into */
> buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
> ! InvalidBuffer, use_fsm);
>
> /* NO EREPORT(ERROR) from here till changes are logged */
> START_CRIT_SECTION();
> --- 1391,1397 ----
>
> /* Find buffer to insert this tuple into */
> buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
> ! InvalidBuffer, use_fsm, suggested_blk);
>
> /* NO EREPORT(ERROR) from here till changes are logged */
> START_CRIT_SECTION();
> ***************
> *** 1494,1500 ****
> Oid
> simple_heap_insert(Relation relation, HeapTuple tup)
> {
> ! return heap_insert(relation, tup, GetCurrentCommandId(), true, true);
> }
>
> /*
> --- 1499,1506 ----
> Oid
> simple_heap_insert(Relation relation, HeapTuple tup)
> {
> ! return heap_insert(relation, tup, GetCurrentCommandId(), true,
> ! true, InvalidBlockNumber);
> }
>
> /*
> ***************
> *** 2079,2085 ****
> {
> /* Assume there's no chance to put heaptup on same page. */
> newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
> ! buffer, true);
> }
> else
> {
> --- 2085,2092 ----
> {
> /* Assume there's no chance to put heaptup on same page. */
> newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
> ! buffer, true,
> ! InvalidBlockNumber);
> }
> else
> {
> ***************
> *** 2096,2102 ****
> */
> LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
> newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
> ! buffer, true);
> }
> else
> {
> --- 2103,2110 ----
> */
> LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
> newbuf = RelationGetBufferForTuple(relation, heaptup->t_len,
> ! buffer, true,
> ! InvalidBlockNumber);
> }
> else
> {
> Index: src/backend/access/heap/hio.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/hio.c,v
> retrieving revision 1.63
> diff -c -r1.63 hio.c
> *** src/backend/access/heap/hio.c 3 Jul 2006 22:45:37 -0000 1.63
> --- src/backend/access/heap/hio.c 9 Aug 2006 18:03:01 -0000
> ***************
> *** 93,98 ****
> --- 93,100 ----
> * any committed data of other transactions. (See heap_insert's comments
> * for additional constraints needed for safe usage of this behavior.)
> *
> + * If the caller has a suggestion, it's passed in suggestedBlock.
> + *
> * We always try to avoid filling existing pages further than the fillfactor.
> * This is OK since this routine is not consulted when updating a tuple and
> * keeping it on the same page, which is the scenario fillfactor is meant
> ***************
> *** 103,109 ****
> */
> Buffer
> RelationGetBufferForTuple(Relation relation, Size len,
> ! Buffer otherBuffer, bool use_fsm)
> {
> Buffer buffer = InvalidBuffer;
> Page pageHeader;
> --- 105,112 ----
> */
> Buffer
> RelationGetBufferForTuple(Relation relation, Size len,
> ! Buffer otherBuffer, bool use_fsm,
> ! BlockNumber suggestedBlock)
> {
> Buffer buffer = InvalidBuffer;
> Page pageHeader;
> ***************
> *** 135,142 ****
> otherBlock = InvalidBlockNumber; /* just to keep compiler quiet */
>
> /*
> ! * We first try to put the tuple on the same page we last inserted a tuple
> ! * on, as cached in the relcache entry. If that doesn't work, we ask the
> * shared Free Space Map to locate a suitable page. Since the FSM's info
> * might be out of date, we have to be prepared to loop around and retry
> * multiple times. (To insure this isn't an infinite loop, we must update
> --- 138,147 ----
> otherBlock = InvalidBlockNumber; /* just to keep compiler quiet */
>
> /*
> ! * We first try to put the tuple on the page suggested by the caller, if
> ! * any. Then we try to put the tuple on the same page we last inserted a
> ! * tuple on, as cached in the relcache entry. If that doesn't work, we
> ! * ask the
> * shared Free Space Map to locate a suitable page. Since the FSM's info
> * might be out of date, we have to be prepared to loop around and retry
> * multiple times. (To insure this isn't an infinite loop, we must update
> ***************
> *** 144,152 ****
> * not to be suitable.) If the FSM has no record of a page with enough
> * free space, we give up and extend the relation.
> *
> ! * When use_fsm is false, we either put the tuple onto the existing target
> ! * page or extend the relation.
> */
> if (len + saveFreeSpace <= MaxTupleSize)
> targetBlock = relation->rd_targblock;
> else
> --- 149,167 ----
> * not to be suitable.) If the FSM has no record of a page with enough
> * free space, we give up and extend the relation.
> *
> ! * When use_fsm is false, we skip the fsm lookup if neither the suggested
> ! * nor the cached last insertion page has enough room, and extend the
> ! * relation.
> ! *
> ! * The fillfactor is taken into account when calculating the free space
> ! * on the cached target block, and when using the FSM. The suggested page
> ! * is used whenever there's enough room in it, regardless of the fillfactor,
> ! * because that's exactly the purpose the space is reserved for in the
> ! * first place.
> */
> + if (suggestedBlock != InvalidBlockNumber)
> + targetBlock = suggestedBlock;
> + else
> if (len + saveFreeSpace <= MaxTupleSize)
> targetBlock = relation->rd_targblock;
> else
> ***************
> *** 219,224 ****
> --- 234,244 ----
> */
> pageHeader = (Page) BufferGetPage(buffer);
> pageFreeSpace = PageGetFreeSpace(pageHeader);
> +
> + /* If we're trying the suggested block, don't care about fillfactor */
> + if (targetBlock == suggestedBlock && len <= pageFreeSpace)
> + return buffer;
> +
> if (len + saveFreeSpace <= pageFreeSpace)
> {
> /* use this page as future insert target, too */
> ***************
> *** 241,246 ****
> --- 261,275 ----
> ReleaseBuffer(buffer);
> }
>
> + /* If we just tried the suggested block, try the cached target
> + * block next, before consulting the FSM. */
> + if(suggestedBlock == targetBlock)
> + {
> + targetBlock = relation->rd_targblock;
> + suggestedBlock = InvalidBlockNumber;
> + continue;
> + }
> +
> /* Without FSM, always fall out of the loop and extend */
> if (!use_fsm)
> break;
> Index: src/backend/access/index/genam.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/index/genam.c,v
> retrieving revision 1.58
> diff -c -r1.58 genam.c
> *** src/backend/access/index/genam.c 31 Jul 2006 20:08:59 -0000 1.58
> --- src/backend/access/index/genam.c 8 Aug 2006 16:17:21 -0000
> ***************
> *** 259,261 ****
> --- 259,275 ----
>
> pfree(sysscan);
> }
> +
> + /*
> + * This is a dummy implementation of amsuggestblock, to be used for index
> + * access methods that don't or can't support it. It just returns
> + * InvalidBlockNumber, which means "no preference".
> + *
> + * This is probably not the best place for this function, but it doesn't
> + * fit naturally anywhere else either.
> + */
> + Datum
> + dummysuggestblock(PG_FUNCTION_ARGS)
> + {
> + PG_RETURN_UINT32(InvalidBlockNumber);
> + }
> Index: src/backend/access/index/indexam.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/index/indexam.c,v
> retrieving revision 1.94
> diff -c -r1.94 indexam.c
> *** src/backend/access/index/indexam.c 31 Jul 2006 20:08:59 -0000 1.94
> --- src/backend/access/index/indexam.c 8 Aug 2006 16:17:21 -0000
> ***************
> *** 18,23 ****
> --- 18,24 ----
> * index_rescan - restart a scan of an index
> * index_endscan - end a scan
> * index_insert - insert an index tuple into a relation
> + * index_suggestblock - get desired insert location for a heap tuple
> * index_markpos - mark a scan position
> * index_restrpos - restore a scan position
> * index_getnext - get the next tuple from a scan
> ***************
> *** 202,207 ****
> --- 203,237 ----
> BoolGetDatum(check_uniqueness)));
> }
>
> + /* ----------------
> + * index_suggestblock - get desired insert location for a heap tuple
> + *
> + * The returned BlockNumber is the *heap* page that is the best place
> + * to insert the given tuple to, according to the index am. The best
> + * place is usually one that maintains the cluster order.
> + * ----------------
> + */
> + BlockNumber
> + index_suggestblock(Relation indexRelation,
> + Datum *values,
> + bool *isnull,
> + Relation heapRelation)
> + {
> + FmgrInfo *procedure;
> +
> + RELATION_CHECKS;
> + GET_REL_PROCEDURE(amsuggestblock);
> +
> + /*
> + * have the am's suggestblock proc do all the work.
> + */
> + return DatumGetUInt32(FunctionCall4(procedure,
> + PointerGetDatum(indexRelation),
> + PointerGetDatum(values),
> + PointerGetDatum(isnull),
> + PointerGetDatum(heapRelation)));
> + }
> +
> /*
> * index_beginscan - start a scan of an index with amgettuple
> *
> Index: src/backend/access/nbtree/nbtinsert.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtinsert.c,v
> retrieving revision 1.142
> diff -c -r1.142 nbtinsert.c
> *** src/backend/access/nbtree/nbtinsert.c 25 Jul 2006 19:13:00 -0000 1.142
> --- src/backend/access/nbtree/nbtinsert.c 9 Aug 2006 17:51:33 -0000
> ***************
> *** 146,151 ****
> --- 146,221 ----
> }
>
> /*
> + * _bt_suggestblock() -- Find the heap block of the closest index tuple.
> + *
> + * The logic to find the target should match _bt_doinsert, otherwise
> + * we'll be making bad suggestions.
> + */
> + BlockNumber
> + _bt_suggestblock(Relation rel, IndexTuple itup, Relation heapRel)
> + {
> + int natts = rel->rd_rel->relnatts;
> + OffsetNumber offset;
> + Page page;
> + BTPageOpaque opaque;
> +
> + ScanKey itup_scankey;
> + BTStack stack;
> + Buffer buf;
> + IndexTuple curitup;
> + BlockNumber suggestion = InvalidBlockNumber;
> +
> + /* we need an insertion scan key to do our search, so build one */
> + itup_scankey = _bt_mkscankey(rel, itup);
> +
> + /* find the first page containing this key */
> + stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_READ);
> + if(!BufferIsValid(buf))
> + {
> + /* The index was completely empty. No suggestion then. */
> + return InvalidBlockNumber;
> + }
> + /* we don't need the stack, so free it right away */
> + _bt_freestack(stack);
> +
> + page = BufferGetPage(buf);
> + opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> +
> + /* Find the location in the page where the new index tuple would go to. */
> +
> + offset = _bt_binsrch(rel, buf, natts, itup_scankey, false);
> + if (offset > PageGetMaxOffsetNumber(page))
> + {
> + /* _bt_binsrch returned pointer to end-of-page. It means that
> + * there were no equal items on the page, and the new item should
> + * be inserted as the last tuple of the page. There could be equal
> + * items on the next page, however.
> + *
> + * At the moment, we just ignore the potential equal items on the
> + * right, and pretend there aren't any. We could instead walk right
> + * to the next page to check that, but let's keep it simple for now.
> + */
> + offset = OffsetNumberPrev(offset);
> + }
> + if(offset < P_FIRSTDATAKEY(opaque))
> + {
> + /* We landed on an empty page. We could step left or right until
> + * we find some items, but let's keep it simple for now.
> + */
> + } else {
> + /* We're now positioned at the index tuple that we're interested in. */
> +
> + curitup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset));
> + suggestion = ItemPointerGetBlockNumber(&curitup->t_tid);
> + }
> +
> + _bt_relbuf(rel, buf);
> + _bt_freeskey(itup_scankey);
> +
> + return suggestion;
> + }
> +
> + /*
> * _bt_check_unique() -- Check for violation of unique index constraint
> *
> * Returns InvalidTransactionId if there is no conflict, else an xact ID
> Index: src/backend/access/nbtree/nbtree.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtree.c,v
> retrieving revision 1.149
> diff -c -r1.149 nbtree.c
> *** src/backend/access/nbtree/nbtree.c 10 May 2006 23:18:39 -0000 1.149
> --- src/backend/access/nbtree/nbtree.c 9 Aug 2006 18:04:02 -0000
> ***************
> *** 228,233 ****
> --- 228,265 ----
> }
>
> /*
> + * btsuggestblock() -- find the best place in the heap to put a new tuple.
> + *
> + * This uses the same logic as btinsert to find the place where the index
> + * tuple would go if this was a btinsert call.
> + *
> + * There's room for improvement here. An insert operation will descend
> + * the tree twice, first by btsuggestblock, then by btinsert. Things
> + * might have changed in between, so that the heap tuple is actually
> + * not inserted in the optimal page, but since this is just an
> + * optimization, it's ok if it happens sometimes.
> + */
> + Datum
> + btsuggestblock(PG_FUNCTION_ARGS)
> + {
> + Relation rel = (Relation) PG_GETARG_POINTER(0);
> + Datum *values = (Datum *) PG_GETARG_POINTER(1);
> + bool *isnull = (bool *) PG_GETARG_POINTER(2);
> + Relation heapRel = (Relation) PG_GETARG_POINTER(3);
> + IndexTuple itup;
> + BlockNumber suggestion;
> +
> + /* generate an index tuple */
> + itup = index_form_tuple(RelationGetDescr(rel), values, isnull);
> +
> + suggestion =_bt_suggestblock(rel, itup, heapRel);
> +
> + pfree(itup);
> +
> + PG_RETURN_UINT32(suggestion);
> + }
> +
> + /*
> * btgettuple() -- Get the next tuple in the scan.
> */
> Datum
> Index: src/backend/executor/execMain.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/executor/execMain.c,v
> retrieving revision 1.277
> diff -c -r1.277 execMain.c
> *** src/backend/executor/execMain.c 31 Jul 2006 01:16:37 -0000 1.277
> --- src/backend/executor/execMain.c 8 Aug 2006 16:17:21 -0000
> ***************
> *** 892,897 ****
> --- 892,898 ----
> resultRelInfo->ri_RangeTableIndex = resultRelationIndex;
> resultRelInfo->ri_RelationDesc = resultRelationDesc;
> resultRelInfo->ri_NumIndices = 0;
> + resultRelInfo->ri_ClusterIndex = -1;
> resultRelInfo->ri_IndexRelationDescs = NULL;
> resultRelInfo->ri_IndexRelationInfo = NULL;
> /* make a copy so as not to depend on relcache info not changing... */
> ***************
> *** 1388,1394 ****
> heap_insert(estate->es_into_relation_descriptor, tuple,
> estate->es_snapshot->curcid,
> estate->es_into_relation_use_wal,
> ! false); /* never any point in using FSM */
> /* we know there are no indexes to update */
> heap_freetuple(tuple);
> IncrAppended();
> --- 1389,1396 ----
> heap_insert(estate->es_into_relation_descriptor, tuple,
> estate->es_snapshot->curcid,
> estate->es_into_relation_use_wal,
> ! false, /* never any point in using FSM */
> ! InvalidBlockNumber);
> /* we know there are no indexes to update */
> heap_freetuple(tuple);
> IncrAppended();
> ***************
> *** 1419,1424 ****
> --- 1421,1427 ----
> ResultRelInfo *resultRelInfo;
> Relation resultRelationDesc;
> Oid newId;
> + BlockNumber suggestedBlock;
>
> /*
> * get the heap tuple out of the tuple table slot, making sure we have a
> ***************
> *** 1467,1472 ****
> --- 1470,1479 ----
> if (resultRelationDesc->rd_att->constr)
> ExecConstraints(resultRelInfo, slot, estate);
>
> + /* Ask the index am of the clustered index for the
> + * best place to put it */
> + suggestedBlock = ExecSuggestBlock(slot, estate);
> +
> /*
> * insert the tuple
> *
> ***************
> *** 1475,1481 ****
> */
> newId = heap_insert(resultRelationDesc, tuple,
> estate->es_snapshot->curcid,
> ! true, true);
>
> IncrAppended();
> (estate->es_processed)++;
> --- 1482,1488 ----
> */
> newId = heap_insert(resultRelationDesc, tuple,
> estate->es_snapshot->curcid,
> ! true, true, suggestedBlock);
>
> IncrAppended();
> (estate->es_processed)++;
> Index: src/backend/executor/execUtils.c
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/executor/execUtils.c,v
> retrieving revision 1.139
> diff -c -r1.139 execUtils.c
> *** src/backend/executor/execUtils.c 4 Aug 2006 21:33:36 -0000 1.139
> --- src/backend/executor/execUtils.c 9 Aug 2006 18:05:05 -0000
> ***************
> *** 31,36 ****
> --- 31,37 ----
> * ExecOpenIndices \
> * ExecCloseIndices | referenced by InitPlan, EndPlan,
> * ExecInsertIndexTuples / ExecInsert, ExecUpdate
> + * ExecSuggestBlock Referenced by ExecInsert
> *
> * RegisterExprContextCallback Register function shutdown callback
> * UnregisterExprContextCallback Deregister function shutdown callback
> ***************
> *** 874,879 ****
> --- 875,881 ----
> IndexInfo **indexInfoArray;
>
> resultRelInfo->ri_NumIndices = 0;
> + resultRelInfo->ri_ClusterIndex = -1;
>
> /* fast path if no indexes */
> if (!RelationGetForm(resultRelation)->relhasindex)
> ***************
> *** 913,918 ****
> --- 915,925 ----
> /* extract index key information from the index's pg_index info */
> ii = BuildIndexInfo(indexDesc);
>
> + /* Remember which index is the clustered one.
> + * It's used to call the suggestblock-method on inserts */
> + if(indexDesc->rd_index->indisclustered)
> + resultRelInfo->ri_ClusterIndex = i;
> +
> relationDescs[i] = indexDesc;
> indexInfoArray[i] = ii;
> i++;
> ***************
> *** 1062,1067 ****
> --- 1069,1137 ----
> }
> }
>
> + /* ----------------------------------------------------------------
> + * ExecSuggestBlock
> + *
> + * This routine asks the index am where a new heap tuple
> + * should be placed.
> + * ----------------------------------------------------------------
> + */
> + BlockNumber
> + ExecSuggestBlock(TupleTableSlot *slot,
> + EState *estate)
> + {
> + ResultRelInfo *resultRelInfo;
> + int i;
> + Relation relationDesc;
> + Relation heapRelation;
> + ExprContext *econtext;
> + Datum values[INDEX_MAX_KEYS];
> + bool isnull[INDEX_MAX_KEYS];
> + IndexInfo *indexInfo;
> +
> + /*
> + * Get information from the result relation info structure.
> + */
> + resultRelInfo = estate->es_result_relation_info;
> + i = resultRelInfo->ri_ClusterIndex;
> + if(i == -1)
> + return InvalidBlockNumber; /* there was no clustered index */
> +
> + heapRelation = resultRelInfo->ri_RelationDesc;
> + relationDesc = resultRelInfo->ri_IndexRelationDescs[i];
> + indexInfo = resultRelInfo->ri_IndexRelationInfo[i];
> +
> + /* You can't cluster on a partial index */
> + Assert(indexInfo->ii_Predicate == NIL);
> +
> + /*
> + * We will use the EState's per-tuple context for evaluating
> + * index expressions (creating it if it's not already there).
> + */
> + econtext = GetPerTupleExprContext(estate);
> +
> + /* Arrange for econtext's scan tuple to be the tuple under test */
> + econtext->ecxt_scantuple = slot;
> +
> + /*
> + * FormIndexDatum fills in its values and isnull parameters with the
> + * appropriate values for the column(s) of the index.
> + */
> + FormIndexDatum(indexInfo,
> + slot,
> + estate,
> + values,
> + isnull);
> +
> + /*
> + * The index AM does the rest.
> + */
> + return index_suggestblock(relationDesc, /* index relation */
> + values, /* array of index Datums */
> + isnull, /* null flags */
> + heapRelation);
> + }
> +
> /*
> * UpdateChangedParamSet
> * Add changed parameters to a plan node's chgParam set
> Index: src/include/access/genam.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/genam.h,v
> retrieving revision 1.65
> diff -c -r1.65 genam.h
> *** src/include/access/genam.h 31 Jul 2006 20:09:05 -0000 1.65
> --- src/include/access/genam.h 9 Aug 2006 17:53:44 -0000
> ***************
> *** 93,98 ****
> --- 93,101 ----
> ItemPointer heap_t_ctid,
> Relation heapRelation,
> bool check_uniqueness);
> + extern BlockNumber index_suggestblock(Relation indexRelation,
> + Datum *values, bool *isnull,
> + Relation heapRelation);
>
> extern IndexScanDesc index_beginscan(Relation heapRelation,
> Relation indexRelation,
> ***************
> *** 123,128 ****
> --- 126,133 ----
> extern FmgrInfo *index_getprocinfo(Relation irel, AttrNumber attnum,
> uint16 procnum);
>
> + extern Datum dummysuggestblock(PG_FUNCTION_ARGS);
> +
> /*
> * index access method support routines (in genam.c)
> */
> Index: src/include/access/heapam.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/heapam.h,v
> retrieving revision 1.114
> diff -c -r1.114 heapam.h
> *** src/include/access/heapam.h 3 Jul 2006 22:45:39 -0000 1.114
> --- src/include/access/heapam.h 8 Aug 2006 16:17:21 -0000
> ***************
> *** 156,162 ****
> extern void setLastTid(const ItemPointer tid);
>
> extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
> ! bool use_wal, bool use_fsm);
> extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
> ItemPointer ctid, TransactionId *update_xmax,
> CommandId cid, Snapshot crosscheck, bool wait);
> --- 156,162 ----
> extern void setLastTid(const ItemPointer tid);
>
> extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
> ! bool use_wal, bool use_fsm, BlockNumber suggestedblk);
> extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,
> ItemPointer ctid, TransactionId *update_xmax,
> CommandId cid, Snapshot crosscheck, bool wait);
> Index: src/include/access/hio.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/hio.h,v
> retrieving revision 1.32
> diff -c -r1.32 hio.h
> *** src/include/access/hio.h 13 Jul 2006 17:47:01 -0000 1.32
> --- src/include/access/hio.h 8 Aug 2006 16:17:21 -0000
> ***************
> *** 21,26 ****
> extern void RelationPutHeapTuple(Relation relation, Buffer buffer,
> HeapTuple tuple);
> extern Buffer RelationGetBufferForTuple(Relation relation, Size len,
> ! Buffer otherBuffer, bool use_fsm);
>
> #endif /* HIO_H */
> --- 21,26 ----
> extern void RelationPutHeapTuple(Relation relation, Buffer buffer,
> HeapTuple tuple);
> extern Buffer RelationGetBufferForTuple(Relation relation, Size len,
> ! Buffer otherBuffer, bool use_fsm, BlockNumber suggestedblk);
>
> #endif /* HIO_H */
> Index: src/include/access/nbtree.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/access/nbtree.h,v
> retrieving revision 1.103
> diff -c -r1.103 nbtree.h
> *** src/include/access/nbtree.h 7 Aug 2006 16:57:57 -0000 1.103
> --- src/include/access/nbtree.h 8 Aug 2006 16:17:21 -0000
> ***************
> *** 467,472 ****
> --- 467,473 ----
> extern Datum btbulkdelete(PG_FUNCTION_ARGS);
> extern Datum btvacuumcleanup(PG_FUNCTION_ARGS);
> extern Datum btoptions(PG_FUNCTION_ARGS);
> + extern Datum btsuggestblock(PG_FUNCTION_ARGS);
>
> /*
> * prototypes for functions in nbtinsert.c
> ***************
> *** 476,481 ****
> --- 477,484 ----
> extern Buffer _bt_getstackbuf(Relation rel, BTStack stack, int access);
> extern void _bt_insert_parent(Relation rel, Buffer buf, Buffer rbuf,
> BTStack stack, bool is_root, bool is_only);
> + extern BlockNumber _bt_suggestblock(Relation rel, IndexTuple itup,
> + Relation heapRel);
>
> /*
> * prototypes for functions in nbtpage.c
> Index: src/include/catalog/pg_am.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/catalog/pg_am.h,v
> retrieving revision 1.46
> diff -c -r1.46 pg_am.h
> *** src/include/catalog/pg_am.h 31 Jul 2006 20:09:05 -0000 1.46
> --- src/include/catalog/pg_am.h 8 Aug 2006 16:17:21 -0000
> ***************
> *** 65,70 ****
> --- 65,71 ----
> regproc amvacuumcleanup; /* post-VACUUM cleanup function */
> regproc amcostestimate; /* estimate cost of an indexscan */
> regproc amoptions; /* parse AM-specific parameters */
> + regproc amsuggestblock; /* suggest a block where to put heap tuple */
> } FormData_pg_am;
>
> /* ----------------
> ***************
> *** 78,84 ****
> * compiler constants for pg_am
> * ----------------
> */
> ! #define Natts_pg_am 23
> #define Anum_pg_am_amname 1
> #define Anum_pg_am_amstrategies 2
> #define Anum_pg_am_amsupport 3
> --- 79,85 ----
> * compiler constants for pg_am
> * ----------------
> */
> ! #define Natts_pg_am 24
> #define Anum_pg_am_amname 1
> #define Anum_pg_am_amstrategies 2
> #define Anum_pg_am_amsupport 3
> ***************
> *** 102,123 ****
> #define Anum_pg_am_amvacuumcleanup 21
> #define Anum_pg_am_amcostestimate 22
> #define Anum_pg_am_amoptions 23
>
> /* ----------------
> * initial contents of pg_am
> * ----------------
> */
>
> ! DATA(insert OID = 403 ( btree 5 1 1 t t t t f t btinsert btbeginscan btgettuple btgetmulti btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions ));
> DESCR("b-tree index access method");
> #define BTREE_AM_OID 403
> ! DATA(insert OID = 405 ( hash 1 1 0 f f f f f f hashinsert hashbeginscan hashgettuple hashgetmulti hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions ));
> DESCR("hash index access method");
> #define HASH_AM_OID 405
> ! DATA(insert OID = 783 ( gist 100 7 0 f t t t t t gistinsert gistbeginscan gistgettuple gistgetmulti gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions ));
> DESCR("GiST index access method");
> #define GIST_AM_OID 783
> ! DATA(insert OID = 2742 ( gin 100 4 0 f f f f t f gininsert ginbeginscan gingettuple gingetmulti ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions ));
> DESCR("GIN index access method");
> #define GIN_AM_OID 2742
>
> --- 103,125 ----
> #define Anum_pg_am_amvacuumcleanup 21
> #define Anum_pg_am_amcostestimate 22
> #define Anum_pg_am_amoptions 23
> + #define Anum_pg_am_amsuggestblock 24
>
> /* ----------------
> * initial contents of pg_am
> * ----------------
> */
>
> ! DATA(insert OID = 403 ( btree 5 1 1 t t t t f t btinsert btbeginscan btgettuple btgetmulti btrescan btendscan btmarkpos btrestrpos btbuild btbulkdelete btvacuumcleanup btcostestimate btoptions btsuggestblock));
> DESCR("b-tree index access method");
> #define BTREE_AM_OID 403
> ! DATA(insert OID = 405 ( hash 1 1 0 f f f f f f hashinsert hashbeginscan hashgettuple hashgetmulti hashrescan hashendscan hashmarkpos hashrestrpos hashbuild hashbulkdelete hashvacuumcleanup hashcostestimate hashoptions dummysuggestblock));
> DESCR("hash index access method");
> #define HASH_AM_OID 405
> ! DATA(insert OID = 783 ( gist 100 7 0 f t t t t t gistinsert gistbeginscan gistgettuple gistgetmulti gistrescan gistendscan gistmarkpos gistrestrpos gistbuild gistbulkdelete gistvacuumcleanup gistcostestimate gistoptions dummysuggestblock));
> DESCR("GiST index access method");
> #define GIST_AM_OID 783
> ! DATA(insert OID = 2742 ( gin 100 4 0 f f f f t f gininsert ginbeginscan gingettuple gingetmulti ginrescan ginendscan ginmarkpos ginrestrpos ginbuild ginbulkdelete ginvacuumcleanup gincostestimate ginoptions dummysuggestblock ));
> DESCR("GIN index access method");
> #define GIN_AM_OID 2742
>
> Index: src/include/catalog/pg_proc.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/catalog/pg_proc.h,v
> retrieving revision 1.420
> diff -c -r1.420 pg_proc.h
> *** src/include/catalog/pg_proc.h 6 Aug 2006 03:53:44 -0000 1.420
> --- src/include/catalog/pg_proc.h 9 Aug 2006 18:06:44 -0000
> ***************
> *** 682,687 ****
> --- 682,689 ----
> DESCR("btree(internal)");
> DATA(insert OID = 2785 ( btoptions PGNSP PGUID 12 f f t f s 2 17 "1009 16" _null_ _null_ _null_ btoptions - _null_ ));
> DESCR("btree(internal)");
> + DATA(insert OID = 2852 ( btsuggestblock PGNSP PGUID 12 f f t f v 4 23 "2281 2281 2281 2281" _null_ _null_ _null_ btsuggestblock - _null_ ));
> + DESCR("btree(internal)");
>
> DATA(insert OID = 339 ( poly_same PGNSP PGUID 12 f f t f i 2 16 "604 604" _null_ _null_ _null_ poly_same - _null_ ));
> DESCR("same as?");
> ***************
> *** 3936,3941 ****
> --- 3938,3946 ----
> DATA(insert OID = 2749 ( arraycontained PGNSP PGUID 12 f f t f i 2 16 "2277 2277" _null_ _null_ _null_ arraycontained - _null_ ));
> DESCR("anyarray contained");
>
> + DATA(insert OID = 2853 ( dummysuggestblock PGNSP PGUID 12 f f t f v 4 23 "2281 2281 2281 2281" _null_ _null_ _null_ dummysuggestblock - _null_ ));
> + DESCR("dummy amsuggestblock implementation (internal)");
> +
> /*
> * Symbolic values for provolatile column: these indicate whether the result
> * of a function is dependent *only* on the values of its explicit arguments,
> Index: src/include/executor/executor.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/executor/executor.h,v
> retrieving revision 1.128
> diff -c -r1.128 executor.h
> *** src/include/executor/executor.h 4 Aug 2006 21:33:36 -0000 1.128
> --- src/include/executor/executor.h 8 Aug 2006 16:17:21 -0000
> ***************
> *** 271,276 ****
> --- 271,277 ----
> extern void ExecCloseIndices(ResultRelInfo *resultRelInfo);
> extern void ExecInsertIndexTuples(TupleTableSlot *slot, ItemPointer tupleid,
> EState *estate, bool is_vacuum);
> + extern BlockNumber ExecSuggestBlock(TupleTableSlot *slot, EState *estate);
>
> extern void RegisterExprContextCallback(ExprContext *econtext,
> ExprContextCallbackFunction function,
> Index: src/include/nodes/execnodes.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/nodes/execnodes.h,v
> retrieving revision 1.158
> diff -c -r1.158 execnodes.h
> *** src/include/nodes/execnodes.h 4 Aug 2006 21:33:36 -0000 1.158
> --- src/include/nodes/execnodes.h 8 Aug 2006 16:17:21 -0000
> ***************
> *** 257,262 ****
> --- 257,264 ----
> * NumIndices # of indices existing on result relation
> * IndexRelationDescs array of relation descriptors for indices
> * IndexRelationInfo array of key/attr info for indices
> + * ClusterIndex index to the IndexRelationInfo array of the
> + * clustered index, or -1 if there's none
> * TrigDesc triggers to be fired, if any
> * TrigFunctions cached lookup info for trigger functions
> * TrigInstrument optional runtime measurements for triggers
> ***************
> *** 272,277 ****
> --- 274,280 ----
> int ri_NumIndices;
> RelationPtr ri_IndexRelationDescs;
> IndexInfo **ri_IndexRelationInfo;
> + int ri_ClusterIndex;
> TriggerDesc *ri_TrigDesc;
> FmgrInfo *ri_TrigFunctions;
> struct Instrumentation *ri_TrigInstrument;
> Index: src/include/utils/rel.h
> ===================================================================
> RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/include/utils/rel.h,v
> retrieving revision 1.91
> diff -c -r1.91 rel.h
> *** src/include/utils/rel.h 3 Jul 2006 22:45:41 -0000 1.91
> --- src/include/utils/rel.h 8 Aug 2006 16:17:21 -0000
> ***************
> *** 116,121 ****
> --- 116,122 ----
> FmgrInfo amvacuumcleanup;
> FmgrInfo amcostestimate;
> FmgrInfo amoptions;
> + FmgrInfo amsuggestblock;
> } RelationAmInfo;
>
>
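
For anyone reading the archive later, here is a rough, standalone sketch of how the new ri_ClusterIndex field and the per-AM suggest hook in the quoted patch appear to fit together. It is not code from the patch. The SketchRelInfo struct, the callback type, and the toy hooks are hypothetical stand-ins, and I am assuming that the dummy hook simply returns InvalidBlockNumber so that the caller falls back to the FSM.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL types; illustration only. */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Hypothetical callback type standing in for an AM's amsuggestblock. */
typedef BlockNumber (*suggest_block_fn) (unsigned int key);

/* Hypothetical, trimmed-down analogue of ResultRelInfo. */
typedef struct SketchRelInfo
{
    int               num_indices;   /* like ri_NumIndices */
    suggest_block_fn *suggest;       /* one hook per index */
    int               cluster_index; /* like ri_ClusterIndex; -1 if none */
} SketchRelInfo;

/* Assumed behavior of a dummy hook: it never suggests a block. */
static BlockNumber
dummy_suggest(unsigned int key)
{
    (void) key;
    return InvalidBlockNumber;
}

/* Toy "btree" hook: pretends that 100 consecutive keys share a heap block. */
static BlockNumber
toy_btree_suggest(unsigned int key)
{
    return (BlockNumber) (key / 100);
}

/*
 * Ask the clustered index, if there is one, where the new tuple should go.
 * InvalidBlockNumber means "no suggestion"; the caller would then pick a
 * page the usual way.
 */
static BlockNumber
sketch_suggest_block(const SketchRelInfo *rel, unsigned int key)
{
    if (rel->cluster_index < 0)
        return InvalidBlockNumber;      /* table is not clustered */

    assert(rel->cluster_index < rel->num_indices);
    return rel->suggest[rel->cluster_index] (key);
}
```

With hooks = { dummy_suggest, toy_btree_suggest }, a cluster_index of 1 and key 250 gives block 2, while a cluster_index of -1 or 0 gives InvalidBlockNumber.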

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
