Skip site navigation (1) Skip section navigation (2)

Re: pgstattuple extension for indexes

From: Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: pgstattuple extension for indexes
Date: 2006-09-03 23:13:33
Message-ID: 44FB619D.2010905@nttdata.co.jp (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-patches
Thanks Bruce,

Here are updated Japanese README, and uninstall_pgstattuple.sql.

Bruce Momjian wrote:
> Patch applied.  Thanks.
> 
> I updated the README documentation for the new functions, attached.  I
> could not update the Japanese version of the README.
> 
> ---------------------------------------------------------------------------
> 
> 
> Satoshi Nagayasu wrote:
>> Bruce,
>>
>> Attached patch has been cleaned up,
>> and modified to be able to work with CVS HEAD.
>>
>> Thanks.
>>
>> Satoshi Nagayasu wrote:
>>> Alvaro,
>>>
>>> Alvaro Herrera wrote:
>>>> Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
>>>>
>>>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
>>>> /pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)
>>>>
>>>>
>>>> While you're at it, please consider removing C++ style comments and
>>>> unused code.
>>>>
>>>> Formatting is way off as well, but I guess that is easily fixed with
>>>> pgindent.
>>> Thanks for comments. I'm going to fix my patch from now.
>>>
>>>> Regarding the pg_relpages function, why do you think it's necessary?
>>>> (It returns the true number of blocks of a given relation).  It may
>>>> belong into core given a reasonable use case, but otherwise it doesn't
>>>> seem to belong into pgstatindex (or pgstattuple for that matter).
>>> I wanted to sample some pages from the table/index, and get their statistics
>>> to know table/index conditions. I know pgstattuple() reports table
>>> statistics, however, pgstattuple() generates heavy CPU and I/O load.
>>>
>>> When we need to sample some pages from table/index, we need to know
>>> true number of blocks.
>>>
>>> I have another function, called pgstatpage(), to get information inside
>>> a single block/page statistics of the table. pg_relpages() will be used
>>> with this.
>>>
>>> Sorry for not mentioned in previous post about pgstatpage(),
>>> but I've remembered about it just now.
>>>
>>> Many memories in my brain have already `paged-out` (too busy in last few months),
>>> and some of them got `out-of-memory`. :^)
>>>
>>> Thanks.
>>
>> -- 
>> NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
>> Phone: +81-3-3523-8122
> 
>> diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
>> --- pgstattuple.orig/Makefile	2006-02-27 21:54:40.000000000 +0900
>> +++ pgstattuple/Makefile	2006-08-14 09:28:58.000000000 +0900
>> @@ -6,7 +6,7 @@
>>  #
>>  #-------------------------------------------------------------------------
>>  
>> -SRCS		= pgstattuple.c
>> +SRCS		= pgstattuple.c pgstatindex.c
>>  
>>  MODULE_big	= pgstattuple
>>  OBJS		= $(SRCS:.c=.o)
>> diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
>> --- pgstattuple.orig/pgstatindex.c	1970-01-01 09:00:00.000000000 +0900
>> +++ pgstattuple/pgstatindex.c	2006-08-14 11:24:23.000000000 +0900
>> @@ -0,0 +1,706 @@
>> +/*
>> + * pgstatindex
>> + *
>> + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus(at)nttdata(dot)co(dot)jp>
>> + *
>> + * Permission to use, copy, modify, and distribute this software and
>> + * its documentation for any purpose, without fee, and without a
>> + * written agreement is hereby granted, provided that the above
>> + * copyright notice and this paragraph and the following two
>> + * paragraphs appear in all copies.
>> + *
>> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
>> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
>> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
>> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
>> + * OF THE POSSIBILITY OF SUCH DAMAGE.
>> + *
>> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
>> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
>> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
>> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
>> + */
>> +
>> +#include "postgres.h"
>> +
>> +#include "fmgr.h"
>> +#include "funcapi.h"
>> +#include "access/heapam.h"
>> +#include "access/itup.h"
>> +#include "access/nbtree.h"
>> +#include "access/transam.h"
>> +#include "catalog/namespace.h"
>> +#include "catalog/pg_type.h"
>> +#include "utils/builtins.h"
>> +#include "utils/inval.h"
>> +
>> +PG_FUNCTION_INFO_V1(pgstatindex);
>> +PG_FUNCTION_INFO_V1(bt_metap);
>> +PG_FUNCTION_INFO_V1(bt_page_items);
>> +PG_FUNCTION_INFO_V1(bt_page_stats);
>> +PG_FUNCTION_INFO_V1(pg_relpages);
>> +
>> +extern Datum pgstatindex(PG_FUNCTION_ARGS);
>> +extern Datum bt_metap(PG_FUNCTION_ARGS);
>> +extern Datum bt_page_items(PG_FUNCTION_ARGS);
>> +extern Datum bt_page_stats(PG_FUNCTION_ARGS);
>> +extern Datum pg_relpages(PG_FUNCTION_ARGS);
>> +
>> +#define PGSTATINDEX_TYPE "public.pgstatindex_type"
>> +#define PGSTATINDEX_NCOLUMNS 10
>> +
>> +#define BTMETAP_TYPE "public.bt_metap_type"
>> +#define BTMETAP_NCOLUMNS 6
>> +
>> +#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
>> +#define BTPAGEITEMS_NCOLUMNS 6
>> +
>> +#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
>> +#define BTPAGESTATS_NCOLUMNS 11
>> +
>> +
>> +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
>> +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
>> +
>> +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
>> +		if ( !(FirstOffsetNumber<=(offset) && \
>> +						(offset)<=PageGetMaxOffsetNumber(page)) ) \
>> +			 elog(ERROR, "Page offset number out of range."); }
>> +
>> +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
>> +		if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
>> +			 elog(ERROR, "Block number out of range."); }
>> +
>> +/* ------------------------------------------------
>> + * structure for single btree page statistics
>> + * ------------------------------------------------
>> + */
>> +typedef struct BTPageStat
>> +{
>> +	uint32		blkno;
>> +	uint32		live_items;
>> +	uint32		dead_items;
>> +	uint32		page_size;
>> +	uint32		max_avail;
>> +	uint32		free_size;
>> +	uint32		avg_item_size;
>> +	uint32		fragments;
>> +	char		type;
>> +
>> +	/* opaque data */
>> +	BlockNumber btpo_prev;
>> +	BlockNumber btpo_next;
>> +	union
>> +	{
>> +		uint32		level;
>> +		TransactionId xact;
>> +	}			btpo;
>> +	uint16		btpo_flags;
>> +	BTCycleId	btpo_cycleid;
>> +}	BTPageStat;
>> +
>> +/* ------------------------------------------------
>> + * A structure for a whole btree index statistics
>> + * used by pgstatindex().
>> + * ------------------------------------------------
>> + */
>> +typedef struct BTIndexStat
>> +{
>> +	uint32		magic;
>> +	uint32		version;
>> +	BlockNumber root_blkno;
>> +	uint32		level;
>> +
>> +	BlockNumber fastroot;
>> +	uint32		fastlevel;
>> +
>> +	uint32		live_items;
>> +	uint32		dead_items;
>> +
>> +	uint32		root_pages;
>> +	uint32		internal_pages;
>> +	uint32		leaf_pages;
>> +	uint32		empty_pages;
>> +	uint32		deleted_pages;
>> +
>> +	uint32		page_size;
>> +	uint32		avg_item_size;
>> +
>> +	uint32		max_avail;
>> +	uint32		free_space;
>> +
>> +	uint32		fragments;
>> +}	BTIndexStat;
>> +
>> +/* -------------------------------------------------
>> + * GetBTPageStatistics()
>> + *
>> + * Collect statistics of single b-tree leaf page
>> + * -------------------------------------------------
>> + */
>> +static bool
>> +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat)
>> +{
>> +	Page		page = BufferGetPage(buffer);
>> +	PageHeader	phdr = (PageHeader) page;
>> +	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
>> +	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
>> +	int			item_size = 0;
>> +	int			off;
>> +
>> +	stat->blkno = blkno;
>> +
>> +	stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData);
>> +
>> +	stat->dead_items = stat->live_items = 0;
>> +
>> +	stat->page_size = PageGetPageSize(page);
>> +
>> +	/* page type (flags) */
>> +	if (P_ISDELETED(opaque))
>> +	{
>> +		stat->type = 'd';
>> +		return true;
>> +	}
>> +	else if (P_IGNORE(opaque))
>> +		stat->type = 'e';
>> +	else if (P_ISLEAF(opaque))
>> +		stat->type = 'l';
>> +	else if (P_ISROOT(opaque))
>> +		stat->type = 'r';
>> +	else
>> +		stat->type = 'i';
>> +
>> +	/* btpage opaque data */
>> +	stat->btpo_prev = opaque->btpo_prev;
>> +	stat->btpo_next = opaque->btpo_next;
>> +	if (P_ISDELETED(opaque))
>> +		stat->btpo.xact = opaque->btpo.xact;
>> +	else
>> +		stat->btpo.level = opaque->btpo.level;
>> +	stat->btpo_flags = opaque->btpo_flags;
>> +	stat->btpo_cycleid = opaque->btpo_cycleid;
>> +
>> +	/*----------------------------------------------
>> +	 * If a next leaf is on the previous block,
>> +	 * it means a fragmentation.
>> +	 *----------------------------------------------
>> +	 */
>> +	stat->fragments = 0;
>> +	if (stat->type == 'l')
>> +	{
>> +		if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
>> +			stat->fragments++;
>> +	}
>> +
>> +	/* count live and dead tuples, and free space */
>> +	for (off = FirstOffsetNumber; off <= maxoff; off++)
>> +	{
>> +		IndexTuple	itup;
>> +
>> +		ItemId		id = PageGetItemId(page, off);
>> +
>> +		itup = (IndexTuple) PageGetItem(page, id);
>> +
>> +		item_size += IndexTupleSize(itup);
>> +
>> +		if (!ItemIdDeleted(id))
>> +			stat->live_items++;
>> +		else
>> +			stat->dead_items++;
>> +	}
>> +	stat->free_size = PageGetFreeSpace(page);
>> +
>> +	if ((stat->live_items + stat->dead_items) > 0)
>> +		stat->avg_item_size = item_size / (stat->live_items + stat->dead_items);
>> +	else
>> +		stat->avg_item_size = 0;
>> +
>> +	return true;
>> +}
>> +
>> +
>> +/* ------------------------------------------------------
>> + * pgstatindex()
>> + *
>> + * Usage: SELECT * FROM pgstatindex('t1_pkey');
>> + * ------------------------------------------------------
>> + */
>> +Datum
>> +pgstatindex(PG_FUNCTION_ARGS)
>> +{
>> +	text	   *relname = PG_GETARG_TEXT_P(0);
>> +	Relation	rel;
>> +	RangeVar   *relrv;
>> +	Datum		result;
>> +	uint32		nblocks;
>> +	uint32		blkno;
>> +	BTIndexStat indexStat;
>> +
>> +	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> +	rel = relation_openrv(relrv, AccessShareLock);
>> +
>> +	if (!IS_INDEX(rel) || !IS_BTREE(rel))
>> +		elog(ERROR, "pgstatindex() can be used only on b-tree index.");
>> +
>> +	/*-------------------
>> +	 * Read a metapage
>> +	 *-------------------
>> +	 */
>> +	{
>> +		Buffer		buffer = ReadBuffer(rel, 0);
>> +		Page		page = BufferGetPage(buffer);
>> +		BTMetaPageData *metad = BTPageGetMeta(page);
>> +
>> +		indexStat.magic = metad->btm_magic;
>> +		indexStat.version = metad->btm_version;
>> +		indexStat.root_blkno = metad->btm_root;
>> +		indexStat.level = metad->btm_level;
>> +		indexStat.fastroot = metad->btm_fastroot;
>> +		indexStat.fastlevel = metad->btm_fastlevel;
>> +
>> +		ReleaseBuffer(buffer);
>> +	}
>> +
>> +	nblocks = RelationGetNumberOfBlocks(rel);
>> +
>> +	/* -- init stat -- */
>> +	indexStat.fragments = 0;
>> +
>> +	indexStat.root_pages = 0;
>> +	indexStat.leaf_pages = 0;
>> +	indexStat.internal_pages = 0;
>> +	indexStat.empty_pages = 0;
>> +	indexStat.deleted_pages = 0;
>> +
>> +	indexStat.max_avail = 0;
>> +	indexStat.free_space = 0;
>> +
>> +	/*-----------------------
>> +	 * Scan all blocks
>> +	 *-----------------------
>> +	 */
>> +	for (blkno = 1; blkno < nblocks; blkno++)
>> +	{
>> +		Buffer		buffer = ReadBuffer(rel, blkno);
>> +		BTPageStat	stat;
>> +
>> +		/* scan one page */
>> +		stat.blkno = blkno;
>> +		GetBTPageStatistics(blkno, buffer, &stat);
>> +
>> +		/*---------------------
>> +		 * page status (type)
>> +		 *---------------------
>> +		 */
>> +		switch (stat.type)
>> +		{
>> +			case 'd':
>> +				indexStat.deleted_pages++;
>> +				break;
>> +			case 'l':
>> +				indexStat.leaf_pages++;
>> +				break;
>> +			case 'i':
>> +				indexStat.internal_pages++;
>> +				break;
>> +			case 'e':
>> +				indexStat.empty_pages++;
>> +				break;
>> +			case 'r':
>> +				indexStat.root_pages++;
>> +				break;
>> +			default:
>> +				elog(ERROR, "unknown page status.");
>> +		}
>> +
>> +		/* -- leaf fragmentation -- */
>> +		indexStat.fragments += stat.fragments;
>> +
>> +		if (stat.type == 'l')
>> +		{
>> +			indexStat.max_avail += stat.max_avail;
>> +			indexStat.free_space += stat.free_size;
>> +		}
>> +
>> +		ReleaseBuffer(buffer);
>> +	}
>> +
>> +	relation_close(rel, AccessShareLock);
>> +
>> +	/*----------------------------
>> +	 * Build a result tuple
>> +	 *----------------------------
>> +	 */
>> +	{
>> +		TupleDesc	tupleDesc;
>> +		int			j;
>> +		char	   *values[PGSTATINDEX_NCOLUMNS];
>> +
>> +		HeapTupleData tupleData;
>> +		HeapTuple	tuple = &tupleData;
>> +
>> +		tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
>> +
>> +		j = 0;
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", indexStat.version);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", indexStat.level);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", (indexStat.root_pages +
>> +										 indexStat.leaf_pages +
>> +										 indexStat.internal_pages +
>> +										 indexStat.deleted_pages +
>> +										 indexStat.empty_pages) * BLCKSZ);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", indexStat.root_blkno);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", indexStat.internal_pages);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", indexStat.empty_pages);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail * 100.0);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0);
>> +
>> +		tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
>> +									   values);
>> +
>> +		result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
>> +	}
>> +
>> +	PG_RETURN_DATUM(result);
>> +}
>> +
>> +/* -----------------------------------------------
>> + * bt_page()
>> + *
>> + * Usage: SELECT * FROM bt_page('t1_pkey', 0);
>> + * -----------------------------------------------
>> + */
>> +Datum
>> +bt_page_stats(PG_FUNCTION_ARGS)
>> +{
>> +	text	   *relname = PG_GETARG_TEXT_P(0);
>> +	uint32		blkno = PG_GETARG_UINT32(1);
>> +	Buffer		buffer;
>> +
>> +	Relation	rel;
>> +	RangeVar   *relrv;
>> +	Datum		result;
>> +
>> +	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> +	rel = relation_openrv(relrv, AccessShareLock);
>> +
>> +	CHECK_RELATION_BLOCK_RANGE(rel, blkno);
>> +
>> +	buffer = ReadBuffer(rel, blkno);
>> +
>> +	if (!IS_INDEX(rel) || !IS_BTREE(rel))
>> +		elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
>> +
>> +	if (blkno == 0)
>> +		elog(ERROR, "Block 0 is a meta page.");
>> +
>> +	{
>> +		HeapTuple	tuple;
>> +		TupleDesc	tupleDesc;
>> +		int			j;
>> +		char	   *values[BTPAGESTATS_NCOLUMNS];
>> +
>> +		BTPageStat	stat;
>> +
>> +		GetBTPageStatistics(blkno, buffer, &stat);
>> +
>> +		tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
>> +
>> +		j = 0;
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", stat.blkno);
>> +
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%c", stat.type);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", stat.live_items);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", stat.dead_items);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", stat.avg_item_size);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", stat.page_size);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", stat.free_size);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", stat.btpo_prev);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", stat.btpo_next);
>> +
>> +		values[j] = palloc(32);
>> +		if (stat.type == 'd')
>> +			snprintf(values[j++], 32, "%d", stat.btpo.xact);
>> +		else
>> +			snprintf(values[j++], 32, "%d", stat.btpo.level);
>> +
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", stat.btpo_flags);
>> +
>> +		tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
>> +									   values);
>> +
>> +		result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
>> +	}
>> +
>> +	ReleaseBuffer(buffer);
>> +
>> +	relation_close(rel, AccessShareLock);
>> +
>> +	PG_RETURN_DATUM(result);
>> +}
>> +
>> +/*-------------------------------------------------------
>> + * bt_page_items()
>> + *
>> + * Get IndexTupleData set in a leaf page
>> + *
>> + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
>> + *-------------------------------------------------------
>> + */
>> +/* ---------------------------------------------------
>> + * data structure for SRF to hold a scan information
>> + * ---------------------------------------------------
>> + */
>> +struct user_args
>> +{
>> +	TupleDesc	tupd;
>> +	Relation	rel;
>> +	Buffer		buffer;
>> +	Page		page;
>> +	uint16		offset;
>> +};
>> +
>> +Datum
>> +bt_page_items(PG_FUNCTION_ARGS)
>> +{
>> +	text	   *relname = PG_GETARG_TEXT_P(0);
>> +	uint32		blkno = PG_GETARG_UINT32(1);
>> +
>> +	RangeVar   *relrv;
>> +	Datum		result;
>> +	char	   *values[BTPAGEITEMS_NCOLUMNS];
>> +	BTPageOpaque opaque;
>> +	HeapTuple	tuple;
>> +	ItemId		id;
>> +
>> +	FuncCallContext *fctx;
>> +	MemoryContext mctx;
>> +	struct user_args *uargs = NULL;
>> +
>> +	if (blkno == 0)
>> +		elog(ERROR, "Block 0 is a meta page.");
>> +
>> +	if (SRF_IS_FIRSTCALL())
>> +	{
>> +		fctx = SRF_FIRSTCALL_INIT();
>> +		mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
>> +
>> +		uargs = palloc(sizeof(struct user_args));
>> +
>> +		uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
>> +		uargs->offset = FirstOffsetNumber;
>> +
>> +		relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> +		uargs->rel = relation_openrv(relrv, AccessShareLock);
>> +
>> +		CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
>> +
>> +		uargs->buffer = ReadBuffer(uargs->rel, blkno);
>> +
>> +		if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel))
>> +			elog(ERROR, "bt_page_items() can be used only on b-tree index.");
>> +
>> +		uargs->page = BufferGetPage(uargs->buffer);
>> +
>> +		opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
>> +
>> +		if (P_ISDELETED(opaque))
>> +			elog(NOTICE, "bt_page_items(): this page is deleted.");
>> +
>> +		fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
>> +		fctx->user_fctx = uargs;
>> +
>> +		MemoryContextSwitchTo(mctx);
>> +	}
>> +
>> +	fctx = SRF_PERCALL_SETUP();
>> +	uargs = fctx->user_fctx;
>> +
>> +	if (fctx->call_cntr < fctx->max_calls)
>> +	{
>> +		IndexTuple	itup;
>> +
>> +		id = PageGetItemId(uargs->page, uargs->offset);
>> +
>> +		if (!ItemIdIsValid(id))
>> +			elog(ERROR, "Invalid ItemId.");
>> +
>> +		itup = (IndexTuple) PageGetItem(uargs->page, id);
>> +
>> +		{
>> +			int			j = 0;
>> +
>> +			BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
>> +
>> +			values[j] = palloc(32);
>> +			snprintf(values[j++], 32, "%d", uargs->offset);
>> +			values[j] = palloc(32);
>> +			snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
>> +			values[j] = palloc(32);
>> +			snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
>> +			values[j] = palloc(32);
>> +			snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
>> +			values[j] = palloc(32);
>> +			snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
>> +
>> +			{
>> +				int			off;
>> +				char	   *dump;
>> +				char	   *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
>> +
>> +				dump = palloc(IndexTupleSize(itup) * 3);
>> +				memset(dump, 0, IndexTupleSize(itup) * 3);
>> +
>> +				for (off = 0;
>> +					 off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
>> +					 off++)
>> +				{
>> +					if (dump[0] == '\0')
>> +						sprintf(dump, "%02x", *(ptr + off) & 0xff);
>> +					else
>> +					{
>> +						char		buf[4];
>> +
>> +						sprintf(buf, " %02x", *(ptr + off) & 0xff);
>> +						strcat(dump, buf);
>> +					}
>> +				}
>> +				values[j] = dump;
>> +			}
>> +
>> +			tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
>> +			result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
>> +		}
>> +
>> +		uargs->offset = uargs->offset + 1;
>> +
>> +		SRF_RETURN_NEXT(fctx, result);
>> +	}
>> +	else
>> +	{
>> +		ReleaseBuffer(uargs->buffer);
>> +		relation_close(uargs->rel, AccessShareLock);
>> +
>> +		SRF_RETURN_DONE(fctx);
>> +	}
>> +}
>> +
>> +
>> +/* ------------------------------------------------
>> + * bt_metap()
>> + *
>> + * Get a btree meta-page information
>> + *
>> + * Usage: SELECT * FROM bt_metap('t1_pkey')
>> + * ------------------------------------------------
>> + */
>> +Datum
>> +bt_metap(PG_FUNCTION_ARGS)
>> +{
>> +	text	   *relname = PG_GETARG_TEXT_P(0);
>> +	Buffer		buffer;
>> +
>> +	Relation	rel;
>> +	RangeVar   *relrv;
>> +	Datum		result;
>> +
>> +	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> +	rel = relation_openrv(relrv, AccessShareLock);
>> +
>> +	if (!IS_INDEX(rel) || !IS_BTREE(rel))
>> +		elog(ERROR, "bt_metap() can be used only on b-tree index.");
>> +
>> +	buffer = ReadBuffer(rel, 0);
>> +
>> +	{
>> +		BTMetaPageData *metad;
>> +
>> +		TupleDesc	tupleDesc;
>> +		int			j;
>> +		char	   *values[BTMETAP_NCOLUMNS];
>> +		HeapTuple	tuple;
>> +
>> +		Page		page = BufferGetPage(buffer);
>> +
>> +		metad = BTPageGetMeta(page);
>> +
>> +		tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
>> +
>> +		j = 0;
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", metad->btm_magic);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", metad->btm_version);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", metad->btm_root);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", metad->btm_level);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", metad->btm_fastroot);
>> +		values[j] = palloc(32);
>> +		snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
>> +
>> +		tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
>> +									   values);
>> +
>> +		result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
>> +	}
>> +
>> +	ReleaseBuffer(buffer);
>> +
>> +	relation_close(rel, AccessShareLock);
>> +
>> +	PG_RETURN_DATUM(result);
>> +}
>> +
>> +/* --------------------------------------------------------
>> + * pg_relpages()
>> + *
>> + * Get a number of pages of the table/index.
>> + *
>> + * Usage: SELECT pg_relpages('t1');
>> + *		  SELECT pg_relpages('t1_pkey');
>> + * --------------------------------------------------------
>> + */
>> +Datum
>> +pg_relpages(PG_FUNCTION_ARGS)
>> +{
>> +	text	   *relname = PG_GETARG_TEXT_P(0);
>> +
>> +	Relation	rel;
>> +	RangeVar   *relrv;
>> +	int4		relpages;
>> +
>> +	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
>> +	rel = relation_openrv(relrv, AccessShareLock);
>> +
>> +	relpages = RelationGetNumberOfBlocks(rel);
>> +
>> +	relation_close(rel, AccessShareLock);
>> +
>> +	PG_RETURN_INT32(relpages);
>> +}
>> diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
>> --- pgstattuple.orig/pgstattuple.sql.in	2006-02-28 01:09:50.000000000 +0900
>> +++ pgstattuple/pgstattuple.sql.in	2006-08-14 10:37:32.000000000 +0900
>> @@ -22,3 +22,96 @@
>>  RETURNS pgstattuple_type
>>  AS 'MODULE_PATHNAME', 'pgstattuplebyid'
>>  LANGUAGE C STRICT;
>> +
>> +--
>> +-- pgstatindex
>> +--
>> +DROP TYPE pgstatindex_type CASCADE;
>> +CREATE TYPE pgstatindex_type AS (
>> +  version int4,
>> +  tree_level int4,
>> +  index_size int4,
>> +  root_block_no int4,
>> +  internal_pages int4,
>> +  leaf_pages int4,
>> +  empty_pages int4,
>> +  deleted_pages int4,
>> +  avg_leaf_density float8,
>> +  leaf_fragmentation float8
>> +);
>> +
>> +CREATE OR REPLACE FUNCTION pgstatindex(text)
>> +RETURNS pgstatindex_type
>> +AS 'MODULE_PATHNAME', 'pgstatindex'
>> +LANGUAGE 'C' STRICT;
>> +
>> +--
>> +-- bt_metap()
>> +--
>> +DROP TYPE bt_metap_type CASCADE;
>> +CREATE TYPE bt_metap_type AS (
>> +  magic int4,
>> +  version int4,
>> +  root int4,
>> +  level int4,
>> +  fastroot int4,
>> +  fastlevel int4
>> +);
>> +
>> +CREATE OR REPLACE FUNCTION bt_metap(text)
>> +RETURNS bt_metap_type
>> +AS 'MODULE_PATHNAME', 'bt_metap'
>> +LANGUAGE 'C' STRICT;
>> +
>> +--
>> +-- bt_page_items()
>> +--
>> +DROP TYPE bt_page_items_type CASCADE;
>> +CREATE TYPE bt_page_items_type AS (
>> +  itemoffset int4,
>> +  ctid tid,
>> +  itemlen int4,
>> +  nulls bool,
>> +  vars bool,
>> +  data text
>> +);
>> +
>> +DROP FUNCTION bt_page_items(text, int4);
>> +
>> +CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
>> +RETURNS SETOF bt_page_items_type
>> +AS 'MODULE_PATHNAME', 'bt_page_items'
>> +LANGUAGE 'C' STRICT;
>> +
>> +--
>> +-- bt_page_stats()
>> +--
>> +DROP TYPE bt_page_stats_type CASCADE;
>> +CREATE TYPE bt_page_stats_type AS (
>> +  blkno int4,
>> +  type char,
>> +  live_items int4,
>> +  dead_items int4,
>> +  avg_item_size float,
>> +  page_size int4,
>> +  free_size int4,
>> +  btpo_prev int4,
>> +  btpo_next int4,
>> +  btpo int4,
>> +  btpo_flags int4
>> +);
>> +
>> +DROP FUNCTION bt_page_stats(text, int4);
>> +
>> +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
>> +RETURNS bt_page_stats_type
>> +AS 'MODULE_PATHNAME', 'bt_page_stats'
>> +LANGUAGE 'C' STRICT;
>> +
>> +--
>> +-- pg_relpages()
>> +--
>> +CREATE OR REPLACE FUNCTION pg_relpages(text)
>> +RETURNS int
>> +AS 'MODULE_PATHNAME', 'pg_relpages'
>> +LANGUAGE 'C' STRICT;
>> diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
>> --- pgstattuple.orig/test.sh	1970-01-01 09:00:00.000000000 +0900
>> +++ pgstattuple/test.sh	2006-08-14 10:23:08.000000000 +0900
>> @@ -0,0 +1,27 @@
>> +#!/bin/sh
>> +
>> +PGHOME=/home/snaga/pgsql20060814
>> +export PATH=${PGHOME}/bin:$PATH
>> +
>> +psql -p 9999 pgbench<<EOF
>> +DROP FUNCTION pgstatindex(text);
>> +EOF
>> +
>> +psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench
>> +
>> +psql -p 9999 pgbench<<EOF
>> +SELECT * FROM pg_relpages('accounts_pkey');
>> +\x
>> +SELECT * FROM pgstatindex('accounts_pkey');
>> +SELECT * FROM bt_metap('accounts_pkey');
>> +\x
>> +SELECT * FROM bt_page_items('accounts_pkey', 0);
>> +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
>> +
>> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
>> +\x
>> +SELECT * FROM bt_page_stats('accounts_pkey', 1);
>> +SELECT * FROM bt_page_stats('accounts_pkey', 361);
>> +\x
>> +SELECT * FROM bt_page_items('accounts_pkey', 361);
>> +EOF
>>
>>
>> ------------------------------------------------------------------------
>>
>> pgstattuple README			2002/08/29 Tatsuo Ishii
>>
>> 1. Functions supported:
>>
>>     pgstattuple
>>     -----------
>>     pgstattuple() returns the relation length, percentage of the "dead"
>>     tuples of a relation and other info. This may help users to determine
>>     whether vacuum is necessary or not. Here is an example session:
>>
>>         test=> \x
>>         Expanded display is on.
>>         test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
>>         -[ RECORD 1 ]------+-------
>>         table_len          | 458752
>>         tuple_count        | 1470
>>         tuple_len          | 438896
>>         tuple_percent      | 95.67
>>         dead_tuple_count   | 11
>>         dead_tuple_len     | 3157
>>         dead_tuple_percent | 0.69
>>         free_space         | 8932
>>         free_percent       | 1.95
>>
>>     Here are explanations for each column:
>>
>>         table_len		-- physical relation length in bytes
>>         tuple_count		-- number of live tuples
>>         tuple_len		-- total tuples length in bytes
>>         tuple_percent	-- live tuples in %
>>         dead_tuple_len	-- total dead tuples length in bytes
>>         dead_tuple_percent	-- dead tuples in %
>>         free_space		-- free space in bytes
>>         free_percent	-- free space in %
>>
>>     pg_relpages
>>     -----------
>>     pg_relpages() returns the number of pages in the relation.
>>
>>     pgstatindex
>>     -----------
>>     pgstatindex() returns an array showing the information about an index:
>>
>>         test=> \x
>>         Expanded display is on.
>>         test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
>>         -[ RECORD 1 ]------+------
>>         version            | 2
>>         tree_level         | 0
>>         index_size         | 8192
>>         root_block_no      | 1
>>         internal_pages     | 0
>>         leaf_pages         | 1
>>         empty_pages        | 0
>>         deleted_pages      | 0
>>         avg_leaf_density   | 50.27
>>         leaf_fragmentation | 0
>>
>>     bt_metap
>>     --------
>>     bt_metap() returns information about the btree index metapage:
>>
>>         test=> SELECT * FROM bt_metap('pg_cast_oid_index');
>>         -[ RECORD 1 ]-----
>>         magic     | 340322
>>         version   | 2
>>         root      | 1
>>         level     | 0
>>         fastroot  | 1
>>         fastlevel | 0
>>
>>     bt_page_stats
>>     -------------
>>     bt_page_stats() shows information about single btree pages:
>>
>>         test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
>>         -[ RECORD 1 ]-+-----
>>         blkno         | 1
>>         type          | l
>>         live_items    | 256
>>         dead_items    | 0
>>         avg_item_size | 12
>>         page_size     | 8192
>>         free_size     | 4056
>>         btpo_prev     | 0
>>         btpo_next     | 0
>>         btpo          | 0
>>         btpo_flags    | 3
>>
>>     bt_page_items
>>     -------------
>>     bt_page_items() returns information about specific items on btree pages:
>>
>>         test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
>>          itemoffset |  ctid   | itemlen | nulls | vars |    data
>>         ------------+---------+---------+-------+------+-------------
>>                   1 | (0,1)   |      12 | f     | f    | 23 27 00 00
>>                   2 | (0,2)   |      12 | f     | f    | 24 27 00 00
>>                   3 | (0,3)   |      12 | f     | f    | 25 27 00 00
>>                   4 | (0,4)   |      12 | f     | f    | 26 27 00 00
>>                   5 | (0,5)   |      12 | f     | f    | 27 27 00 00
>>                   6 | (0,6)   |      12 | f     | f    | 28 27 00 00
>>                   7 | (0,7)   |      12 | f     | f    | 29 27 00 00
>>                   8 | (0,8)   |      12 | f     | f    | 2a 27 00 00
>>
>>
>> 2. Installing pgstattuple
>>
>>     $ make
>>     $ make install
>>     $ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test
>>
>>
>> 3. Using pgstattuple
>>
>>     pgstattuple may be called as a relation function and is
>>     defined as follows:
>>
>>     CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
>>      AS 'MODULE_PATHNAME', 'pgstattuple'
>>      LANGUAGE C STRICT;
>>
>>     CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
>>      AS 'MODULE_PATHNAME', 'pgstattuplebyid'
>>      LANGUAGE C STRICT;
>>
>>     The argument is the relation name (optionally it may be qualified)
>>     or the OID of the relation.  Note that pgstattuple only returns
>>     one row.
>>
>>
>> 4. Notes
>>
>>     pgstattuple acquires only a read lock on the relation. So concurrent
>>     update may affect the result.
>>
>>     pgstattuple judges a tuple is "dead" if HeapTupleSatisfiesNow()
>>     returns false.
>>
>>
>> 5. History
>>
>>     2006/06/28
>>
>> 	Extended to work against indexes.


-- 
NAGAYASU Satoshi <nagayasus(at)nttdata(dot)co(dot)jp>
Phone: +81-3-3523-8122

Attachment: uninstall_pgstattuple.sql
Description: text/plain (479 bytes)
Attachment: README.pgstattuple.euc_jp
Description: text/plain (5.5 KB)

In response to

Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2006-09-03 23:18:17
Subject: Re: @ versus ~, redux
Previous:From: Tom LaneDate: 2006-09-03 23:00:29
Subject: Re: pgstattuple extension for indexes

pgsql-patches by date

Next:From: Satoshi NagayasuDate: 2006-09-03 23:25:31
Subject: Re: pgstattuple extension for indexes
Previous:From: Tom LaneDate: 2006-09-03 23:00:29
Subject: Re: pgstattuple extension for indexes

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group