Re: a fast bloat measurement tool (was Re: Measuring relation free space)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: a fast bloat measurement tool (was Re: Measuring relation free space)
Date: 2015-05-09 00:20:51
Message-ID: 20150509002051.GD12950@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2015-04-24 08:46:48 +0530, Abhijit Menon-Sen wrote:
> diff --git a/contrib/pgstattuple/pgstatbloat.c b/contrib/pgstattuple/pgstatbloat.c
> new file mode 100644
> index 0000000..612e22b
> --- /dev/null
> +++ b/contrib/pgstattuple/pgstatbloat.c
> @@ -0,0 +1,389 @@
> +/*
> + * contrib/pgstattuple/pgstatbloat.c
> + *
> + * Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
> + * Portions Copyright (c) 2001,2002 Tatsuo Ishii (from pgstattuple)

I think for new extension we don't really add authors and such anymore.

> + * Permission to use, copy, modify, and distribute this software and
> + * its documentation for any purpose, without fee, and without a
> + * written agreement is hereby granted, provided that the above
> + * copyright notice and this paragraph and the following two
> + * paragraphs appear in all copies.
> + *
> + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
> + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
> + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
> + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
> + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
> + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
> + */

Shouldn't be here in a contrib module.

> +PG_FUNCTION_INFO_V1(pgstatbloat);

> +CREATE FUNCTION pgstatbloat(IN reloid regclass,
> + OUT table_len BIGINT, -- physical table length in bytes
> + OUT scanned_percent FLOAT8, -- what percentage of the table's pages was scanned
> + OUT approx_tuple_count BIGINT, -- estimated number of live tuples
> + OUT approx_tuple_len BIGINT, -- estimated total length in bytes of live tuples
> + OUT approx_tuple_percent FLOAT8, -- live tuples in % (based on estimate)
> + OUT dead_tuple_count BIGINT, -- exact number of dead tuples
> + OUT dead_tuple_len BIGINT, -- exact total length in bytes of dead tuples
> + OUT dead_tuple_percent FLOAT8, -- dead tuples in % (based on estimate)
> + OUT free_space BIGINT, -- exact free space in bytes
> + OUT free_percent FLOAT8) -- free space in %

Hm. I do wonder if this should really be called 'statbloat'. Perhaps
it'd more appropriately be called pg_estimate_bloat or somesuch?

Also, is free_space really exact? The fsm isn't byte exact.

> +static Datum
> +build_output_type(pgstatbloat_output_type *stat, FunctionCallInfo fcinfo)
> +{
> +#define NCOLUMNS 10
> +#define NCHARS 32
> +
> + HeapTuple tuple;
> + char *values[NCOLUMNS];
> + char values_buf[NCOLUMNS][NCHARS];
> + int i;
> + double tuple_percent;
> + double dead_tuple_percent;
> + double free_percent; /* free/reusable space in % */
> + double scanned_percent;
> + TupleDesc tupdesc;
> + AttInMetadata *attinmeta;
> +
> + /* Build a tuple descriptor for our result type */
> + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
> + elog(ERROR, "return type must be a row type");
> +
> + /*
> + * Generate attribute metadata needed later to produce tuples from raw C
> + * strings
> + */
> + attinmeta = TupleDescGetAttInMetadata(tupdesc);
> +
> + if (stat->table_len == 0)
> + {
> + tuple_percent = 0.0;
> + dead_tuple_percent = 0.0;
> + free_percent = 0.0;
> + }
> + else
> + {
> + tuple_percent = 100.0 * stat->tuple_len / stat->table_len;
> + dead_tuple_percent = 100.0 * stat->dead_tuple_len / stat->table_len;
> + free_percent = 100.0 * stat->free_space / stat->table_len;
> + }
> +
> + scanned_percent = 0.0;
> + if (stat->total_pages != 0)
> + {
> + scanned_percent = 100 * stat->scanned_pages / stat->total_pages;
> + }
> +
> + for (i = 0; i < NCOLUMNS; i++)
> + values[i] = values_buf[i];
> + i = 0;
> + snprintf(values[i++], NCHARS, INT64_FORMAT, stat->table_len);
> + snprintf(values[i++], NCHARS, "%.2f", scanned_percent);
> + snprintf(values[i++], NCHARS, INT64_FORMAT, stat->tuple_count);
> + snprintf(values[i++], NCHARS, INT64_FORMAT, stat->tuple_len);
> + snprintf(values[i++], NCHARS, "%.2f", tuple_percent);
> + snprintf(values[i++], NCHARS, INT64_FORMAT, stat->dead_tuple_count);
> + snprintf(values[i++], NCHARS, INT64_FORMAT, stat->dead_tuple_len);
> + snprintf(values[i++], NCHARS, "%.2f", dead_tuple_percent);
> + snprintf(values[i++], NCHARS, INT64_FORMAT, stat->free_space);
> + snprintf(values[i++], NCHARS, "%.2f", free_percent);
> +
> + tuple = BuildTupleFromCStrings(attinmeta, values);
> +
> + return HeapTupleGetDatum(tuple);
> +}

Why go through C strings? I personally think we really shouldn't add
more callers to BuildTupleFromCStrings, it's such an absurd interface.

> + switch (rel->rd_rel->relkind)
> + {
> + case RELKIND_RELATION:
> + case RELKIND_MATVIEW:
> + PG_RETURN_DATUM(pgstatbloat_heap(rel, fcinfo));
> + case RELKIND_TOASTVALUE:
> + err = "toast value";
> + break;
> + case RELKIND_SEQUENCE:
> + err = "sequence";
> + break;
> + case RELKIND_INDEX:
> + err = "index";
> + break;
> + case RELKIND_VIEW:
> + err = "view";
> + break;
> + case RELKIND_COMPOSITE_TYPE:
> + err = "composite type";
> + break;
> + case RELKIND_FOREIGN_TABLE:
> + err = "foreign table";
> + break;
> + default:
> + err = "unknown";
> + break;
> + }
>

This breaks translateability. I'm not sure that's worthy of concern. I
think it'd actually be fine to just say that the relation has to be a
table or matview.

> +static Datum
> +pgstatbloat_heap(Relation rel, FunctionCallInfo fcinfo)
> +{
> + /*
> + * We use the visibility map to skip over SKIP_PAGES_THRESHOLD or
> + * more contiguous all-visible pages. See the explanation in
> + * lazy_scan_heap for the rationale.
> + */

I don't believe that rationale is really true these days. I'd much
rather just rip this out here than copy the rather complex logic.

> + for (blkno = 0; blkno < nblocks; blkno++)
> + {

> + stat.free_space += PageGetHeapFreeSpace(page);
> +
> + if (PageIsNew(page) || PageIsEmpty(page))
> + {
> + UnlockReleaseBuffer(buf);
> + continue;
> + }

I haven't checked, but I'm not sure that it's safe/meaningful to call
PageGetHeapFreeSpace() on a new page.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2015-05-09 01:24:41 Re: Async execution of postgres_fdw.
Previous Message Andres Freund 2015-05-08 23:53:06 Re: initdb -S and tablespaces