Measuring relation free space

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Measuring relation free space
Date: 2011-11-06 03:08:11
Message-ID: 4EB5FA1B.1090305@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached patch adds a new function to the pageinspect extension for
measuring total free space, in either tables or indexes. It returns the
free space as a percentage, so higher numbers mean more bloat. After
trying a couple of ways to quantify it, I've found this particular
measure correlates well with the nastiest bloat issues I've ran into in
production recently. For example, an index that had swelled to over 5X
its original size due to autovacuum issues registered at 0.86 on this
scale. I could easily see people putting an alert at something like
0.40 and picking candidates to reindex based on it triggering. That
would be about a million times smarter than how I've been muddling
through this class of problems so far.

Code by Jaime Casanova, based on a prototype by me. Thanks to attendees
and sponsors of the PgWest conference for helping to fund some deeper
exploration of this idea.

Here's a test case showing it in action:

create extension pageinspect;
create table t (k serial,v integer);
insert into t(v) (select generate_series(1,100000));
create index t_idx on t(k);
delete from t where k<50000;
vacuum t;

gsmith=# select relation_free_space('t');
relation_free_space
---------------------
0.445466

gsmith=# select relation_free_space('t_idx');
relation_free_space
---------------------
0.550946

Some open questions in my mind:

-Given this is doing a full table scan, should it hook into a ring
buffer to keep from trashing the buffer cache? Or might it loop over
the relation in a different way all together? I was thinking about
eyeing the FSM instead at one point, didn't explore that yet. There's
certainly a few ways to approach this, we just aimed at the easiest way
to get a working starter implementation, and associated results to
compare others against.

-Should there be a non-superuser version of this? We'd certainly need
to get a less cache demolishing version before that would seem wise.

-There were related things in the pageinspect module, but a case could
be made for this being a core function instead. It's a bit more likely
to be used in production than the rest of that extension.

-What if anything related to TOAST should this handle?

We're also planning to do a sampling version of this, using the same
approach ANALYZE does. Grab a number of blocks, extrapolate from
there. It shouldn't take many samples before the accuracy is better
than how people are estimated this now. That work is just waiting on
some better thinking about how to handle the full relation version first.

And, yes, the explanation in the docs and code should be clear that it's
returning a percentage, which I just realized when writing this. At
least I remembered to document something; still ahead of the average new
patch...

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

Attachment Content-Type Size
relation_free_space-v2.patch text/x-patch 7.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2011-11-06 03:24:37 const correctness
Previous Message Peter Geoghegan 2011-11-06 02:16:20 Re: [PATCH] optional cleaning queries stored in pg_stat_statements