Re: a fast bloat measurement tool (was Re: Measuring relation free space)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: a fast bloat measurement tool (was Re: Measuring relation free space)
Date: 2015-02-23 01:41:03
Message-ID: 54EA852F.8070000@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On 28.1.2015 05:03, Abhijit Menon-Sen wrote:
> At 2015-01-27 17:00:27 -0600, Jim(dot)Nasby(at)BlueTreble(dot)com wrote:
>>
>> It would be best to get this into a commit fest so it's not lost.
>
> It's there already.
>
> -- Abhijit
>
>

I looked at this patch today, so a few comments from me:

1) I believe the patch is slightly broken, because the version was
bumped to 1.3 but only partially, so I get this on "make install"

$ make -j9 -s install
/usr/bin/install: cannot stat ‘./pgstattuple--1.3.sql’: No such
file or directory
../../src/makefiles/pgxs.mk:129: recipe for target 'install' failed
make[1]: *** [install] Error 1
Makefile:85: recipe for target 'install-pgstattuple-recurse' failed
make: *** [install-pgstattuple-recurse] Error 2
make: *** Waiting for unfinished jobs....

The problem seems to be that Makefile already lists --1.3.sql in the
DATA section, but the file was not renamed (there's just --1.2.sql).

After fixing that, it's also necessary to fix the version in the
control file, otherwise the CREATE EXTENSION fails.

default_version = '1.2' -> '1.3'

At least that fixed the install for me.

2) Returning Datum from fbstat_heap, which is a static function seems a
bit strange to me, I'd use the HeapTuple directly, but meh ...

3) The way the tuple is built by first turning the values into strings
and then turned back into Datums by calling BuildTupleFromCStrings
is more serious I guess. Why not to just keep the Datums and call
heap_form_tuple directly?

I see the other functions in pgstattuple.c do use the string way, so
maybe there's a reason for that? Or is this just to keep the code
consistent in the extension?

4) I really doubt 'fastbloat' is a good name. I propose 'pgstatbloat'
to make that consistent with the rest of pgstattuple functions. Or
something similar, but 'fastbloat' is just plain confusing.

Otherwise, the code looks OK to me. Now, there are a few features I'd
like to have for production use (to minimize the impact):

1) no index support :-(

I'd like to see support for more relation types (at least btree
indexes). Are there any plans for that? Do we have an idea on how to
compute that?

2) sampling just a portion of the table

For example, being able to sample just 5% of blocks, making it less
obtrusive, especially on huge tables. Interestingly, there's a
TABLESAMPLE patch in this CF, so maybe it's possible to reuse some
of the methods (e.g. functions behind SYSTEM sampling)?

3) throttling

Another feature minimizing impact of running this on production might
be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s'
or something along those lines.

4) prefetch

fbstat_heap is using visibility map to skip fully-visible pages,
which is nice, but if we skip too many pages it breaks readahead
similarly to bitmap heap scan. I believe this is another place where
effective_io_concurrency (i.e. prefetch) would be appropriate.

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2015-02-23 02:20:17 Re: a fast bloat measurement tool (was Re: Measuring relation free space)
Previous Message Robert Haas 2015-02-23 01:20:44 Re: restrict global access to be readonly