Re: a fast bloat measurement tool (was Re: Measuring relation free space)

From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a fast bloat measurement tool (was Re: Measuring relation free space)
Date: 2014-09-24 08:56:37
Message-ID: 20140924085637.GB28254@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit.

Thanks for your comments, and I'm sorry it's taken me so long to
respond.

At 2014-08-03 11:18:57 +0530, amit(dot)kapila16(at)gmail(dot)com wrote:
>
> After looking at code, I also felt that it is better to add this as a
> version of pg_stattuple.

I started off trying to do that, but now I'm afraid I strongly disagree.
First, pg_stattuple works on relations and indexes, whereas fastbloat
only works on relations (because of the VM/FSM use). Second, the code
began to look hideous when mushed together, with too many ifs to avoid
locking I didn't need and so on. The logic is just too different.

> Is this assumption based on the reason that if the visibility map bit
> of page is set, then there is high chance that vacuum would have
> pruned the dead tuples and updated FSM with freespace?

Right. I'm not convinced an explanation of the VM/FSM belongs in the
fastbloat documentation, but I'll see if I can make it clearer.

> 1. compilation errors […]
> I think this is not a proper multi-line comment. […]
> It is better to have CHECK_FOR_INTERRUPTS() in above loop. […]
> Incase of new page don't you need to account for freespace. […]
> 5. Don't we need the handling for empty page (PageIsEmpty()) case?

Thanks, have fixed, will push updates soon.

> What is the reason for not counting ItemIdIsDead() case for
> accounting of dead tuples.

Will reconsider and fix.

> 7.
> HeapTupleSatisfiesVacuumNoHint()
> a. Why can't we use HeapTupleSatisfiesVisibility() with dirty snapshot
> as we use for pgstattuple?

Heavier locking. I tried to make do with the existing HTS* functions,
but fastbloat was noticeably faster in tests with the current setup.

> b. If we want to use HeapTupleSatisfiesVacuum(), why can't we add one
> parameter to function which can avoid setting of hintbit.

This is certainly a possibility, and as you say it would be better in
terms of maintenance. I didn't do it that way because I wanted to keep
the extension self-contained rather than make it depend on core changes.
If there's consensus on adding a nohint parameter, sure, I can do that.
(But the fastbloat won't work on current versions, which would suck.)

In the meantime, I'll merge the later updates from HTSVacuum into my
code. Thanks for the heads-up.

> function header of vac_estimate_reltuples() suggests that it is
> used by VACUUM and ANALYZE, I think it will be better to update
> the comment w.r.t new usage as well.

OK.

> 10. I think it should generate resource file as is done for other
> modules if you want to keep it as a separate module.

Thanks, will do.

> Do you really need to specify examples in README.

I don't *need* to, but I like it.

-- Abhijit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-09-24 09:09:24 Re: a fast bloat measurement tool (was Re: Measuring relation free space)
Previous Message Abhijit Menon-Sen 2014-09-24 08:27:46 Re: pgcrypto: PGP signatures