Re: a fast bloat measurement tool (was Re: Measuring relation free space)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a fast bloat measurement tool (was Re: Measuring relation free space)
Date: 2015-03-04 06:10:19
Message-ID: CAA4eK1KyWQ+pOYvCb+hVEcYtOBBmofUWKZ54CTF93sZB0qyHGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 26, 2014 at 1:08 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
wrote:
>
> At 2014-09-25 15:40:11 +0530, ams(at)2ndQuadrant(dot)com wrote:
> >
> > All right, then I'll post a version that addresses Amit's other
> > points, adds a new file/function to pgstattuple, acquires content
> > locks, and uses HeapTupleSatisfiesVacuum, hint-bit setting and all.
>
> Sorry, I forgot to post this patch. It does what I listed above, and
> seems to work fine (it's still quite a lot faster than pgstattuple
> in many cases).
>

I think one of the comment is not handled in latest patch.
5. Don't we need the handling for empty page (PageIsEmpty()) case?

I think we should have siimilar for PageIsEmpty()
as you have done for PageIsNew() in your patch.

+ stat.tuple_count = vac_estimate_reltuples(rel, false, nblocks, scanned,
+ stat.tuple_count);

Here for scanned tuples, you have used the live tuple counter
whereas I think we need to count HEAPTUPLE_INSERT_IN_PROGRESS
and HEAPTUPLE_DELETE_IN_PROGRESS and
HEAPTUPLE_RECENTLY_DEAD.

Refer the similar logic in Vacuum.

Although I am not in favor of using HeapTupleSatisfiesVacuum(), however
if you want to use it then lets be consistent with what Vacuum does
for estimation of tuples.

> A couple of remaining questions:
>
> 1. I didn't change the handling of LP_DEAD items, because the way it is
> now matches what pgstattuple counts. I'm open to changing it, though.
> Maybe it makes sense to count LP_DEAD items iff lp_len != 0? Or just
> leave it as it is? I think it doesn't matter much.
>
> 2. I changed the code to acquire the content locks on the buffer, as
> discussed, but it still uses HeapTupleSatisfiesVacuum. Amit suggested
> using HeapTupleSatisfiesVisibility, but it's not clear to me why that
> would be better. I welcome advice in this matter.
>

The reason to use HeapTupleSatisfiesVacuum() is that it helps us
in freezing and some other similar stuff which is required by
Vacuum. Also it could be beneficial if we want take specific
action for various result codes of Vaccum. I think as this routine
mostly gives the estimated count, so it might not matter much even
if we use HeapTupleSatisfiesVacuum(), however it is better to be
consistent with pgstat_heap() in this case.
Do you have any reason for using HeapTupleSatisfiesVacuum() rather
than what is used in pgstat_heap()?

> (If anything, I should use HeapTupleIsSurelyDead, which doesn't set
> any hint bits, but which I earlier rejected as too conservative in
> its dead/not-dead decisions for this purpose.)
>
> (I've actually patched the pgstattuple.sgml docs as well, but I want to
> re-read that to make sure it's up to date, and didn't want to wait to
> post the code changes.)
>

Sure, post the same when it is ready.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2015-03-04 06:23:16 Re: a fast bloat measurement tool (was Re: Measuring relation free space)
Previous Message Michael Paquier 2015-03-04 05:03:59 Re: Bug in pg_dump