Re: Improve output of BitmapAnd EXPLAIN ANALYZE

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve output of BitmapAnd EXPLAIN ANALYZE
Date: 2016-10-21 17:30:10
Message-ID: 20161021173010.GC13284@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

* Jim Nasby (Jim(dot)Nasby(at)BlueTreble(dot)com) wrote:
> On 10/21/16 8:21 AM, Stephen Frost wrote:
> >Counting each page as the relation's average number of tuples per page
> >seems entirely reasonable to me, for what that is trying to report.
>
> My concern is that still leaves a lot of room for confusion when
> interpreting EXPLAIN ANALYZE. Every other node will tell you exactly
> what happened and it's pretty easy to reason about whether rows
> should have gone up or down based on the type of node. You can't do
> that for Bitmap(And|Or) unless you know the details of how
> TIDBitmaps work. Reporting N/A makes it crystal clear that these
> nodes operate very differently than all the others.

I don't see why you think the numbers reported by BitmapAnd based on
this approach wouldn't go up and down in a similar manner to what you
would expect to get, based on that node type. Reporting N/A is entirely
punting on it when we have perfectly useful information that can be
reported.

> (On a related note, it would also be nice if we reported fractional
> rows when the row count low and loops is high.)

I can certainly understand that, though I think I'd rather have an
actual 'total' value or similar instead, but that's really a different
discussion.

> >That said, I'm a big fan of how we have more detail for things like a
> >HashJoin (buckets, batches, memory usage) and it might be nice to have
> >more information like that for a BitmapAnd (and friends). In
> >particular, I'm thinking of memory usage, exact vs. lossy pages, etc.
> >Knowing that the bitmap has gotten to the point of being lossy might
> >indicate that a user could up work_mem, for example, and possibly avoid
> >recheck costs.
>
> I think that's the best way to handle this: report N/A in the header
> and then provide details on exact vs lossy. That provides a clear
> indication to users that these kinds of nodes are special, as well
> as a reminder as to why they're special. Certainly the node could
> report an exact rowcount in the header if there were no lossy pages
> too.

I don't see why we would want to stick 'N/A' in for the header, even if
we are reporting the details, when we can provide a pretty reasonable
number. In particular, I certainly don't think we would want to report
N/A sometimes (lossy case) and then an actual number other times (all
exact case). That strikes me as much more likely to be confusing.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-10-21 17:38:58 Re: PSA: Systemd will kill PostgreSQL
Previous Message Robert Haas 2016-10-21 17:29:42 Re: Default setting for autovacuum_freeze_max_age