Re: EXPLAIN and nfiltered, take two

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Marko Tiikkaja <marko(dot)tiikkaja(at)2ndquadrant(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: EXPLAIN and nfiltered, take two
Date: 2011-09-21 21:49:21
Message-ID: 465.1316641761@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Sep 20, 2011 at 8:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I find it useless and probably confusing to put out "Rows Removed
>> by Recheck Cond: 0" unless we're dealing with a lossy index.

> I don't really see the point of this. I think printing it always is
> both clear and appropriate; it would be even nicer if we also had a
> line for "Rows Rechecked".

I think you'll be singing a different tune as soon as you find out that
every single indexscan in your plans has a generally-useless "Rows
Removed" line added to it. The patch as I have it now is doing this:

regression=# explain analyze select * from tenk1 where unique1 = 42;
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using tenk1_unique1 on tenk1  (cost=0.00..8.27 rows=1 width=244) (actual time=0.137..0.153 rows=1 loops=1)
   Index Cond: (unique1 = 42)
   Rows Removed by Index Cond: 0
 Total runtime: 0.635 ms
(4 rows)

and personally I object to a 50% increase in the vertical space occupied
by a plan item in order to tell me not a damn thing. For an ordinary
btree indexscan, that line WILL NOT EVER be nonzero. I do not own
monitors large enough to accept that sort of overhead without complaint.
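
For contrast, here's the shape of query where such a line genuinely
earns its keep (a sketch, not output I've run with the patch: the
regression database's GiST index on polygon_tbl stores only bounding
boxes, so the index condition is lossy and the recheck really does
discard rows):

-- illustrative only: GiST polygon indexes are lossy, so a
-- bounding-box-only match can be rejected at recheck time, making a
-- rows-removed line informative rather than a constant zero
set enable_seqscan = off;   -- the table is tiny; force the indexscan
explain analyze
  select * from polygon_tbl where f1 @> '((2,2),(3,3),(3,2))'::polygon;
reset enable_seqscan;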

Now the entry *is* useful when considering a lossy indexscan. I just
don't want it there otherwise. (Note that I'm not proposing suppressing
such entries from the machine-readable formats, just text format.)
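Anyone whose tooling wants the counter unconditionally can just ask
for one of those formats, for example:

-- any of the machine-readable formats (json/xml/yaml) would retain
-- the rows-removed entry even when it's zero
explain (analyze, format json)
  select * from tenk1 where unique1 = 42;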

> I think we are getting to the point where EXPLAIN is complex enough
> that it should really have its own chapter in the documentation.

Yeah, the lack of documentation in the submitted patch is another big
problem with it, but it's not especially relevant to arguing about what
the behavior should be. I'll see about documentation after I'm done
with the code.

>> Also, upthread it was argued that we shouldn't measure the effects of
>> joinqual filtering. I don't buy this for a minute, ...

> Yep, I agree. We should measure everything we possibly can. I don't
> have a clear idea how this ought to be represented, but leaving it out
> doesn't seem like the right answer.

After playing with it for a while, I'm inclined to just put out the
measured count and let the user interpret it. Here are some examples
using table tenk1 from the regression database, with most of the
planner's enable_ flags turned off to force it to produce the same shape
of plan for each case:
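
(Concretely, something along these lines; the exact flags I used
don't matter, so treat this as illustrative:)

set enable_hashjoin = off;     -- leave merge join as the only join method
set enable_nestloop = off;
set enable_seqscan = off;      -- take the merge ordering from the indexes
set enable_bitmapscan = off;
set enable_sort = off;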

regression=# explain analyze select * from tenk1 a join tenk1 b on a.thousand = b.hundred ;
                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..3172.38 rows=100000 width=488) (actual time=0.449..1678.464 rows=100000 loops=1)
   Merge Cond: (b.hundred = a.thousand)
   ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..1702.16 rows=10000 width=244) (actual time=0.228..117.637 rows=10000 loops=1)
   ->  Materialize  (cost=0.00..1727.20 rows=10000 width=244) (actual time=0.154..431.251 rows=100001 loops=1)
         ->  Index Scan using tenk1_thous_tenthous on tenk1 a  (cost=0.00..1702.20 rows=10000 width=244) (actual time=0.115..12.033 rows=1001 loops=1)
 Total runtime: 2057.743 ms
(6 rows)

regression=# explain analyze select * from tenk1 a join tenk1 b on a.thousand = b.hundred and a.four = b.ten;
                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..3422.38 rows=10000 width=488) (actual time=0.657..1314.394 rows=20000 loops=1)
   Merge Cond: (b.hundred = a.thousand)
   Join Filter: (a.four = b.ten)
   Rows Removed by Join Filter: 80000
   ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..1702.16 rows=10000 width=244) (actual time=0.395..110.506 rows=10000 loops=1)
   ->  Materialize  (cost=0.00..1727.20 rows=10000 width=244) (actual time=0.165..390.867 rows=100001 loops=1)
         ->  Index Scan using tenk1_thous_tenthous on tenk1 a  (cost=0.00..1702.20 rows=10000 width=244) (actual time=0.121..11.680 rows=1001 loops=1)
 Total runtime: 1386.446 ms
(8 rows)

OK, that's pretty straightforward; what about a left join?

regression=# explain analyze select * from tenk1 a left join tenk1 b on a.thousand = b.hundred ;
                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.00..4929.36 rows=100000 width=488) (actual time=0.469..2097.446 rows=109000 loops=1)
   Merge Cond: (a.thousand = b.hundred)
   ->  Index Scan using tenk1_thous_tenthous on tenk1 a  (cost=0.00..1702.20 rows=10000 width=244) (actual time=0.164..141.034 rows=10000 loops=1)
   ->  Materialize  (cost=0.00..1727.16 rows=10000 width=244) (actual time=0.234..642.645 rows=99991 loops=1)
         ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..1702.16 rows=10000 width=244) (actual time=0.191..137.716 rows=10000 loops=1)
 Total runtime: 2549.644 ms
(6 rows)

regression=# explain analyze select * from tenk1 a left join tenk1 b on a.thousand = b.hundred and a.four = b.ten;
                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.00..5179.36 rows=10000 width=488) (actual time=0.532..1869.319 rows=29800 loops=1)
   Merge Cond: (a.thousand = b.hundred)
   Join Filter: (a.four = b.ten)
   Rows Removed by Join Filter: 80000
   ->  Index Scan using tenk1_thous_tenthous on tenk1 a  (cost=0.00..1702.20 rows=10000 width=244) (actual time=0.160..147.302 rows=10000 loops=1)
   ->  Materialize  (cost=0.00..1727.16 rows=10000 width=244) (actual time=0.274..672.431 rows=99991 loops=1)
         ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..1702.16 rows=10000 width=244) (actual time=0.233..141.353 rows=10000 loops=1)
 Total runtime: 2012.329 ms
(8 rows)

In this case there are 800 more rows out than you might expect:
adding the filtered inner join's 20000 rows to the 9000 left-hand rows
with no merge match at all would predict 29000, but we get 29800. The
extra 800 are left-hand rows whose merge matches were all removed by
the join filter (the filtered semi join below confirms that only 200
of the 1000 left-hand rows with merge matches survive the filter); in
a left join those rows have null-extended rows emitted for them rather
than just going away entirely. The null-extension happens after the
"join filter" is applied, so the rows-removed count doesn't change.

Here's a semi join:

regression=# explain analyze select * from tenk1 a where exists(select 1 from tenk1 b where a.thousand = b.hundred );
                                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Semi Join  (cost=0.00..3374.88 rows=1000 width=244) (actual time=0.460..301.648 rows=1000 loops=1)
   Merge Cond: (a.thousand = b.hundred)
   ->  Index Scan using tenk1_thous_tenthous on tenk1 a  (cost=0.00..1702.20 rows=10000 width=244) (actual time=0.156..11.762 rows=1001 loops=1)
   ->  Materialize  (cost=0.00..1727.16 rows=10000 width=4) (actual time=0.231..208.458 rows=10000 loops=1)
         ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..1702.16 rows=10000 width=4) (actual time=0.188..111.263 rows=10000 loops=1)
 Total runtime: 306.712 ms
(6 rows)

regression=# explain analyze select * from tenk1 a where exists(select 1 from tenk1 b where a.thousand = b.hundred and a.four = b.ten);
                                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Semi Join  (cost=0.00..3624.88 rows=1000 width=244) (actual time=0.486..1035.129 rows=200 loops=1)
   Merge Cond: (a.thousand = b.hundred)
   Join Filter: (a.four = b.ten)
   Rows Removed by Join Filter: 80000
   ->  Index Scan using tenk1_thous_tenthous on tenk1 a  (cost=0.00..1702.20 rows=10000 width=244) (actual time=0.164..11.507 rows=1001 loops=1)
   ->  Materialize  (cost=0.00..1727.16 rows=10000 width=8) (actual time=0.232..477.475 rows=81991 loops=1)
         ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..1702.16 rows=10000 width=8) (actual time=0.191..112.566 rows=10000 loops=1)
 Total runtime: 1037.360 ms
(8 rows)

This is where somebody might get confused --- if there are only 1000
rows emitted by the non-filtered version, how did 80000 rows get
removed in the filtered one? The answer is that once a join filter is
present, the semi join can no longer stop at the first merge-cond
match: it must keep scanning inner rows until one passes the filter,
so the 800 outer rows whose 100 matches apiece all fail the filter
account for the 80000 rejected rows. But still, hiding that
information isn't terribly helpful IMO, because if you were wondering
why the mergejoin node took 4X longer to emit 5X fewer rows than
before, the behavior of the filter would be a valuable clue.

Last, the antijoin equivalent:

regression=# explain analyze select * from tenk1 a where not exists(select 1 from tenk1 b where a.thousand = b.hundred );
                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Anti Join  (cost=0.00..4929.36 rows=9000 width=244) (actual time=296.501..481.179 rows=9000 loops=1)
   Merge Cond: (a.thousand = b.hundred)
   ->  Index Scan using tenk1_thous_tenthous on tenk1 a  (cost=0.00..1702.20 rows=10000 width=244) (actual time=0.150..112.933 rows=10000 loops=1)
   ->  Materialize  (cost=0.00..1727.16 rows=10000 width=4) (actual time=0.228..209.522 rows=10000 loops=1)
         ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..1702.16 rows=10000 width=4) (actual time=0.188..111.769 rows=10000 loops=1)
 Total runtime: 514.568 ms
(6 rows)

regression=# explain analyze select * from tenk1 a where not exists(select 1 from tenk1 b where a.thousand = b.hundred and a.four = b.ten);
                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Anti Join  (cost=0.00..5179.36 rows=9000 width=244) (actual time=16.105..1266.661 rows=9800 loops=1)
   Merge Cond: (a.thousand = b.hundred)
   Join Filter: (a.four = b.ten)
   Rows Removed by Join Filter: 80000
   ->  Index Scan using tenk1_thous_tenthous on tenk1 a  (cost=0.00..1702.20 rows=10000 width=244) (actual time=0.157..113.719 rows=10000 loops=1)
   ->  Materialize  (cost=0.00..1727.16 rows=10000 width=8) (actual time=0.231..497.882 rows=81991 loops=1)
         ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..1702.16 rows=10000 width=8) (actual time=0.189..115.980 rows=10000 loops=1)
 Total runtime: 1302.973 ms
(8 rows)

Again, the large number of removed rows does provide a useful
performance clue, even if it seems like the numbers don't add up.

It's possible that we could make the LEFT JOIN examples less confusing
if we were to add an additional counter (and display row) that shows the
number of null-extended rows added by the join mechanism. However, I
don't immediately see how to apply that idea to semi/anti joins.

regards, tom lane
