Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, casey(dot)shobe(at)messagesystems(dot)com
Subject: Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
Date: 2012-01-12 00:53:02
Message-ID: 201201120153.02587.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thursday, January 12, 2012 01:01:01 AM Tom Lane wrote:
> (In cases where we know it's unique, something like this
> could be reasonable, but I believe get_variable_numdistinct already
> accounts for such cases.)
One of those case which looks relatively easy is that CTEs currently work as a
kind of 'statistics barrier' here. Specifically I wonder why:

test_raptelan=# EXPLAIN WITH foo AS (SELECT * FROM b WHERE id < 5000) SELECT *
FROM a WHERE a.id IN (SELECT id FROM foo);
QUERY PLAN
------------------------------------------------------------------------------
Nested Loop (cost=302.02..1876.30 rows=2550000 width=11)
CTE foo
-> Index Scan using b_pkey on b (cost=0.00..184.06 rows=5243 width=10)
Index Cond: (id < 5000)
-> HashAggregate (cost=117.97..119.97 rows=200 width=4)
-> CTE Scan on foo (cost=0.00..104.86 rows=5243 width=4)
-> Index Scan using a_pkey on a (cost=0.00..7.85 rows=1 width=11)
Index Cond: (id = foo.id)

plans differently than

test_raptelan=# EXPLAIN SELECT * FROM a WHERE a.id IN (SELECT id FROM b WHERE
id < 5000 OFFSET 0);
QUERY PLAN
--------------------------------------------------------------------------------------------
Merge Semi Join (cost=560.41..17426.03 rows=5243 width=11)
Merge Cond: (a.id = b.id)
-> Index Scan using a_pkey on a (cost=0.00..160013.81 rows=5100000
width=11)
-> Sort (cost=560.40..573.51 rows=5243 width=4)
Sort Key: b.id
-> Limit (cost=0.00..184.06 rows=5243 width=4)
-> Index Only Scan using b_pkey on b (cost=0.00..184.06
rows=5243 width=4)
Index Cond: (id < 5000)

Couldn't the CTE pass a vardata from inside to the outside?

Andres

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2012-01-12 01:24:44 Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
Previous Message Andres Freund 2012-01-12 00:40:34 Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct