Skip site navigation (1) Skip section navigation (2)

Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, casey(dot)shobe(at)messagesystems(dot)com
Subject: Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
Date: 2012-01-12 00:53:02
Message-ID: 201201120153.02587.andres@anarazel.de (view raw or flat)
Thread:
Lists: pgsql-bugs
On Thursday, January 12, 2012 01:01:01 AM Tom Lane wrote:
>  (In cases where we know it's unique, something like this
> could be reasonable, but I believe get_variable_numdistinct already
> accounts for such cases.)
One of those case which looks relatively easy is that CTEs currently work as a 
kind of 'statistics barrier' here. Specifically I wonder why:

test_raptelan=# EXPLAIN WITH foo AS (SELECT * FROM b WHERE id < 5000) SELECT * 
FROM a WHERE a.id IN (SELECT id FROM foo);
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Nested Loop  (cost=302.02..1876.30 rows=2550000 width=11)
   CTE foo
     ->  Index Scan using b_pkey on b  (cost=0.00..184.06 rows=5243 width=10)
           Index Cond: (id < 5000)
   ->  HashAggregate  (cost=117.97..119.97 rows=200 width=4)
         ->  CTE Scan on foo  (cost=0.00..104.86 rows=5243 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..7.85 rows=1 width=11)
         Index Cond: (id = foo.id)


plans differently than

test_raptelan=# EXPLAIN SELECT * FROM a WHERE a.id IN (SELECT id FROM b WHERE 
id < 5000 OFFSET 0);
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Merge Semi Join  (cost=560.41..17426.03 rows=5243 width=11)
   Merge Cond: (a.id = b.id)
   ->  Index Scan using a_pkey on a  (cost=0.00..160013.81 rows=5100000 
width=11)
   ->  Sort  (cost=560.40..573.51 rows=5243 width=4)
         Sort Key: b.id
         ->  Limit  (cost=0.00..184.06 rows=5243 width=4)
               ->  Index Only Scan using b_pkey on b  (cost=0.00..184.06 
rows=5243 width=4)
                     Index Cond: (id < 5000)


Couldn't the CTE pass a vardata from inside to the outside?

Andres

In response to

pgsql-bugs by date

Next:From: Tom LaneDate: 2012-01-12 01:24:44
Subject: Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
Previous:From: Andres FreundDate: 2012-01-12 00:40:34
Subject: Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group