Skip site navigation (1) Skip section navigation (2)

Re: Subquery in a JOIN not getting restricted?

From: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Subquery in a JOIN not getting restricted?
Date: 2011-11-09 20:39:14
Message-ID: 4EBAE4F2.1090403@gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
Kevin Grittner wrote:
> Jay Levitt<jay(dot)levitt(at)gmail(dot)com>  wrote:
>
>> I don't get why the GROUP BY in this subquery forces it to scan
>> the entire users table (seq scan here, index scan on a larger
>> table) when there's only one row in users that can match:

> Are you sure there's a plan significantly faster than 1.3 ms?

Yep!  Watch this:

drop schema if exists jaytest cascade;
create schema jaytest;
set search_path to jaytest;

create table questions (
   id int not null primary key,
   user_id int not null
);
insert into questions
   select generate_series(1,1100), (random()*2000000)::int;

create table users (
   id int not null primary key
);
insert into users select generate_series(1, 2000000);

vacuum freeze analyze;

explain analyze
select questions.id
from questions
join (
   select u.id
   from users as u
   group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;

-----------------------
  Merge Join  (cost=8.28..90833.02 rows=1818 width=4) (actual 
time=888.787..888.790 rows=1 loops=1)
    Merge Cond: (u.id = questions.user_id)
    ->  Group  (cost=0.00..65797.47 rows=2000000 width=4) (actual 
time=0.017..735.509 rows=1747305 loops=1)
          ->  Index Scan using users_pkey on users u  (cost=0.00..60797.47 
rows=2000000 width=4) (actual time=0.015..331.990 rows=1747305 loops=1)
    ->  Materialize  (cost=8.28..8.29 rows=1 width=8) (actual 
time=0.013..0.015 rows=1 loops=1)
          ->  Sort  (cost=8.28..8.28 rows=1 width=8) (actual 
time=0.012..0.013 rows=1 loops=1)
                Sort Key: questions.user_id
                Sort Method:  quicksort  Memory: 25kB
                ->  Index Scan using questions_pkey on questions 
(cost=0.00..8.27 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)
                      Index Cond: (id = 1)
  Total runtime: 888.832 ms
(11 rows)

explain analyze
select questions.id
from questions
join (
   select u.id
   from users as u
) as s
on s.id = questions.user_id
where questions.id = 1;

-----------------------
  Nested Loop  (cost=0.00..16.77 rows=1 width=4) (actual time=0.019..0.021 
rows=1 loops=1)
    ->  Index Scan using questions_pkey on questions  (cost=0.00..8.27 
rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)
          Index Cond: (id = 1)
    ->  Index Scan using users_pkey on users u  (cost=0.00..8.49 rows=1 
width=4) (actual time=0.007..0.007 rows=1 loops=1)
          Index Cond: (u.id = questions.user_id)
  Total runtime: 0.045 ms
(6 rows)

> That said, there might be some room for an optimization which pushes
> that test into the query with the "group by" clause.  I don't know
> if there's a problem with that which I'm missing, the construct was
> judged to be too rare to be worth the cost of testing for it, or
> it's just that nobody has yet gotten to it.

Anyone have more insights on whether this is hard to optimize or simply 
not-yet-optimized?  And if the latter, where might I start looking? (Not 
that you -really- want me to submit a patch; my C has regressed to the "try 
an ampersand.  OK, try an asterisk." level...)

Jay

In response to

Responses

pgsql-performance by date

Next:From: Sorin DuduiDate: 2011-11-10 13:05:56
Subject: IMMUTABLE STABLE functions, daily updates
Previous:From: Kevin GrittnerDate: 2011-11-09 20:20:00
Subject: Re: Subquery in a JOIN not getting restricted?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group