Re: Subquery in a JOIN not getting restricted?

From: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Subquery in a JOIN not getting restricted?
Date: 2011-11-09 20:39:14
Message-ID: 4EBAE4F2.1090403@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Kevin Grittner wrote:
> Jay Levitt<jay(dot)levitt(at)gmail(dot)com> wrote:
>
>> I don't get why the GROUP BY in this subquery forces it to scan
>> the entire users table (seq scan here, index scan on a larger
>> table) when there's only one row in users that can match:

> Are you sure there's a plan significantly faster than 1.3 ms?

Yep! Watch this:

drop schema if exists jaytest cascade;
create schema jaytest;
set search_path to jaytest;

create table questions (
id int not null primary key,
user_id int not null
);
insert into questions
select generate_series(1,1100), (random()*2000000)::int;

create table users (
id int not null primary key
);
insert into users select generate_series(1, 2000000);

vacuum freeze analyze;

explain analyze
select questions.id
from questions
join (
select u.id
from users as u
group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;

-----------------------
Merge Join (cost=8.28..90833.02 rows=1818 width=4) (actual
time=888.787..888.790 rows=1 loops=1)
Merge Cond: (u.id = questions.user_id)
-> Group (cost=0.00..65797.47 rows=2000000 width=4) (actual
time=0.017..735.509 rows=1747305 loops=1)
-> Index Scan using users_pkey on users u (cost=0.00..60797.47
rows=2000000 width=4) (actual time=0.015..331.990 rows=1747305 loops=1)
-> Materialize (cost=8.28..8.29 rows=1 width=8) (actual
time=0.013..0.015 rows=1 loops=1)
-> Sort (cost=8.28..8.28 rows=1 width=8) (actual
time=0.012..0.013 rows=1 loops=1)
Sort Key: questions.user_id
Sort Method: quicksort Memory: 25kB
-> Index Scan using questions_pkey on questions
(cost=0.00..8.27 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)
Index Cond: (id = 1)
Total runtime: 888.832 ms
(11 rows)

explain analyze
select questions.id
from questions
join (
select u.id
from users as u
) as s
on s.id = questions.user_id
where questions.id = 1;

-----------------------
Nested Loop (cost=0.00..16.77 rows=1 width=4) (actual time=0.019..0.021
rows=1 loops=1)
-> Index Scan using questions_pkey on questions (cost=0.00..8.27
rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)
Index Cond: (id = 1)
-> Index Scan using users_pkey on users u (cost=0.00..8.49 rows=1
width=4) (actual time=0.007..0.007 rows=1 loops=1)
Index Cond: (u.id = questions.user_id)
Total runtime: 0.045 ms
(6 rows)

> That said, there might be some room for an optimization which pushes
> that test into the query with the "group by" clause. I don't know
> if there's a problem with that which I'm missing, the construct was
> judged to be too rare to be worth the cost of testing for it, or
> it's just that nobody has yet gotten to it.

Anyone have more insights on whether this is hard to optimize or simply
not-yet-optimized? And if the latter, where might I start looking? (Not
that you -really- want me to submit a patch; my C has regressed to the "try
an ampersand. OK, try an asterisk." level...)

Jay

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Sorin Dudui 2011-11-10 13:05:56 IMMUTABLE STABLE functions, daily updates
Previous Message Kevin Grittner 2011-11-09 20:20:00 Re: Subquery in a JOIN not getting restricted?