Re: Subquery in a JOIN not getting restricted?

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Subquery in a JOIN not getting restricted?
Date: 2011-11-12 09:28:45
Message-ID: 4EBE3C4D.9010208@archidevsys.co.nz
Lists: pgsql-performance

On 10/11/11 09:39, Jay Levitt wrote:
> Kevin Grittner wrote:
>> Jay Levitt<jay(dot)levitt(at)gmail(dot)com> wrote:
>>
>>> I don't get why the GROUP BY in this subquery forces it to scan
>>> the entire users table (seq scan here, index scan on a larger
>>> table) when there's only one row in users that can match:
>
>> Are you sure there's a plan significantly faster than 1.3 ms?
>
> Yep! Watch this:
>
> drop schema if exists jaytest cascade;
> create schema jaytest;
> set search_path to jaytest;
>
> create table questions (
> id int not null primary key,
> user_id int not null
> );
> insert into questions
> select generate_series(1,1100), (random()*2000000)::int;
>
> create table users (
> id int not null primary key
> );
> insert into users select generate_series(1, 2000000);
>
> vacuum freeze analyze;
>
> explain analyze
> select questions.id
> from questions
> join (
> select u.id
> from users as u
> group by u.id
> ) as s
> on s.id = questions.user_id
> where questions.id = 1;
>
> -----------------------
> Merge Join  (cost=8.28..90833.02 rows=1818 width=4) (actual time=888.787..888.790 rows=1 loops=1)
>   Merge Cond: (u.id = questions.user_id)
>   ->  Group  (cost=0.00..65797.47 rows=2000000 width=4) (actual time=0.017..735.509 rows=1747305 loops=1)
>         ->  Index Scan using users_pkey on users u  (cost=0.00..60797.47 rows=2000000 width=4) (actual time=0.015..331.990 rows=1747305 loops=1)
>   ->  Materialize  (cost=8.28..8.29 rows=1 width=8) (actual time=0.013..0.015 rows=1 loops=1)
>         ->  Sort  (cost=8.28..8.28 rows=1 width=8) (actual time=0.012..0.013 rows=1 loops=1)
>               Sort Key: questions.user_id
>               Sort Method: quicksort  Memory: 25kB
>               ->  Index Scan using questions_pkey on questions  (cost=0.00..8.27 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)
>                     Index Cond: (id = 1)
> Total runtime: 888.832 ms
> (11 rows)
>
> explain analyze
> select questions.id
> from questions
> join (
> select u.id
> from users as u
> ) as s
> on s.id = questions.user_id
> where questions.id = 1;
>
> -----------------------
> Nested Loop  (cost=0.00..16.77 rows=1 width=4) (actual time=0.019..0.021 rows=1 loops=1)
>   ->  Index Scan using questions_pkey on questions  (cost=0.00..8.27 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)
>         Index Cond: (id = 1)
>   ->  Index Scan using users_pkey on users u  (cost=0.00..8.49 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=1)
>         Index Cond: (u.id = questions.user_id)
> Total runtime: 0.045 ms
> (6 rows)
>
>> That said, there might be some room for an optimization which pushes
>> that test into the query with the "group by" clause. I don't know
>> if there's a problem with that which I'm missing, the construct was
>> judged to be too rare to be worth the cost of testing for it, or
>> it's just that nobody has yet gotten to it.
>
> Anyone have more insights on whether this is hard to optimize or
> simply not-yet-optimized? And if the latter, where might I start
> looking? (Not that you -really- want me to submit a patch; my C has
> regressed to the "try an ampersand. OK, try an asterisk." level...)
>
> Jay
>
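Until the planner learns to push that qual down into the grouped subquery itself, one possible workaround (only a sketch against the test schema above; I haven't benchmarked it here) is to repeat the restriction inside the derived table:

explain analyze
select questions.id
from questions
join (
    select u.id
    from users as u
    where u.id in (select user_id from questions where id = 1)
    group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;

The idea is that the GROUP BY then only sees the user rows that can possibly join, rather than grouping all two million rows before the join.
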
Minor note:

'PRIMARY KEY' already gives you a 'NOT NULL' constraint for free, so the explicit NOT NULL on the primary-key columns in the test tables above is redundant (though harmless).
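
For example (just an illustrative sketch, not part of Jay's test case; demo_users is a throwaway name so it doesn't clash with the users table above):

create table demo_users (
    id int primary key   -- PRIMARY KEY implies NOT NULL; no separate constraint needed
);

insert into demo_users values (null);   -- rejected with a not-null violation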
