Re: Is a better way to have the same result of this query?

From: Vernon Wu <vernonw(at)gatewaytech(dot)com>
To: Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Is a better way to have the same result of this query?
Date: 2002-12-05 18:44:34
Message-ID: PKIGWVKHYUZT63MHQM42A01BTSWUED.3def9e92@kimiko
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Jochem,

Thanks for your suggestion/information.

The followings are the analyise outcomes after I did some modifications with the query. My finding is that the new query
does improve the performance according to the plan. The actual time is reversed might due to the fact the test data is
very small (the machine is a very old one by the way). The userid is the key for all tables and the gender is indexed. Do I
also index the country and province to improve the preformance?

The modified query with the suggested flatting query.

Nested Loop (cost=0.00..91.97 rows=995 width=445) (actual time=1.00..3.00 rows=2 loops=1)
-> Nested Loop (cost=0.00..62.02 rows=1 width=445) (actual time=1.00..3.00 rows=2 loops=1)
-> Nested Loop (cost=0.00..34.68 rows=1 width=378) (actual time=1.00..3.00 rows=3 loops=1)
-> Nested Loop (cost=0.00..29.84 rows=1 width=366) (actual time=1.00..3.00 rows=3 loops=1)
-> Seq Scan on account a (cost=0.00..25.00 rows=1 width=289) (actual time=0.00..0.00 rows=3 loops=1)
-> Index Scan using pk_preference on preference f (cost=0.00..4.82 rows=1 width=77) (actual time=
0.67..1.00 rows=1 loops=3)
-> Index Scan using pk_profile on profile p1 (cost=0.00..4.82 rows=1 width=12) (actual time=0.00..0.00 rows=
1 loops=3)
-> Index Scan using pk_profile on profile p (cost=0.00..27.33 rows=1 width=67) (actual time=0.00..0.00 rows=1
loops=3)
SubPlan
-> Materialize (cost=22.50..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=2)
-> Seq Scan on block b (cost=0.00..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=1)
-> Seq Scan on block (cost=0.00..20.00 rows=995 width=0) (actual time=0.00..0.00 rows=1 loops=2)
Total runtime: 5.00 msec

After replacing "p.userid NOT IN" with "NOT EXISTS":

Result (cost=0.00..61.56 rows=995 width=445) (actual time=3.00..4.00 rows=2 loops=1)
InitPlan
-> Seq Scan on block b (cost=0.00..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=1)
-> Nested Loop (cost=0.00..61.56 rows=995 width=445) (actual time=3.00..4.00 rows=2 loops=1)
-> Nested Loop (cost=0.00..31.61 rows=1 width=445) (actual time=3.00..4.00 rows=2 loops=1)
-> Nested Loop (cost=0.00..26.77 rows=1 width=433) (actual time=2.00..3.00 rows=2 loops=1)
-> Nested Loop (cost=0.00..21.93 rows=1 width=356) (actual time=2.00..2.00 rows=2 loops=1)
-> Index Scan using profile_sex_idx on profile p (cost=0.00..17.09 rows=1 width=67) (actual time=
1.00..1.00 rows=2 loops=1)
-> Index Scan using pk_account on account a (cost=0.00..4.83 rows=1 width=289) (actual time=
0.50..0.50 rows=1 loops=2)
-> Index Scan using pk_preference on preference f (cost=0.00..4.82 rows=1 width=77) (actual time=
0.50..0.50 rows=1 loops=2)
-> Index Scan using pk_profile on profile p1 (cost=0.00..4.82 rows=1 width=12) (actual time=0.50..0.50 rows=
1 loops=2)
-> Seq Scan on block (cost=0.00..20.00 rows=995 width=0) (actual time=0.00..0.00 rows=1 loops=2)
Total runtime: 7.00 msec

After vacuum analyze:

Result (cost=3.19..5.29 rows=1 width=91) (actual time=3.00..4.00 rows=2 loops=1)
InitPlan
-> Seq Scan on block b (cost=0.00..1.01 rows=1 width=7) (actual time=0.00..0.00 rows=0 loops=1)
-> Nested Loop (cost=3.19..5.29 rows=1 width=91) (actual time=3.00..4.00 rows=2 loops=1)
-> Hash Join (cost=3.19..4.27 rows=1 width=91) (actual time=3.00..3.00 rows=2 loops=1)
-> Hash Join (cost=2.13..3.20 rows=1 width=72) (actual time=2.00..2.00 rows=3 loops=1)
-> Seq Scan on account a (cost=0.00..1.04 rows=3 width=31) (actual time=1.00..1.00 rows=3 loops=1)
-> Hash (cost=2.13..2.13 rows=1 width=41) (actual time=0.00..0.00 rows=0 loops=1)
-> Nested Loop (cost=0.00..2.13 rows=1 width=41) (actual time=0.00..0.00 rows=3 loops=1)
-> Seq Scan on profile p1 (cost=0.00..1.04 rows=1 width=12) (actual time=0.00..0.00 rows=1
loops=1)
-> Seq Scan on preference f (cost=0.00..1.03 rows=3 width=29) (actual time=0.00..0.00 rows=3
loops=1)
-> Hash (cost=1.05..1.05 rows=2 width=19) (actual time=1.00..1.00 rows=0 loops=1)
-> Seq Scan on profile p (cost=0.00..1.05 rows=2 width=19) (actual time=1.00..1.00 rows=2 loops=1)
-> Seq Scan on block (cost=0.00..1.01 rows=1 width=0) (actual time=0.00..0.00 rows=1 loops=2)
Total runtime: 7.00 msec

The original query

Nested Loop (cost=0.00..127.12 rows=995 width=894) (actual time=1.00..2.00 rows=2 loops=1)
-> Nested Loop (cost=0.00..97.17 rows=1 width=894) (actual time=1.00..1.00 rows=2 loops=1)
-> Seq Scan on account a (cost=0.00..25.00 rows=1 width=289) (actual time=0.00..0.00 rows=3 loops=1)
-> Index Scan using pk_profile on profile p (cost=0.00..72.16 rows=1 width=605) (actual time=0.33..0.33 rows=1
loops=3)
SubPlan
-> Materialize (cost=22.50..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=2)
-> Seq Scan on block b (cost=0.00..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=1)
-> Materialize (cost=44.82..44.82 rows=111 width=89) (actual time=0.50..0.50 rows=1 loops=2)
-> Nested Loop (cost=0.00..44.82 rows=111 width=89) (actual time=0.00..0.00 rows=3 loops=1)
-> Index Scan using pk_profile on profile p1 (cost=0.00..4.82 rows=1 width=12) (actual time=
0.00..0.00 rows=1 loops=1)
-> Seq Scan on preference f (cost=0.00..20.00 rows=1000 width=77) (actual time=0.00..0.00 rows=
3 loops=1)
-> Seq Scan on block (cost=0.00..20.00 rows=995 width=0) (actual time=0.00..0.00 rows=1 loops=2)
Total runtime: 4.00 msec

After replacing "p.userid NOT IN" with "NOT EXISTS":

Result (cost=0.00..104.62 rows=995 width=894) (actual time=1.00..2.00 rows=2 loops=1)
InitPlan
-> Seq Scan on block b (cost=0.00..22.50 rows=5 width=55) (actual time=0.00..0.00 rows=0 loops=1)
-> Nested Loop (cost=0.00..104.62 rows=995 width=894) (actual time=1.00..1.00 rows=2 loops=1)
-> Nested Loop (cost=0.00..74.67 rows=1 width=894) (actual time=1.00..1.00 rows=2 loops=1)
-> Seq Scan on account a (cost=0.00..25.00 rows=1 width=289) (actual time=0.00..0.00 rows=3 loops=1)
-> Index Scan using pk_profile on profile p (cost=0.00..49.66 rows=1 width=605) (actual time=0.33..0.33
rows=1 loops=3)
SubPlan
-> Materialize (cost=44.82..44.82 rows=111 width=89) (actual time=0.50..0.50 rows=1 loops=2)
-> Nested Loop (cost=0.00..44.82 rows=111 width=89) (actual time=0.00..1.00 rows=3 loops=1)
-> Index Scan using pk_profile on profile p1 (cost=0.00..4.82 rows=1 width=12) (actual time=
0.00..0.00 rows=1 loops=1)
-> Seq Scan on preference f (cost=0.00..20.00 rows=1000 width=77) (actual time=0.00..1.00
rows=3 loops=1)
-> Seq Scan on block (cost=0.00..20.00 rows=995 width=0) (actual time=0.00..0.00 rows=1 loops=2)
Total runtime: 4.00 msec

After vacuum analyze:

Result (cost=7.30..9.39 rows=1 width=63) (actual time=3.00..3.00 rows=2 loops=1)
InitPlan
-> Seq Scan on block b (cost=0.00..1.01 rows=1 width=7) (actual time=0.00..0.00 rows=0 loops=1)
-> Nested Loop (cost=7.30..9.39 rows=1 width=63) (actual time=3.00..3.00 rows=2 loops=1)
-> Seq Scan on block (cost=0.00..1.01 rows=1 width=0) (actual time=0.00..0.00 rows=1 loops=1)
-> Materialize (cost=8.37..8.37 rows=1 width=63) (actual time=3.00..3.00 rows=2 loops=1)
-> Hash Join (cost=7.30..8.37 rows=1 width=63) (actual time=2.00..3.00 rows=2 loops=1)
-> Seq Scan on account a (cost=0.00..1.04 rows=3 width=31) (actual time=0.00..0.00 rows=3 loops=1)
-> Hash (cost=7.30..7.30 rows=1 width=32) (actual time=2.00..2.00 rows=0 loops=1)
-> Index Scan using profile_sex_idx on profile p (cost=0.00..7.30 rows=1 width=32) (actual time=
2.00..2.00 rows=2 loops=1)
SubPlan
-> Materialize (cost=2.13..2.13 rows=1 width=41) (actual time=0.50..0.50 rows=1 loops=2)
-> Nested Loop (cost=0.00..2.13 rows=1 width=41) (actual time=1.00..1.00 rows=3 loops=1)
-> Seq Scan on profile p1 (cost=0.00..1.04 rows=1 width=12) (actual time=0.00..0.00
rows=1 loops=1)
-> Seq Scan on preference f (cost=0.00..1.03 rows=3 width=29) (actual time=0.00..0.00
rows=3 loops=1)
Total runtime: 4.00 msec

12/5/2002 2:01:15 AM, Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl> wrote:

>Vernon Wu wrote:
>>
>> SELECT p.userid, p.year, a.country, a.province, a.city
>> FROM profile p, account a
>> WHERE p.userid=a.userid AND
>> (p.year BETWEEN 1961 AND 1976) AND
>> a.country='CA' AND
>> a.province='BC' AND
>> p.gender='f' AND
>> p.userid NOT IN (SELECT b.userid FROM block b WHERE b.personid='Joe') AND
>> block.userid IS NOT NULL AND
>> p.userid IN
>> (SELECT f.userid FROM preference f, profile p1 WHERE p1.userid='Joe' AND 2002-p1.year BETWEEN
>> f.minage AND f.maxage)
>
>You might want to flatten this into more joins and less subqueries,
>especially since you are using IN which is not very optimized:
>
>SELECT p.userid, p.year, a.country, a.province, a.city
>FROM profile p, account a, preference f, profile p1
>WHERE
> f.userid = p.userid AND
> p.userid=a.userid AND
> (p.year BETWEEN 1961 AND 1976) AND
> a.country='CA' AND
> a.province='BC' AND
> p.gender='f' AND
> p.userid NOT IN (SELECT b.userid FROM block b WHERE b.personid='Joe') AND
> block.userid IS NOT NULL AND
> p1.userid='Joe' AND
> 2002-p1.year BETWEEN f.minage AND f.maxage
>
>Also, I am not sure about the NOT IN. If you can rewrite it using EXISTS
>try that, it might be faster.
>
>
>> Nested Loop (cost=0.00..127.12 rows=995 width=894)
>> -> Nested Loop (cost=0.00..97.17 rows=1 width=894)
>> -> Seq Scan on account a (cost=0.00..25.00 rows=1 width=289)
>> -> Index Scan using pk_profile on profile p (cost=0.00..72.16 rows=1 width=605)
>> SubPlan
>> -> Materialize (cost=22.50..22.50 rows=5 width=55)
>> -> Seq Scan on block b (cost=0.00..22.50 rows=5 width=55
>> )
>> -> Materialize (cost=44.82..44.82 rows=111 width=89)
>> -> Nested Loop (cost=0.00..44.82 rows=111 width=89)
>> -> Index Scan using pk_profile on profile p1 (cost=0.00..4.82 rows=1 width=12)
>> -> Seq Scan on preference f (cost=0.00..20.00 rows=1000 width=77)
>
>rows=1000 usually indicates you didn't vacuum analyze. Did you?
>
>> -> Seq Scan on block (cost=0.00..20.00 rows=995 width=0)
>
>And to add to Vernons questions: if you are using PostgreSQL 7.2 or
>later, please send us the EXPLAIN ANALYZE output.
>
>Jochem
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Vernon Wu 2002-12-05 19:08:17 Re: Is a better way to have the same result of this
Previous Message Stephan Szabo 2002-12-05 16:47:29 Re: Index question with LIKE keyword