Query optimization help

From: Szymon Kosok <szymon(at)mwg(dot)pl>
To: pgsql-performance(at)postgresql(dot)org
Subject: Query optimization help
Date: 2011-08-30 05:41:12
Message-ID: CAOJGGcG1riqgx9Xz0d=gWLVrH_Ef=7raS7tOzEHzdaVRHq_dGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

I asked that question on StackOverflow, but didn't get any valuable
response, so I'll ask it here. :)

I have such query:

SELECT "spoleczniak_tablica"."id", "spoleczniak_tablica"."postac_id",
"spoleczniak_tablica"."hash", "spoleczniak_tablica"."typ",
"spoleczniak_tablica"."ikona", "spoleczniak_tablica"."opis",
"spoleczniak_tablica"."cel", "spoleczniak_tablica"."data",
"postac_postacie"."id",
"postac_postacie"."user_id", "postac_postacie"."avatar",
"postac_postacie"."ikonka",
"postac_postacie"."imie", "postac_postacie"."nazwisko",
"postac_postacie"."pseudonim",
"postac_postacie"."plec", "postac_postacie"."wzrost", "postac_postacie"."waga",
"postac_postacie"."ur_tydz", "postac_postacie"."ur_rok",
"postac_postacie"."ur_miasto_id",
"postac_postacie"."akt_miasto_id", "postac_postacie"."kasa",
"postac_postacie"."punkty",
"postac_postacie"."zmeczenie", "postac_postacie"."zdrowie",
"postac_postacie"."kariera"
FROM "spoleczniak_tablica" INNER JOIN "postac_postacie" ON
("spoleczniak_tablica"."postac_id" = "postac_postacie"."id") WHERE
spoleczniak_tablica.postac_id = 1 or spoleczniak_tablica.id in(select
wpis_id from
spoleczniak_oznaczone where etykieta_id in(select tag_id from
spoleczniak_subskrypcje where
postac_id = 1)) or (spoleczniak_tablica.postac_id in(select obserwowany_id from
spoleczniak_obserwatorium where obserwujacy_id = 1) and hash not
in('dyskusja', 'kochanie',
'szturniecie')) or (spoleczniak_tablica.cel = 1 and
spoleczniak_tablica.hash in('dyskusja',
'kochanie', 'obserwatorium', 'szturchniecie')) or spoleczniak_tablica.hash =
'administracja-info' or exists(select 1 from spoleczniak_komentarze
where kredka_id =
spoleczniak_tablica.id and postac_id = 1) ORDER BY
"spoleczniak_tablica"."id" DESC LIMIT
21;

and it's real performance bottleneck for us. It's one of the most
often executed query on our site.

Here is EXPLAIN ANALYZE:

Limit (cost=52.69..185979.44 rows=21 width=283) (actual
time=5.981..149.110 rows=21 loops=1)
-> Nested Loop (cost=52.69..27867127142.57 rows=3147528
width=283) (actual time=5.981..149.103 rows=21 loops=1)
-> Index Scan Backward using spoleczniak_tablica_pkey on
spoleczniak_tablica (cost=52.69..27866103743.37 rows=3147528
width=194) (actual time=5.971..148.963 rows=21 loops=1)
Filter: ((postac_id = 1) OR (SubPlan 1) OR ((hashed
SubPlan 2) AND ((hash)::text <> ALL
('{dyskusja,kochanie,szturniecie}'::text[]))) OR ((cel = 1) AND
((hash)::text = ANY
('{dyskusja,kochanie,obserwatorium,szturchniecie}'::text[]))) OR
((hash)::text = 'administracja-info'::text) OR (alternatives: SubPlan
3 or hashed SubPlan 4))
SubPlan 1
-> Materialize (cost=13.28..11947.85 rows=1264420
width=4) (actual time=0.000..0.024 rows=485 loops=2137)
-> Nested Loop (cost=13.28..685.75
rows=1264420 width=4) (actual time=0.119..0.664 rows=485 loops=1)
-> HashAggregate (cost=5.89..5.90
rows=1 width=4) (actual time=0.015..0.017 rows=7 loops=1)
-> Index Scan using
spoleczniak_subskrypcje_postac_id on spoleczniak_subskrypcje
(cost=0.00..5.89 rows=2 width=4) (actual time=0.005..0.009 rows=7
loops=1)
Index Cond: (postac_id = 1)
-> Bitmap Heap Scan on
spoleczniak_oznaczone (cost=7.38..674.96 rows=391 width=8) (actual
time=0.019..0.082 rows=69 loops=7)
Recheck Cond: (etykieta_id =
spoleczniak_subskrypcje.tag_id)
-> Bitmap Index Scan on
spoleczniak_oznaczone_etykieta_id (cost=0.00..7.29 rows=391 width=0)
(actual time=0.013..0.013 rows=69 loops=7)
Index Cond: (etykieta_id =
spoleczniak_subskrypcje.tag_id)
SubPlan 2
-> Index Scan using
spoleczniak_obserwatorium_obserwujacy_id on spoleczniak_obserwatorium
(cost=0.00..39.36 rows=21 width=4) (actual time=0.006..0.030 rows=26
loops=1)
Index Cond: (obserwujacy_id = 1)
SubPlan 3
-> Bitmap Heap Scan on spoleczniak_komentarze
(cost=18.67..20.68 rows=1 width=0) (never executed)
Recheck Cond: ((kredka_id =
spoleczniak_tablica.id) AND (postac_id = 1))
-> BitmapAnd (cost=18.67..18.67 rows=1
width=0) (never executed)
-> Bitmap Index Scan on
spoleczniak_komentarze_kredka_id (cost=0.00..2.98 rows=24 width=0)
(never executed)
Index Cond: (kredka_id =
spoleczniak_tablica.id)
-> Bitmap Index Scan on
spoleczniak_komentarze_postac_id (cost=0.00..15.44 rows=890 width=0)
(never executed)
Index Cond: (postac_id = 1)
SubPlan 4
-> Index Scan using spoleczniak_komentarze_postac_id
on spoleczniak_komentarze (cost=0.00..1610.46 rows=890 width=4)
(actual time=0.013..2.983 rows=3605 loops=1)
Index Cond: (postac_id = 1)
-> Index Scan using postac_postacie_pkey on postac_postacie
(cost=0.00..0.31 rows=1 width=89) (actual time=0.004..0.005 rows=1
loops=21)
Index Cond: (id = spoleczniak_tablica.postac_id)
Total runtime: 149.211 ms (in rush hours runtime is ~600 ms)

If I delete ORDER BY clause, runtime is less than 30 ms. As you can
see - it's big table, more than 3 000 000 records. Any hints how to
optimize this query?

(I've sent that message without joining mailing first, If i'll do a
double post, please forgive me)

Browse pgsql-performance by date

  From Date Subject
Next Message Venkat Balaji 2011-08-30 05:55:47 Re: Re: How to track number of connections and hosts to Postgres cluster
Previous Message Szymon Kosok 2011-08-30 05:36:20 Query optimization help