Re: Reason of Slowness of query

From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Adarsh Sharma <adarsh(dot)sharma(at)orkash(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Reason of Slowness of query
Date: 2011-03-23 07:20:29
Message-ID: AANLkTinN8O2bhzVdoB6SMG33p6dzq+NA7-QeCoGzm3tB@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Mar 22, 2011 at 11:28 PM, Adarsh Sharma <adarsh(dot)sharma(at)orkash(dot)com>wrote:

> *
> *I perform a join query on it as :
>
> * explain analyze select distinct(p.crawled_page_id) from page_content p ,
> clause2 c where p.crawled_page_id != c.source_id ;
>
> *What it takes more than 1 hour to complete. As I issue the explain
> analyze command and cannot able to wait for output but I send my explain
> output as :
>

please describe what your query is trying to select, as it is possible that
query isn't doing what you think it is. joining 2 tables where id1 != id2
will create a cross multiple of the two tables such that every row from the
first table is matched with every single row from the second table that
doesn't have a matching id. Then you are looking for distinct values on
that potentially enormous set of rows.

db_v2=# select * from table1;
id | value
----+-------
1 | 1
2 | 2
3 | 3
(3 rows)

db_v2=# select * from table2;
id | value
----+-------
1 | 4
2 | 5
3 | 6
(3 rows)

db_v2=# select t1.id, t1.value, t2.id, t2.value from table1 t1, table2 t2
where t1.id != t2.id;
id | value | id | value
----+-------+----+-------
1 | 1 | 2 | 5
1 | 1 | 3 | 6
2 | 2 | 1 | 4
2 | 2 | 3 | 6
3 | 3 | 1 | 4
3 | 3 | 2 | 5

So if you have a couple of million rows in each table, you are selecting
distinct over a potentially huge set of data. If you are actually trying
to find all ids from one table which have no match at all in the other
table, then you need an entirely different query:

db_v2=# insert into table2 (value) values (7);
INSERT 0 1

db_v2=# select * from table2;
id | value
----+-------
1 | 4
2 | 5
3 | 6
4 | 7

db_v2=# select t2.id, t2.value from table2 t2 where not exists (select 1
from table1 t1 where t1.id = t2.id);
id | value
----+-------
4 | 7

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Vitalii Tymchyshyn 2011-03-23 07:25:34 Re: Reason of Slowness of query
Previous Message Chetan Suttraway 2011-03-23 07:16:55 Re: Reason of Slowness of query