Re: Performance issues

From: Vivekanand Joshi <vjoshi(at)zetainteractive(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Varadharajan Mukundan <srinathsmn(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance issues
Date: 2015-03-17 07:41:27
Message-ID: c5231342781287ff8cbc9ac1f20ff036@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Guys,

Next level of query is following:

If this works, I guess 90% of the problem will be solved.

SELECT
COUNT(DISTINCT TARGET_ID)
FROM
S_V_F_PROMOTION_HISTORY_EMAIL PH
INNER JOIN S_V_D_CAMPAIGN_HIERARCHY CH
ON PH.TOUCHPOINT_EXECUTION_ID =
CH.TOUCHPOINT_EXECUTION_ID
WHERE
1=1
AND SEND_DT >= '2014-03-13'
AND SEND_DT <= '2015-03-14'

In this query, I am joining two views which were made earlier with CTEs. I
have replaced the CTE's with subqueries. The view were giving me output in
around 5-10 minutes and now I am getting the same result in around 3-4
seconds.

But when I executed the query written above, I am again stuck. I am
attaching the query plan as well the link.

http://explain.depesz.com/s/REeu

I can see most of the time is spending inside a nested loop and total
costs comes out be cost=338203.81..338203.82.

How to take care of this? I need to run this query in a report so I cannot
create a table like select * from views and then join the table. If I do
that I am getting the answer of whole big query in some 6-7 seconds. But
that is not feasible. A report (Jasper can have only one single (big/small
query).

Let me know if you need any other information.

Thanks a ton!
Vivek

-----Original Message-----
From: Jim Nasby [mailto:Jim(dot)Nasby(at)BlueTreble(dot)com]
Sent: Tuesday, March 17, 2015 5:36 AM
To: Tomas Vondra; vjoshi(at)zetainteractive(dot)com; Scott Marlowe; Varadharajan
Mukundan
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Performance issues

On 3/16/15 3:59 PM, Tomas Vondra wrote:
> On 16.3.2015 20:43, Jim Nasby wrote:
>> On 3/13/15 7:12 PM, Tomas Vondra wrote:
>>> (4) I suspect many of the relations referenced in the views are not
>>> actually needed in the query, i.e. the join is performed but
>>> then it's just discarded because those columns are not used.
>>> Try to simplify the views as much has possible - remove all the
>>> tables that are not really necessary to run the query. If two
>>> queries need different tables, maybe defining two views is
>>> a better approach.
>>
>> A better alternative with multi-purpose views is to use an outer join
>> instead of an inner join. With an outer join if you ultimately don't
>> refer to any of the columns in a particular table Postgres will
>> remove the table from the query completely.
>
> Really? Because a quick test suggests otherwise:
>
> db=# create table test_a (id int);
> CREATE TABLE
> db=# create table test_b (id int);
> CREATE TABLE
> db=# explain select test_a.* from test_a left join test_b using (id);
> QUERY PLAN
> ----------------------------------------------------------------------
> Merge Left Join (cost=359.57..860.00 rows=32512 width=4)
> Merge Cond: (test_a.id = test_b.id)
> -> Sort (cost=179.78..186.16 rows=2550 width=4)
> Sort Key: test_a.id
> -> Seq Scan on test_a (cost=0.00..35.50 rows=2550 width=4)
> -> Sort (cost=179.78..186.16 rows=2550 width=4)
> Sort Key: test_b.id
> -> Seq Scan on test_b (cost=0.00..35.50 rows=2550 width=4)
> (8 rows)
>
> Also, how would that work with duplicate rows in the referenced table?

Right, I neglected to mention that the omitted table must also be unique
on the join key:

decibel(at)decina(dot)attlocal=# create table a(a_id serial primary key); CREATE
TABLE decibel(at)decina(dot)attlocal=# create table b(a_id int); CREATE TABLE
decibel(at)decina(dot)attlocal=# explain analyze select a.* from a left join b
using(a_id);
QUERY PLAN

--------------------------------------------------------------------------
---------------------------------
Hash Right Join (cost=67.38..137.94 rows=2550 width=4) (actual
time=0.035..0.035 rows=0 loops=1)
Hash Cond: (b.a_id = a.a_id)
-> Seq Scan on b (cost=0.00..35.50 rows=2550 width=4) (never
executed)
-> Hash (cost=35.50..35.50 rows=2550 width=4) (actual
time=0.002..0.002 rows=0 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 32kB
-> Seq Scan on a (cost=0.00..35.50 rows=2550 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
Planning time: 0.380 ms
Execution time: 0.086 ms
(8 rows)

decibel(at)decina(dot)attlocal=# alter table b add primary key(a_id); ALTER TABLE
decibel(at)decina(dot)attlocal=# explain analyze select a.* from a left join b
using(a_id);
QUERY PLAN

--------------------------------------------------------------------------
---------------------
Seq Scan on a (cost=0.00..35.50 rows=2550 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
Planning time: 0.247 ms
Execution time: 0.029 ms
(3 rows)

decibel(at)decina(dot)attlocal=# alter table a drop constraint a_pkey; ALTER
TABLE decibel(at)decina(dot)attlocal=# explain analyze select a.* from a left
join b using(a_id);
QUERY PLAN

--------------------------------------------------------------------------
---------------------
Seq Scan on a (cost=0.00..35.50 rows=2550 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
Planning time: 0.098 ms
Execution time: 0.011 ms
(3 rows)
--
Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it
in Treble! http://BlueTreble.com

Attachment Content-Type Size
Explain.sql text/plain 7.4 KB

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Sergey Shchukin 2015-03-17 10:22:27 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary
Previous Message Jim Nasby 2015-03-17 00:06:19 Re: Performance issues