Re: Sequencial scan in a JOIN

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Jaimes <andrewjaimes(at)hotmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sequencial scan in a JOIN
Date: 2012-07-18 20:00:53
Message-ID: CA+TgmobtviKnxJqEM9v4rSPOiNxWE8NCEPcuufm+-VdpGr8A0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Jun 5, 2012 at 8:48 AM, Andrew Jaimes <andrewjaimes(at)hotmail(dot)com> wrote:
> Hi everyone,
>
> I am trying to run the following query:
>
> SELECT count(1) --DISTINCT l_userqueue.queueid
> FROM e_usersessions
> JOIN l_userqueue
> ON l_userqueue.userid = e_usersessions.entityid
> JOIN a_activity
> ON a_activity.activequeueid = l_userqueue.queueid
> AND a_activity.vstatus = 1
> AND a_activity.ventrydate > 0
> AND a_activity.sbuid = e_usersessions.sbuid
> AND a_activity.assignedtoid = 0
> AND a_activity.status <> '0'
> WHERE e_usersessions.sessionkeepalivedatetime > 20120605082131943
>
> Explain analyze:
> 'Aggregate (cost=100402.10..100402.11 rows=1 width=0) (actual
> time=2249.051..2249.051 rows=1 loops=1)'
> ' -> Hash Join (cost=10.93..99795.09 rows=242803 width=0) (actual
> time=0.541..2249.027 rows=33 loops=1)'
> ' Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid) AND
> (a_activity.sbuid = e_usersessions.sbuid))'
> ' -> Seq Scan on a_activity (cost=0.00..88462.52 rows=1208167
> width=22) (actual time=0.010..1662.142 rows=1207855 loops=1)'
> ' Filter: ((ventrydate > 0) AND ((status)::text <> '0'::text)
> AND (vstatus = 1) AND (assignedtoid = 0::numeric))'
> ' -> Hash (cost=10.86..10.86 rows=5 width=22) (actual
> time=0.053..0.053 rows=4 loops=1)'
> ' -> Hash Join (cost=9.38..10.86 rows=5 width=22) (actual
> time=0.033..0.048 rows=4 loops=1)'
> ' Hash Cond: (l_userqueue.userid =
> e_usersessions.entityid)'
> ' -> Seq Scan on l_userqueue (cost=0.00..1.23 rows=23
> width=27) (actual time=0.003..0.009 rows=23 loops=1)'
> ' -> Hash (cost=9.31..9.31 rows=5 width=21) (actual
> time=0.018..0.018 rows=2 loops=1)'
> ' -> Index Scan using i06_e_usersessions on
> e_usersessions (cost=0.00..9.31 rows=5 width=21) (actual time=0.009..0.012
> rows=2 loops=1)'
> ' Index Cond: (sessionkeepalivedatetime >
> 20120605082131943::bigint)'
> 'Total runtime: 2249.146 ms'
>
> I am trying to understand the reason why the a sequencial scan is used on
> a_activity instead of using the index by activequeueid (i08_a_activity).

I'm chiming in a bit late here, but it seems like you're hoping that
the query plan will form the outer join as a nested loop, with the
inner and outer sides swapped, so that the results of the join between
l_userqueue and e_usersessions are used to drive a series of index
scans on a_activity that avoid scanning the whole table. PostgreSQL
9.2 will be the first release that has the ability to generate that
kind of plan, so it would be interesting to see what happens if you
try this on 9.2beta.

Older releases should be able consider a nested loop join with
l_userqueue as the inner rel, driving an index scan over a_activity,
and then performing the join to e_usersessions afterwards. But that
plan might not be nearly as good, since then we'd have to do 23
index-scans on a_activity rather than just 4.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Nick Hofstede 2012-07-18 20:28:52 Re: optimizing queries using IN and EXISTS
Previous Message Robert Haas 2012-07-18 19:32:46 Re: Array fundamentals