Skip site navigation (1) Skip section navigation (2)

Question about subselect/IN performance

From: "T(dot)H(dot)" <calinet6(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Question about subselect/IN performance
Date: 2010-11-30 17:43:24
Message-ID: id3d3s$c8s$1@news.hub.org (view raw or flat)
Thread:
Lists: pgsql-performance
I have a query that's running an IN/Subselect that joins three different 
tables and gets a list of IDs to compare against... the subselect 
basically looks for records through a join table based on the 3rd 
table's name, similar to:

... IN (SELECT id FROM foo, foo_bar, bar
         WHERE foo.id = foo_bar.foo_id
             AND bar.id = foo_bar.bar_id
             AND bar.name = "something") ...

This is all nested in a fairly complex query, and several of these 
subselects operate on different tables within the query. The whole 
thing, on some high-cardinality cases, can take 2.5 seconds to run 
(clearly something can be done about that).

So in this example, the cardinality of the bar table is very low, and 
fairly constant, something on the order of 5-7 records. In an 
optimization attempt, I reduced the joins in the subselect from 2 to 1 
by passing in the ID of the bar with the correct name, which I can 
easily cache application-side or pre-fetch in a single query. Now it 
looks like this:

... IN (SELECT id FROM foo, foo_bar
          WHERE foo.id = foo_bar.foo_id
              AND foo_bar.bar_id = 1) ...

Crazy thing is, that single optimization reduced the query time 
significantly, from 2.5-3 seconds down to 40-60ms.

Does anyone have any kind of explanation for this? Are the inner 
workings of the IN clause taking the plan for the subselect into account 
when running, and doing something clever with it? Any insight on the 
internal mechanisms of IN or subselects in Postgres would be greatly 
appreciated if anyone knows more.

Also, are there any better ways you can think of doing such an IN query, 
using non-subselect means that might be more efficient?

Thanks in advance, any advice/help understanding this better is greatly 
appreciated.

Responses

pgsql-performance by date

Next:From: Pierre CDate: 2010-11-30 22:35:04
Subject: Re: Simple database, multiple instances?
Previous:From: Maciek SakrejdaDate: 2010-11-30 17:16:26
Subject: Re: Simple database, multiple instances?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group