Re: Performance Problem with sub-select using array

From: "Aaron Bono" <postgresql(at)aranya(dot)com>
To: "Travis Whitton" <tinymountain(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Performance Problem with sub-select using array
Date: 2006-08-29 14:34:35
Message-ID: bf05e51c0608290734w7795d260q160abc9b91e8a58e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On 8/28/06, Travis Whitton <tinymountain(at)gmail(dot)com> wrote:
>
> I'm pretty sure you're right, which leads me to my next question. Is it
> possible to pass a column from an outer query to a subquery? For example, is
> there a way to do something like.
>
> SELECT owners.id AS owner_id, array(SELECT dogs.name WHERE owners.id =
> owner_id) ...
>
> I would just do a normal inner-join, but then I get a row for each item
> that would otherwise come back nicely packaged in the array. The overhead of
> rearranging the data takes even more time than the subquery approach.
>

I don't think you can do that but I may be wrong. I usually try to stay
away from correlated sub-queries because of performance concerns and query
complexity. I find simple subqueries with well formed inner/outer joins
work much better.

Does anyone know where documentation about the array function can be found?
I did a search but cannot find it on the postgresql web site.

On 8/28/06, Aaron Bono <postgresql(at)aranya(dot)com> wrote:
> >
> > On 8/24/06, Travis Whitton <tinymountain(at)gmail(dot)com > wrote:
> >
> > > Hello all, I'm running the following query on about 6,000 records
> > > worth of data, and it takes about 8 seconds to complete. Can anyone provide
> > > any suggestions to improve performance? I have an index on two columns in
> > > the transacts table (program_id, customer_id). If I specify a number for
> > > customer.id in the sub-select, query time is reduced to about 2
> > > seconds, which still seems like a million years for only 6,000 records, but
> > > I'm guessing that the sub-select can't resolve the id since it's done before
> > > the outer query, so it scans the entre recordset for every row? Transacts is
> > > a many to many table for customers and programs. I know this query doesn't
> > > even reference any columns from programs; however, I dynamically insert
> > > where clauses to constrain the result set.
> > >
> > > SELECT distinct customers.id, first_name, last_name, address1,
> > > contact_city, contact_state, primary_phone, email, array(select
> > > programs.program_name from transacts, programs where customer_id =
> > > customers.id and programs.id = transacts.program_id and submit_status
> > > = 'success') AS partners from customers, transacts, programs where
> > > transacts.customer_id = customers.id and transacts.program_id =
> > > programs.id
> > >
> >
> >
> > My guess is that your problem is that you may be getting 6000 rows, but
> > the array(select ....) is having to run once for each of record returned (so
> > it is running 6000 times).
> >
> > Try an explain analyze: http://www.postgresql.org/docs/7.4/interactive/sql-explain.html
> > - that will reveal more of where the performance problem is.
> >
>
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Sumeet 2006-08-29 16:44:28
Previous Message Aaron Bono 2006-08-29 14:28:42 Re: Performance Problem with sub-select using array