Re: A Better Way? (Multi-Left Join Lookup)

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Alban Hertroys'" <haramrae(at)gmail(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: A Better Way? (Multi-Left Join Lookup)
Date: 2012-07-20 21:49:16
Message-ID: 00fc01cd66c1$82ad9e30$8808da90$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> -----Original Message-----
> From: Alban Hertroys [mailto:haramrae(at)gmail(dot)com]
> Sent: Friday, July 20, 2012 5:03 PM
> To: David Johnston
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] A Better Way? (Multi-Left Join Lookup)
>
> On 20 Jul 2012, at 22:30, David Johnston wrote:
>
> > Hi!
> >
> > Can someone please point me to a resource (or suggest a solution) that
will
> improve the performance of this query? I have some thoughts but figure I
> should avoid reinventing the wheel since this seems like something that
has
> to have been solved already.
> >
> > I am working on a query where I have a list of identifiers (sample set
has
> about 8,500 records) and I have three other queries that return a subset
of
> these 8,500 identifiers
> >
> > Basic query is designed as such:
> >
> > WITH
> > full_set AS ( ) -- 8,500 records
> > , sub_1 AS () -- also about 8,500
> > , sub_2 AS () -- maybe 5,000
> > , sub_3 AS () - - maybe 3,000
> > SELECT full_set.*
> > , COALESCE(sub_1.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> >
> > FROM full_set
> > LEFT JOIN sub_1
> > LEFT JOIN sub_2
> > LEFT JOIN sub_3
> >
> > The goal is to output a boolean for each record in "full_set" specifying
> whether a corresponding records exists in the sub-set. If the record
exists
> "sub_x.field" is defined to be TRUE and thus is output otherwise
sub_x.field
> is NULL and coalesce returns FALSE.
>
> You are creating a product of the result sets for sub_1 to _3 there, while
you
> only seem to need the union of the three.
>
> Perhaps something like this is what you're after?
>
> WITH
> full_set AS ( )
> , subs AS (
> SELECT 1 AS sub, TRUE AS field, ... FROM sub_1
> UNION ALL
> SELECT 2 AS sub, TRUE AS field, ... FROM sub_2
> UNION ALL
> SELECT 3 AS sub, TRUE AS field, ... FROM sub_3
> )
> SELECT ...
> FROM full_set
> LEFT JOIN subs
>
> If you need those rows to be distinct, use UNION instead of UNION ALL, but
> the database needs to do more work for that.
>
>
> Alban Hertroys
>

Using "UNION" I increase the number of output rows such that an identifier
that has a matching record in all three subsets will appear 3-times in the
result. Now, I can run this through a GROUP BY and use CASE statements to
get it back into the multi-column format required but that seems messy.
Also, there should not be a "product" between the sub-queries but only
between an individual sub-query and the main query. The fact there are 3
sub-queries should result in additive resource consumption (al. la. UNION):
[ M x (A + B + C) == MA + MB + MC ]. The left side is the UNION suggestion
while the right-side is the current multi-left-join suggestion. Data wise
they are equivalent but the left-side uses additional rows while the
right-side uses additional columns.

That said I will play with it just to see if the pre-UNION and a post-GROUP
performs better than the multi-left-join that seems to be the most direct
solution.

Thank You!

David J.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message AI Rumman 2012-07-20 22:26:50 Re: I cannot insert bengali character in UTF8
Previous Message David Johnston 2012-07-20 21:37:12 Re: A Better Way? (Multi-Left Join Lookup)