Re: A Better Way? (Multi-Left Join Lookup)

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: A Better Way? (Multi-Left Join Lookup)
Date: 2012-07-20 21:37:12
Message-ID: 00f201cd66bf$d3273080$79759180$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: Friday, July 20, 2012 4:47 PM
> To: David Johnston
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] A Better Way? (Multi-Left Join Lookup)
>
> "David Johnston" <polobo(at)yahoo(dot)com> writes:
> > WITH
> > full_set AS ( ) -- 8,500 records
> > , sub_1 AS () -- also about 8,500
> > , sub_2 AS () -- maybe 5,000
> > , sub_3 AS () - - maybe 3,000
> > SELECT full_set.*
> > , COALESCE(sub_1.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> > FROM full_set
> > LEFT JOIN sub_1
> > LEFT JOIN sub_2
> > LEFT JOIN sub_3
>
> > The performance of this query is exponential due to the fact that the
> > sub-queries/CTEs are not indexed and so each subset has to be scanned
> > completely for each record in the full set.
>
> Surely not. Neither merge nor hash joins require an index. What plan is
> getting selected? Are you sure there's at most one match in each "sub"
set
> for each row in the "full" set? If you were getting a large number of
matches
> in some cases, the size of the result could balloon to something
unfortunate
> ... but we have not got enough information to know.
>
> regards, tom lane

The final result, in this case would have 8,500 records AND
sub_1.field would be TRUE for basically all of them and FALSE for the
minimal remainder
sub_2.field would be TRUE for 5,000 of them and FALSE for 3,500 of them
sub_3.field would be TRUE for 3,000 of them and FALSE for 5,500 of them

There is never, in reality, two records in a sub-table for a single record
in the master table. It is possible a record exists in a sub-table but not
in the main table but I do not care about those (thus the LEFT instead of a
FULL OUTER JOIN).

I have attached a scrubbed query and explain/analyze. Let me know if
something more is needed.

I have included two versions of the query, one using CTE and the other using
mostly sub-selects.

I had run ANALYZE on the pertinent tables but the CTE queries all perform
quite quickly when run by themselves.

In looking at the source tables for the data I did notice that I have not
properly defined the relevant INDEXes as being UNIQUE. This applies to two
of the sub-tables. The third sub-table requires the use of "DISTINCT". The
joining columns with each set of data are unique when fed into the LEFT
JOIN. The master CTE/Query is generated via a function call and it also
generates unique keys for the LEFT JOIN.

Thank you for your help!

David J.

Attachment Content-Type Size
query_and_explain.txt text/plain 8.4 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Johnston 2012-07-20 21:49:16 Re: A Better Way? (Multi-Left Join Lookup)
Previous Message Alban Hertroys 2012-07-20 21:02:52 Re: A Better Way? (Multi-Left Join Lookup)