A Better Way? (Multi-Left Join Lookup)

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: A Better Way? (Multi-Left Join Lookup)
Date: 2012-07-20 20:30:29
Message-ID: 00c701cd66b6$815191a0$83f4b4e0$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi!

Can someone please point me to a resource (or suggest a solution) that will
improve the performance of this query? I have some thoughts but figure I
should avoid reinventing the wheel since this seems like something that has
to have been solved already.

I am working on a query where I have a list of identifiers (sample set has
about 8,500 records) and I have three other queries that return a subset of
these 8,500 identifiers

Basic query is designed as such:

WITH

full_set AS ( ) -- 8,500 records

, sub_1 AS () -- also about 8,500

, sub_2 AS () -- maybe 5,000

, sub_3 AS () - - maybe 3,000

SELECT full_set.*

, COALESCE(sub_1.field, FALSE)

, COALESCE(sub_2.field, FALSE)

, COALESCE(sub_2.field, FALSE)

FROM full_set

LEFT JOIN sub_1

LEFT JOIN sub_2

LEFT JOIN sub_3

The goal is to output a boolean for each record in "full_set" specifying
whether a corresponding records exists in the sub-set. If the record exists
"sub_x.field" is defined to be TRUE and thus is output otherwise sub_x.field
is NULL and coalesce returns FALSE.

I have some explain+analyze results for this and an equivalent query that
uses sub-queries in FROM instead of CTEs but I figure I'll throw this out on
general as it seems fairly basic. I am guessing that anything more deep
should be sent to the performance sub-list (which I haven't subscribed to at
this point) with the explain+analyze results attached.

The performance of this query is exponential due to the fact that the
sub-queries/CTEs are not indexed and so each subset has to be scanned
completely for each record in the full set.

The identifiers are sortable and so my first thought was to try and divide
each dataset into sub-sets of, say, 1000, then UNION the results of the 9
passes it would take to process all 8,500 records. Would a Recursive CTE be
able to accomplish this? The sticking point is that the identifiers not
pure integers (usually, in my current dataset they are) and even when the
range of values is variable.

I am working with 9.0

Thank You!

David J.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2012-07-20 20:46:34 Re: A Better Way? (Multi-Left Join Lookup)
Previous Message Manoj Govindassamy 2012-07-20 19:28:33 Pg_ctl promote -- wait for slave to be promoted fully ?