Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause

From: "Sven R(dot) Kunze" <srkunze(at)mail(dot)de>
To: lfischer <l(dot)fischer(at)ed(dot)ac(dot)uk>, pgsql-performance(at)postgresql(dot)org, jeff(dot)janes(at)gmail(dot)com, ineyman(at)perceptron(dot)com, b(dot)n(dot)madusudanan(at)gmail(dot)com
Subject: Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause
Date: 2016-09-23 06:35:09
Message-ID: 0e86e399-61dc-8e78-415e-c83db9592a80@mail.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks a lot Madusudanan, Igor, Lutz and Jeff for your suggestions.

What I can confirm is that the UNION ideas runs extremely fast (don't
have access to the db right now to test the subquery idea, but will
check next week as I travel right now). Thanks again! :)

I was wondering: would it be possible for PostgreSQL to rewrite the
query to generate the UNION (or subquery plan if it's also fast) on it's
own?

Thanks,
Sven

On 22.09.2016 16:44, lfischer wrote:
> Hi Sven
>
> Why not do something like
>
> SELECT * FROM big_table
> WHERE
> id in (SELECT big_table_id FROM table_a WHERE "table_a"."item_id"
> IN (<handful of items>))
> OR
> id in (SELECT big_table_id FROM table_a WHERE "table_b"."item_id"
> IN (<handful of items>))
>
> that way you don't need the "distinct" and therefore there should be
> less comparison going on.
>
> Lutz
>
> On 22/09/16 14:24, Sven R. Kunze wrote:
>> Hi pgsql-performance list,
>>
>>
>> what is the recommended way of doing **multiple-table-spanning joins
>> with ORs in the WHERE-clause**?
>>
>>
>> Until now, we've used the LEFT OUTER JOIN to filter big_table like so:
>>
>>
>> SELECT DISTINCT <fields of big_table>
>> FROM
>> "big_table"
>> LEFT OUTER JOIN "table_a" ON ("big_table"."id" =
>> "table_a"."big_table_id")
>> LEFT OUTER JOIN "table_b" ON ("big_table"."id" =
>> "table_b"."big_table_id")
>> WHERE
>> "table_a"."item_id" IN (<handful of items>)
>> OR
>> "table_b"."item_id" IN (<handful of items>);
>>
>>
>> However, this results in an awful slow plan (requiring to scan the
>> complete big_table which obviously isn't optimal).
>> So, we decided (at least for now) to split up the query into two
>> separate ones and merge/de-duplicate the result with application logic:
>>
>>
>> SELECT <fields of big_table>
>> FROM
>> "big_table" INNER JOIN "table_a" ON ("big_table"."id" =
>> "table_a"."big_table_id")
>> WHERE
>> "table_a"."item_id" IN (<handful of items>);
>>
>>
>> SELECT <fields of big_table>
>> FROM
>> "big_table" INNER JOIN "table_b" ON ("big_table"."id" =
>> "table_b"."big_table_id")
>> WHERE
>> "table_b"."item_id" IN (<handful of items>);
>>
>>
>> As you can imagine we would be very glad to solve this issue with a
>> single query and without having to re-code existing logic of
>> PostgreSQL. But how?
>>
>>
>> Best,
>> Sven
>>
>>
>> PS: if you require EXPLAIN ANALYZE, I can post them as well.
>>
>>
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dev Nop 2016-09-23 10:12:22 Storing large documents - one table or partition by doc?
Previous Message Tom Lane 2016-09-22 16:25:15 Re: query against single partition uses index, against master table does seq scan