Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause

From: "Sven R(dot) Kunze" <srkunze(at)mail(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause
Date: 2016-09-29 18:48:01
Message-ID: a0cde040-44f8-ac0f-c87a-932d2901702e@mail.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Jeff,

On 29.09.2016 20:03, Jeff Janes wrote:
> I don't know what the subquery plan is, I don't see references to that
> in the email chain.

Lutz posted the following solution:

SELECT * FROM big_table
WHERE
id in (SELECT big_table_id FROM table_a WHERE "table_a"."item_id"
IN (<handful of items>))
OR
id in (SELECT big_table_id FROM table_a WHERE "table_b"."item_id"
IN (<handful of items>))

> I don't believe that current versions of PostgreSQL are capable of
> rewriting the plan in the style of a union. It is not just a matter
> of tweaking the cost estimates, it simply never considers such a plan
> in the first place given the text of your query.

That's okay and that's why I am asking here. :)

> Perhaps some future version of PostgreSQL could do so, but my gut
> feeling is that that is not very likely. It would take a lot of work,
> would risk breaking or slowing down other things, and is probably too
> much of a niche issue to attract a lot of interest.

I don't hope so; in business and reports/stats applications there is a
lot of room for this.

Why do you think that OR-ing several tables is a niche issue? I can at
least name 3 different projects (from 3 different domains) where
combining 3 or more tables with OR is relevant and should be reasonably
fast.

Most domains that could benefit would probably have star-like schemas.
So, big_table corresponds to the center of the star, whereas the rays
correspond to various (even dynamic) extensions to the base data structure.

> Why not just use the union?

Sure that would work in this particular case. However, this thread
actually sought a general answer to "how to OR more than two tables".

> Are you using a framework which generates the query automatically and
> you have no control over it?

We use a framework and we can use the UNION if we want to.

> Or do you just think it is ugly or fragile for some other reason?

I don't think it's ugly or fragile. I am just used to the fact that **if
it's equivalent** then PostgreSQL can figure it out (without constant
supervision from application developers).

So, it's just a matter of inconvenience. ;)

> Perhaps moving the union from the outside to the inside would be more
> suitable? That way teh select list is only specified once, and if you
> AND more clauses into the WHERE condition they also only need to be
> specified once.
>
> SELECT * FROM big_table
> WHERE
> id in (SELECT big_table_id FROM table_a WHERE "table_a"."item_id"
> IN (<handful of items>) union
> SELECT big_table_id FROM table_a WHERE "table_b"."item_id" IN
> (<handful of items>)
> );

Yet another solution I guess, so thanks a lot. :)

This multitude of solution also shows that applications developers might
be overwhelmed by choosing the most appropriate AND most long-lasting
one. Because what I take from the discussion is that a UNION might be
appropriate right now but that could change in the future even for the
very same use-case at hand.

Cheers,
Sven

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Sven R. Kunze 2016-09-29 18:49:32 Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause
Previous Message Pavel Stehule 2016-09-29 18:12:58 Re: Multiple-Table-Spanning Joins with ORs in WHERE Clause