Re: BUG #14107: Major query planner bug regarding subqueries and indices

From: Mathias Kunter <mathiaskunter(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14107: Major query planner bug regarding subqueries and indices
Date: 2016-04-29 12:15:38
Message-ID: e2090738-62b9-7697-886a-b2a01a8c9482@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

> If you know that the
> sub-select isn't going to return very many rows, you could do
>
> SELECT ... FROM a WHERE a.x = ? OR a.y = ANY(ARRAY(SELECT ...));

Isn't the planner already doing something like this, since the following
query is using the index as expected:

SELECT ... FROM a WHERE a.x IN (SELECT ...);

> but this would blow up rather badly with a large sub-select result,
> so I'm not sure I want to try to make the planner transform it that
> way automatically.

Wouldn't it be possible then to use this optimization based on the
estimated result size of the subquery? I think this would almost always
be faster than a sequential scan anyway. I observed that using the
ANY(ARRAY(SELECT...)) syntax on my small test tables (100 K rows)
already improves the query time by a factor of more than 100, and it
will be even more when tables are large. Please consider implementing
this optimization!

> I don't actually see any way to do very much with your second example at
> all:
>
>> SELECT ... FROM a JOIN b ON (...) WHERE a.x = ? OR b.y = ?;

Assuming both joined tables contain a PK (or another unique column),
then it should be possible by replanning the query as:

SELECT ... FROM a JOIN b ON ((join_cond AND a.x = ?) OR (join_cond AND
b.y = ?));

(Let "join_cond" denote the original join condition here.) Now, the JOIN
implementation must be smart enough to handle OR conditions: First,
obtain the rows satisfying

join_cond AND a.x = ?

by using the index as usual. For each matching row, create the tuple
(a.id, b.id) and insert it into a search tree (or hash or whatever).
Then, obtain the rows satisfying

join_cond AND b.y = ?

For each matching row, query the search tree whether it already contains
the tuple (a.id, b.id), and only add the current row to the final result
if it doesn't.

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message John McKown 2016-04-29 14:47:29 Re: BUG #14121: Constraint UNIQUE
Previous Message Michael Paquier 2016-04-29 12:08:55 Re: BUG #14109: pg_rewind fails to update target control file in one scenario