Re: Bad Execution Plan with "OR" Clauses Across Outer-Joined Tables

From: Mark Hampton <mark(at)cleverdba(dot)com>
To:
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Bad Execution Plan with "OR" Clauses Across Outer-Joined Tables
Date: 2013-04-30 20:03:21
Message-ID: CALqOgZ0WM_4=XfW0gOicdO48uOC4k9vnTXMZxwUMOTNezywpzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

It's an interesting idea, however when I rewrote the original query to use
"WHERE EXISTS" rather than "WHERE IN", I get the same bad execution plan.
I think this really has to do with the Postgres optimizer's limitations
with respect to outer joins.

In my case it's certainly possible to rewrite the query by hand to
eliminate the outer join and get the same results.

And after posting the original problem, I have also found that with some
work it's possible to make Hibernate generate a query that eliminates the
outer join and get the same results.

But I think improving the Postgres optimizer to handle such cases would be
a nice improvement. Then again, having lived through many years of Oracle
optimizer bugs, it might be easier said than done.

On Tue, Apr 30, 2013 at 3:24 PM, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>wrote:

> What I can say is that hibernate has "exists" in both HQL and criteria API
> (e.g. see
> http://www.cereslogic.com/pages/2008/09/22/hibernate-criteria-subqueries-exists/ for
> criteria). So, may be it's easier for you to tune your hibernate query to
> use exists
>
>
> 2013/4/30 Mark Hampton <mark(at)cleverdba(dot)com>
>
>> I have a Hibernate-generated query (That's not going to change, so let's
>> just focus on the Postgres side for now) like this:
>>
>> SELECT *
>> from PERSON p
>> where p.PERSON_ID in (
>> select distinct p2.PERSON_ID
>> from PERSON p2
>> left outer join PERSON_ALIAS pa on
>> p2.PERSON_ID = pa.PERSON_ID
>> where (lower(p1.SURNAME) = 'duck' or
>> lower(pa.SURNAME) = 'duck') and
>> (lower(p1.FORENAME) = 'donald' or
>> lower(pa.FORENAME) = 'donald')
>> )
>> order by p.PERSON_ID asc;
>>
>> There are function-based indexes on PERSON and PERSON_ALIAS as follows:
>>
>> CREATE INDEX PERSON_FORENAME_LOWER_FBIDX ON PERSON (LOWER(FORENAME)
>> VARCHAR_PATTERN_OPS);
>> CREATE INDEX PERSON_SURNAME_LOWER_FBIDX ON PERSON (LOWER(SURNAME) VARCHAR
>> _PATTERN_OPS);
>> CREATE INDEX PERSON_ALIAS_FORENAME_LOWER_FBIDX ON PERSON_ALIAS
>> (LOWER(FORENAME) VARCHAR_PATTERN_OPS);
>> CREATE INDEX PERSON_ALIAS_SURNAME_LOWER_FBIDX ON PERSON_ALIAS
>> (LOWER(SURNAME) VARCHAR_PATTERN_OPS);
>>
>> The problem is that the above query doesn't use the indexes. The "or"
>> clauses across the outer-join seem to be the culprit. If I rewrite the
>> query as follows, Postgres will use the index:
>>
>> SELECT *
>> from PERSON p
>> where (p.PERSON_ID in (
>> select p2.PERSON_ID
>> from TRAVELER.PERSON p2
>> join TRAVELER.OTHER_NAME pa on p2.PERSON_ID =
>> pa.PERSON_ID
>> where lower(p2.SURNAME) = 'duck' and
>> lower(pa.FORENAME) = 'donald'
>> ) or
>> p.PERSON_ID in (
>> select p2.PERSON_ID
>> from TRAVELER.PERSON p2
>> join TRAVELER.OTHER_NAME pa on p2.PERSON_ID =
>> pa.PERSON_ID
>> where lower(pa.SURNAME) = 'duck' and
>> lower(p2.FORENAME) = 'donald'
>> ) or
>> p.PERSON_ID in (
>> select p2.PERSON_ID
>> from TRAVELER.PERSON p2
>> where lower(p2.SURNAME) = 'duck' and
>> lower(p2.FORENAME) = 'donald'
>> ) or
>> p.PERSON_ID in (
>> select p2.PERSON_ID
>> from TRAVELER.OTHER_NAME pa
>> where lower(pa.SURNAME) = 'duck' and
>> lower(pa.FORENAME) = 'donald'
>> ))
>> order by p.PERSON_ID asc;
>>
>> So my question is this: Is there a way to get the Postgres optimizer
>> "rewrite" the query execution plan to use the equivalent, but much more
>> efficient latter form?
>>
>> And before you ask; yes, there are better ways of writing this query.
>> But we're dealing with Java developers and Hibernate here. It's a legacy
>> system, and the policy is to avoid hand-written SQL, so for the moment
>> let's not go down that rabbit hole, and focus on the issue of what the
>> optimizer can and cannot do.
>>
>
>
>
> --
> Best regards,
> Vitalii Tymchyshyn
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Anne Rosset 2013-05-01 05:50:59 Deterioration in performance when query executed in multi threads
Previous Message Vitalii Tymchyshyn 2013-04-30 19:24:43 Re: Bad Execution Plan with "OR" Clauses Across Outer-Joined Tables