Re: BUG #16624: Query Optimizer - Performance bug related to predicate simplification

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: XINYULIU(at)umich(dot)edu, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16624: Query Optimizer - Performance bug related to predicate simplification
Date: 2020-09-21 01:49:26
Message-ID: CAApHDvp=wzAOq5e6NXeXRuBfnugsXk=NTKiGuWhKb0xf1hiObA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sat, 19 Sep 2020 at 09:22, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > We are developing a tool for automatically finding performance bugs in
> > PostgreSQL. Our key insight is that given a pair of semantic equivalent
> > queries, a robust DBMS should return the same result within a similar
> > execution time. Significant time difference suggests a potential performance
> > bug in the DBMS.
>
> Unfortunately, that's a viewpoint that is not going to win you a lot of
> converts.
>
> > First query:
> > SELECT "ps_suppkey"
> > FROM "partsupp"
> > WHERE "ps_partkey" = 1486;
>
> > Second query:
> > SELECT "ps_suppkey"
> > FROM "partsupp"
> > WHERE "ps_partkey" + 1486 = 2972;
>
> Sure, in theory the optimizer could rearrange that to
> "WHERE ps_partkey = 2972 - 1486" and thus still have an indexable
> condition. In practice, we have essentially no interest in doing so.
> It would require vastly more semantic knowledge than the optimizer
> has got, and looking for such cases would consume lots of planner
> cycles that would be better spent elsewhere.
>
> If you want to complain that that makes Postgres' optimizer "not
> robust", you're entitled to that opinion. But it's very unlikely
> to change in the foreseeable future. We expect that if a user
> cares about the performance of a particular query, they'll be
> willing to write it in a form that the optimizer can deal with.

I'd go a bit further and say that the queries are not even equivalent.

The reason for that is that "ps_partkey" + 1486 = 2972 could cause an
arithmetic overflow ERROR whereas "ps_partkey" = 1486 couldn't.
Perhaps nobody would complain if we didn't throw an ERROR where we
otherwise might have. However, when you consider that you must do 2972
- 1486 to obtain the new constant to compare to the index column, you
could start getting the overflows there instead. In that case, you
could cause overflows in cases that would have worked perfectly fine
without applying the optimisation. So it seems to me that the author
the query could have written it that way exactly to eliminate the
chances of such overflows. They might not be happy if we rewrite their
query to make overflows possible again.

David

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2020-09-21 21:20:22 Re: BUG #16622: pg_dump produces erroneus ALTER TABLE statement for a table with an inherited generated column
Previous Message David Rowley 2020-09-21 00:00:09 Re: BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function