Re: Making CASE error handling less surprising

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Making CASE error handling less surprising
Date: 2020-07-26 17:27:15
Message-ID: CAN-RpxD_tWoNtzW28qHrwR1MSXbaoC0u0LJQP9apW5P9kiOy2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 24, 2020 at 7:18 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > Like Pavel, and I think implicitly Dagfinn and Andres, I'm not sure I
> > believe this. Pavel's example is a good one. The leakproof exception
> > helps, but it doesn't cover everything. Users I've encountered throw
> > things like date_trunc() and lpad() into SQL code and expect them to
> > behave (from a performance point of view) like constants, but they
> > also expect 1/0 not to get evaluated too early when e.g. CASE is used.
> > It's difficult to meet both sets of expectations at the same time and
> > we're probably never going to have a perfect solution, but I think
> > you're minimizing the concern too much here.
>
> I've quoted this point before, but ... we can make queries arbitrarily
> fast, if we don't have to give the right answer. I think we've seen
> enough complaints on this topic now to make it clear that what we're
> doing today with CASE is the wrong answer.
>

So here's my concern in a little more detail.

For small databases, these performance concerns are not big deals. But for
large, heavily loaded databases one tends to run into all of the
pathological cases more frequently. In other words the overhead for the
largest users will likely not be proportional to the gains of the newer
users who are surprised by the current behavior. The more complex we make
exceptions as to how the planner works, the more complex the knowledge
required to work on the high end of the database is. So the complexity
here is such that I just don't think is worth it.

> The performance argument can be made to cut both ways, too. If somebody's
> got a very expensive function in a CASE arm that they don't expect to
> reach, having it be evaluated anyway because it's got constant inputs
> isn't going to make them happy.
>

However in this case we would be evaluating the expensive case arm every
time it is invoked (i.e. for every row matched), right? It is hard to see
this as even being close to a performance gain or even approximately
neutral because the cases where you have a significant gain are likely to
be extremely rare, and the penalties for when the cost applies will be many
multiples of the maximum gain.

>
> The real bottom line is: if you don't want to do this, how else do
> you want to fix the problem? I'm no longer willing to deny that
> there is a problem.
>

I see three ways forward.

The first (probably the best) would be a solution along the lines of yours
along with a session-level GUC variable which could determine whether case
branches can fold constants. This has several important benefits:

1. It gets a fix in shortly for those who want it.
2. It ensures this is optional behavior for the more experienced users
(where one can better decide which direction to go), and
3. It makes the behavior explicit, documented, and thus more easily
understood.

A third approach would be to allow some sort of "constant evaluation
mechanism" maybe with its own memory context where constants could be
cached on first evaluation under the statement memory context. That would
solve the problem more gneerally.

>
> > I don't think I believe this either. I don't think an average user is
> > going to expect <expression> to behave differently from (SELECT
> > <expression>).
>
> Agreed, that's poorly (or not at all?) documented. But it's been
> true all along, and this patch isn't changing that behavior at all.
> I'm not sure if we should do anything more than improve the docs,
> but in any case it seems independent of the CASE issue.
>
> > The current behavior isn't great, but at least it handles these
> > cases consistently.
>
> Really?
>
> regards, tom lane
>
>
>

--
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2020-07-26 17:54:35 Re: recovering from "found xmin ... from before relfrozenxid ..."
Previous Message Dilip Kumar 2020-07-26 13:57:45 Re: Parallel bitmap index scan