Re: Transforming IN (...) to ORs, volatility

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Marti Raudsepp" <marti(at)juffo(dot)org>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transforming IN (...) to ORs, volatility
Date: 2011-04-11 16:06:43
Message-ID: 4DA2E0C3020000250003C64A@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 05.04.2011 18:42, Heikki Linnakangas wrote:
>> On 05.04.2011 13:19, Marti Raudsepp wrote:
>>> On Fri, Apr 1, 2011 at 14:24, Heikki Linnakangas
>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> We sometimes transform IN-clauses to a list of ORs:
>>>>
>>>> postgres=# explain SELECT * FROM foo WHERE a IN (b, c);
>>>> QUERY PLAN
>>>> Seq Scan on foo (cost=0.00..39.10 rows=19 width=12)
>>>> Filter: ((a = b) OR (a = c))
>>>>
>>>> But what if you replace "a" with a volatile function? It
>>>> doesn't seem legal to do that transformation in that case, but
>>>> we do it:
>>>>
>>>> postgres=# explain SELECT * FROM foo WHERE
>>>> (random()*2)::integer IN (b, c);
>>>> QUERY PLAN
>>>>
>>>> Seq Scan on foo (cost=0.00..68.20 rows=19 width=12)
>>>> Filter: ((((random() * 2::double precision))::integer = b) OR
>>>> (((random()
>>>> * 2::double precision))::integer = c))
>>>
>>> Is there a similar problem with the BETWEEN clause
>>> transformation into AND expressions?
>>>
>>> marti=> explain verbose select random() between 0.25 and 0.75;
>>> Result (cost=0.00..0.02 rows=1 width=0)
>>> Output: ((random()>= 0.25::double precision) AND (random()<=
>>> 0.75::double precision))
>>
>> Yes, good point.
>
> Hmm, the SQL specification explicitly says that
>
> X BETWEEN Y AND Z
>
> is equal to
>
> X >= Y AND X <= Z
>
> It doesn't say anything about side-effects of X. Seems like an
> oversight in the specification. I would not expect X to be
> evaluated twice, and I think we should change BETWEEN to not do
> that.

Does the SQL spec explicitly say anything about how many times X
should be evaluated if you were to code it as?:

X >= Y AND X <= Z

If it does, evaluating it a different number of times for BETWEEN
would seem to be a deviation from standard. Evaluating it once seem
less surprising, but if we're going to deviate from the standard in
doing that, it at least deserves a clear note to that effect in the
docs.

Evaluating X once for BETWEEN seems better from a POLA perspective,
unless you happen to be massaging a query to another form and
trusting that the equivalence defined in the standard will always
hold.

> Does anyone object to making BETWEEN and IN more strict about the
> data types? At the moment, you can do this:
>
> postgres=# SELECT '1234' BETWEEN '10001'::text AND 10002::int4;
> ?column?
> ----------
> t
> (1 row)
>
> I'm thinking that it should throw an error. Same with IN, if the
> values in the IN-list can't be coerced to a common type. That will
> probably simplify the code a lot, and is what the SQL standard
> assumes anyway AFAICS.

+1 for more strict.

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2011-04-11 16:20:54 Re: Re: [COMMITTERS] pgsql: Don't make "replication" magical as a user name, only as a datab
Previous Message Marc G. Fournier 2011-04-11 15:52:26 Re: developer.postgresql.org down