Re: Transforming IN (...) to ORs, volatility

From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transforming IN (...) to ORs, volatility
Date: 2011-04-05 10:19:00
Message-ID: BANLkTiknsvwjG6LTdxLMRB3o4xKwTjMm0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 1, 2011 at 14:24, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> We sometimes transform IN-clauses to a list of ORs:
>
> postgres=# explain SELECT * FROM foo WHERE a IN (b, c);
>                      QUERY PLAN
>  Seq Scan on foo  (cost=0.00..39.10 rows=19 width=12)
>   Filter: ((a = b) OR (a = c))
>
> But what if you replace "a" with a volatile function? It doesn't seem legal
> to do that transformation in that case, but we do it:
>
> postgres=# explain SELECT * FROM foo WHERE (random()*2)::integer IN (b, c);
>                                                     QUERY PLAN
>
>  Seq Scan on foo  (cost=0.00..68.20 rows=19 width=12)
>   Filter: ((((random() * 2::double precision))::integer = b) OR (((random()
> * 2::double precision))::integer = c))

Is there a similar problem with the BETWEEN clause transformation into
AND expressions?

marti=> explain verbose select random() between 0.25 and 0.75;
Result (cost=0.00..0.02 rows=1 width=0)
Output: ((random() >= 0.25::double precision) AND (random() <=
0.75::double precision))

As expected, I get a statistical skew of 0.4375 / 0.5625, whereas the
"correct" would be 50/50:

marti=> select random() between 0.25 and 0.75 as result, count(*) from
generate_series(1,1000000) i group by 1;
result | count
--------+--------
f | 437262
t | 562738

I also always noticed that BETWEEN with subqueries produces two
subplan nodes, this seems suboptimal.

marti=> explain verbose select (select random()) between 0.25 and 0.75;
Result (cost=0.03..0.04 rows=1 width=0)
Output: (($0 >= 0.25::double precision) AND ($1 <= 0.75::double precision))
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=0)
Output: random()
InitPlan 2 (returns $1)
-> Result (cost=0.00..0.01 rows=1 width=0)
Output: random()

Regards,
Marti

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-04-05 11:52:26 Re: Re: synchronous_commit and synchronous_replication Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication.
Previous Message Shigeru HANADA 2011-04-05 10:03:07 Re: Re: [COMMITTERS] pgsql: Support comments on FOREIGN DATA WRAPPER and SERVER objects.