Re: Any disadvantages of using =ANY(ARRAY()) instead of IN?

From: geirB <geir(dot)bostad(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Any disadvantages of using =ANY(ARRAY()) instead of IN?
Date: 2012-05-23 12:07:04
Message-ID: d4f2a2d4-0938-4b5c-8d4f-e65d8abf8bf6@q2g2000vbv.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I've encoutered similar issues myself (with UNION so far), so I tried
to build a simple test case, which may or may not cover Clemens's
case.

Test case 1 and 2 illustrates the issue, and case 3-9 are variations.

My observation: Looks like the optimizer cannot be close friends with
both UNION and IN/JOIN at the same time.
Actually - it looks like the UNION SELECT kids don't wanna share the
IN/JOIN toy we gave them, but are happy when they get their own toys
to play with ;)

DROP TABLE IF EXISTS table1;
CREATE TABLE table1 AS SELECT i AS id FROM generate_series(1, 300000)
S(i);
CREATE INDEX ON table1(id);
ANALYZE table1;

-- Test 1: Slow. IN()
SELECT * FROM (
SELECT * FROM table1
UNION
SELECT * FROM table1
) Q WHERE id IN (SELECT id FROM table1 LIMIT 10);

-- Test 2: Fast. ANY(ARRAY())
SELECT * FROM (
SELECT * FROM table1
UNION
SELECT * FROM table1
) Q WHERE id = ANY(ARRAY(SELECT id FROM table1 LIMIT 10));

-- Test 3: Fast. Duplicate IN. Symptom fix? Or would you call it a
"better" query in terms of sql? -except for the unnecessary subquery,
which I kept for readability.
SELECT * FROM (
SELECT * FROM table1
WHERE id IN (SELECT id FROM table1 LIMIT 10)
UNION
SELECT * FROM table1
WHERE id IN (SELECT id FROM table1 LIMIT 10)
) Q;

-- Test 4: Fast. Duplicate JOIN CTE.
WITH id_list AS (SELECT id FROM table1 LIMIT 10)
SELECT * FROM (
SELECT * FROM table1 JOIN id_list USING(id)
UNION
SELECT * FROM table1 JOIN id_list USING(id)
) Q;

-- Test 5: Slow. IN(CTE)
WITH id_list AS (SELECT id FROM table1 LIMIT 10)
SELECT * FROM (
SELECT * FROM table1
UNION
SELECT * FROM table1
) Q WHERE id IN (SELECT * FROM id_list);

-- Test 6: Slow. IN(explicit id list)
SELECT * FROM (
SELECT * FROM table1
UNION
SELECT * FROM table1
) Q WHERE id IN (SELECT
UNNEST('{100001,100002,100003,100004,100005,100006,100007,100008,100009,10010}'::BIGINT[] )
AS id);

-- Test 7: Slow. IN(UNNEST(ARRAY())
SELECT * FROM (
SELECT * FROM table1
UNION
SELECT * FROM table1
) Q WHERE id IN (SELECT UNNEST(ARRAY(SELECT id FROM table1 LIMIT 10))
AS id);

-- Test 8: Slow. JOIN CTE
WITH id_list AS (SELECT id FROM table1 LIMIT 10)
SELECT * FROM (
SELECT * FROM table1
UNION
SELECT * FROM table1
) Q JOIN id_list USING(id);

-- Test 9: Fast. JOIN CTE + UNION ALL/DISTINCT (not quite the same
query)
WITH id_list AS (SELECT id FROM table1 LIMIT 10)
SELECT DISTINCT * FROM (
SELECT * FROM table1
UNION ALL
SELECT * FROM table1
) Q JOIN id_list USING(id);

--
Geir Bostad
9.1.3(x64,win)

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message geirB 2012-05-23 12:23:01 Re: Any disadvantages of using =ANY(ARRAY()) instead of IN?
Previous Message Shaun Thomas 2012-05-21 15:50:04 Re: local-storage versus SAN sequential read performance comparison