Re: BUG #14107: Major query planner bug regarding subqueries and indices

From: Mathias Kunter <mathiaskunter(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14107: Major query planner bug regarding subqueries and indices
Date: 2016-05-11 11:55:07
Message-ID: fad91c29-5250-f856-2347-61122d4901ab@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Sorry for bumping this one more time, but I'd like to share some more
real-life performance test results of using ANY(ARRAY(...)) instead of
IN(...), hoping that you'd maybe still consider implementing such an
optimization into the query planner. Since the test results indicate
that the performance boost can really be massive on certain query types
(factor 1000), I think that it'd really be worth the work.

===== Test setup =====

The tables "mb.release" and "mb.release_group" both contain about 1.5
million rows of real data, taken from the MusicBrainz database, and are
of course properly indexed. All performance tests have been repeated a
few times to be comparable.

The test covers subqueries which return just a few rows and also
subqueries which return more than 100000 rows. The queries test the
performance of IN vs. ANY(ARRAY()) when used in different scenarios.

For reference, the full query plans of all used queries are linked below.

===== Tested queries =====

1) SELECT id FROM mb.release WHERE release_group IN (SELECT id FROM
mb.release_group WHERE name = 'Bear');

2) SELECT id FROM mb.release WHERE release_group IN (SELECT id FROM
mb.release_group WHERE name < 'Bear');

3) SELECT id FROM mb.release WHERE name = 'Tiger' OR release_group IN
(SELECT id FROM mb.release_group WHERE name = 'Bear');

4) SELECT id FROM mb.release WHERE name = 'Tiger' OR release_group IN
(SELECT id FROM mb.release_group WHERE name < 'Bear');

5) SELECT id FROM mb.release WHERE name = 'Tiger' AND release_group IN
(SELECT id FROM mb.release_group WHERE name = 'Bear');

6) SELECT id FROM mb.release WHERE name = 'Tiger' AND release_group IN
(SELECT id FROM mb.release_group WHERE name < 'Bear');

===== Test results =====

All numbers are given in milliseconds and show the total query time
(planning + execution).

-------------------------------------------
| Query | IN (...) | = ANY(ARRAY(...)) |
-------------------------------------------
| 1 | 0.7 | 0.4 |
| 2 | 6001.1 | 2517.8 |
| 3 | 711.3 | 0.5 |
| 4 | > 1000000.0 | 1962.6 |
| 5 | 0.8 | 0.5 |
| 6 | 0.9 | 492.7 |
-------------------------------------------

Note: Query 4 using the IN operator has been canceled after running for
more than 15 minutes.

===== Full query plans =====

For reference, all query plans of this performance test have been
recorded using EXPLAIN (ANALYZE, BUFFERS). Please find them at
http://pastebin.com/zymkbcSf

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2016-05-11 13:56:24 Re: BUG #14107: Major query planner bug regarding subqueries and indices
Previous Message Hari Prasad 2016-05-11 09:48:08 Installation error/bug