Re: BUG #14107: Major query planner bug regarding subqueries and indices

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Mathias Kunter <mathiaskunter(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14107: Major query planner bug regarding subqueries and indices
Date: 2016-05-11 13:56:24
Message-ID: CAKJS1f_798-+aewz9GS=OSYqk2mxQTuT7_9pWedSJbDoRzZnVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 11 May 2016 at 23:55, Mathias Kunter <mathiaskunter(at)gmail(dot)com> wrote:
> Sorry for bumping this one more time, but I'd like to share some more
> real-life performance test results of using ANY(ARRAY(...)) instead of
> IN(...), hoping that you'd maybe still consider implementing such an
> optimization into the query planner. Since the test results indicate that
> the performance boost can really be massive on certain query types (factor
> 1000), I think that it'd really be worth the work.
>
>
> ===== Test setup =====
>
> The tables "mb.release" and "mb.release_group" both contain about 1.5
> million rows of real data, taken from the MusicBrainz database, and are of
> course properly indexed. All performance tests have been repeated a few
> times to be comparable.
>
> The test covers subqueries which return just a few rows and also subqueries
> which return more than 100000 rows. The queries test the performance of IN
> vs. ANY(ARRAY()) when used in different scenarios.
>
> For reference, the full query plans of all used queries are linked below.
>
>
> ===== Tested queries =====
>
> 1) SELECT id FROM mb.release WHERE release_group IN (SELECT id FROM
> mb.release_group WHERE name = 'Bear');
>
> 2) SELECT id FROM mb.release WHERE release_group IN (SELECT id FROM
> mb.release_group WHERE name < 'Bear');
>
> 3) SELECT id FROM mb.release WHERE name = 'Tiger' OR release_group IN
> (SELECT id FROM mb.release_group WHERE name = 'Bear');
>
> 4) SELECT id FROM mb.release WHERE name = 'Tiger' OR release_group IN
> (SELECT id FROM mb.release_group WHERE name < 'Bear');
>
> 5) SELECT id FROM mb.release WHERE name = 'Tiger' AND release_group IN
> (SELECT id FROM mb.release_group WHERE name = 'Bear');
>
> 6) SELECT id FROM mb.release WHERE name = 'Tiger' AND release_group IN
> (SELECT id FROM mb.release_group WHERE name < 'Bear');
>
>
> ===== Test results =====
>
> All numbers are given in milliseconds and show the total query time
> (planning + execution).
>
> -------------------------------------------
> | Query | IN (...) | = ANY(ARRAY(...)) |
> -------------------------------------------
> | 1 | 0.7 | 0.4 |
> | 2 | 6001.1 | 2517.8 |
> | 3 | 711.3 | 0.5 |
> | 4 | > 1000000.0 | 1962.6 |
> | 5 | 0.8 | 0.5 |
> | 6 | 0.9 | 492.7 |
> -------------------------------------------
>
> Note: Query 4 using the IN operator has been canceled after running for more
> than 15 minutes.

How do you find the ANY(ARRAY(...)) version performs with say 10
million records in the array?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2016-05-11 14:33:25 Re: BUG #14132: ON CONFLICT not inferring unique index with bigints
Previous Message Mathias Kunter 2016-05-11 11:55:07 Re: BUG #14107: Major query planner bug regarding subqueries and indices