Re: NOT IN subquery optimization

From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: "Li, Zheng" <zhelli(at)amazon(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NOT IN subquery optimization
Date: 2020-01-05 06:11:19
Message-ID: 0c616c5c-8d54-5a22-10e8-ac3336eff426@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At the top of the thread your co-author argued the beginning of this
work with "findings about the performance of PostgreSQL, MySQL, and
Oracle on various subqueries:"

https://antognini.ch/2017/12/how-well-a-query-optimizer-handles-subqueries/

I launched this test set with your "not_in ..." patch. Your optimization
improves only results of D10-D13 queries. Nothing has changed for bad
plans of the E20-E27 and F20-F27 queries.

For example, we can replace E20 query:
SELECT * FROM large WHERE n IN (SELECT n FROM small WHERE small.u =
large.u); - execution time: 1370 ms, by
SELECT * FROM large WHERE EXISTS (SELECT n,u FROM small WHERE (small.u =
large.u) AND (large.n = small.n
)) AND n IS NOT NULL; - execution time: 0.112 ms

E21 query:
SELECT * FROM large WHERE n IN (SELECT nn FROM small WHERE small.u =
large.u); - 1553 ms, by
SELECT * FROM large WHERE EXISTS (SELECT nn FROM small WHERE (small.u =
large.u) AND (small.nn = large.n)); - 0.194 ms

F27 query:
SELECT * FROM large WHERE nn NOT IN (SELECT nn FROM small WHERE small.nu
= large.u); - 1653.048 ms, by
SELECT * FROM large WHERE NOT EXISTS (SELECT nn,nu FROM small WHERE
(small.nu = large.u) AND (small.nn = large.nn)); - 274.019 ms

Are you planning to make another patch for these cases?

Also i tried to increase work_mem up to 2GB: may be hashed subqueries
can improve situation? But hashing is not improved execution time of the
queries significantly.

On your test cases (from the comments of the patch) the subquery hashing
has the same execution time with queries No.13-17. At the queries
No.1-12 it is not so slow as without hashing, but works more slowly (up
to 3 orders) than NOT IN optimization.

On 12/2/19 9:25 PM, Li, Zheng wrote:
> Here is the latest rebased patch.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-01-05 08:53:43 Re: Planning counters in pg_stat_statements (using pgss_store)
Previous Message Amit Kapila 2020-01-05 04:59:52 Re: logical decoding : exceeded maxAllocatedDescs for .spill files