Performance improvement for queries with IN clause

From: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Performance improvement for queries with IN clause
Date: 2019-11-08 13:52:12
Message-ID: CA+FpmFeUTPQ2RnBO-e2OHxxi4imjH++gjq2tXN1KQA0uw-5jUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello all,

I would like to direct your attention to the queries of following type,
select <some_column(s)>
from <table_name>
where <some_column> IN (<a_list_of_some_values>)

the plan for such a query uses index scan (or index-only), now in our
experiments, if the provided list is sorted then query performance
improves by ~10%. Which makes sense also as once we have found the required
btree leaf we just keep moving in one direction, which should be
expectantly less time consuming than searching the tree again.

Now, my question is shouldn't we always use this list in sorted order, in
other words can there be scenarios where such a sorting will not help? I am
talking about only the cases where the list consists of all constants and
could fit in memory. Basically, when we are transforming the in expression
and found that it consists of all constants, then sort it as well, codewise
at transfromAExprIn, of course there might be better ways to accomplish
this.

So, your thoughts, opinions, suggestions are more than welcome.

--
Regards,
Rafia Sabih

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2019-11-08 13:55:25 Re: Monitoring disk space from within the server
Previous Message Julien Rouhaud 2019-11-08 13:48:58 Re: Monitoring disk space from within the server