Re: Perfomance of IN-clause with many elements and possible solutions

From: PT <wmoran(at)potentialtech(dot)com>
To: Dmitry Lazurkin <dilaz03(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Perfomance of IN-clause with many elements and possible solutions
Date: 2017-07-24 21:17:59
Message-ID: 20170724171759.43a96f626de6962d37a62ad9@potentialtech.com
Lists: pgsql-general

On Mon, 24 Jul 2017 13:17:56 +0300
Dmitry Lazurkin <dilaz03(at)gmail(dot)com> wrote:

> On 07/24/2017 01:40 AM, PT wrote:
> > In this example you count approximately 40,000,000 values, which is
> > about 40% of the table.
>
> 4 000 000 (:
>
> > If you really need these queries to be faster, I would suggest
> > materializing the data, i.e. create a table like:
> >
> > CREATE TABLE id_counts (
> >     id BIGINT PRIMARY KEY,
> >     num BIGINT
> > );
> >
> > Then use a trigger or similar technique to keep id_counts in sync
> > with the id table. You can then run queries of the form:
> >
> > SELECT sum(num) FROM id_counts WHERE id IN :values:
> >
> > which I would wager houseboats will be significantly faster.
> I use count only as an example because it uses a seqscan. I want to
> optimize the IN-clause ;-).

The IN clause is not what's taking all the time. It's the processing of
millions of rows that's taking all the time.

Perhaps you should better describe what it is you really want to accomplish.
Regardless of what it is, if it involves processing many millions of rows,
you're probably going to need to do some sort of materialization.
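
A minimal sketch of the materialization idea quoted above, so the shape
of it is concrete. Assumptions, none of which come from the thread: the
source table is called "ids" with a single "id" column, the trigger
names and the count_matching() helper are made up for illustration, and
SQLite (via Python's sqlite3 module) stands in for PostgreSQL only so
the example runs anywhere. In PostgreSQL the same logic would live in a
PL/pgSQL trigger function, and updates that change "id" would need
handling as well.

```python
import sqlite3

# SQLite is a stand-in for PostgreSQL here (assumption, see the note above).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical source table; the thread never shows its definition.
CREATE TABLE ids (id INTEGER NOT NULL);

-- Materialized per-id row counts, as proposed in the quoted message.
CREATE TABLE id_counts (
    id  INTEGER PRIMARY KEY,
    num INTEGER NOT NULL
);

-- On insert: create the counter row if it is missing, then bump it.
CREATE TRIGGER ids_after_insert AFTER INSERT ON ids
BEGIN
    INSERT OR IGNORE INTO id_counts (id, num) VALUES (NEW.id, 0);
    UPDATE id_counts SET num = num + 1 WHERE id = NEW.id;
END;

-- On delete: decrement the counter for the removed id.
CREATE TRIGGER ids_after_delete AFTER DELETE ON ids
BEGIN
    UPDATE id_counts SET num = num - 1 WHERE id = OLD.id;
END;
""")


def count_matching(conn, values):
    """Count rows of ids whose id is in values, reading only id_counts."""
    placeholders = ",".join("?" * len(values))
    sql = (
        "SELECT COALESCE(sum(num), 0) FROM id_counts "
        f"WHERE id IN ({placeholders})"
    )
    return conn.execute(sql, list(values)).fetchone()[0]


# Load a few rows, delete one, and query through the summary table.
conn.executemany(
    "INSERT INTO ids (id) VALUES (?)",
    [(1,), (1,), (2,), (3,), (3,), (3,)],
)
conn.execute("DELETE FROM ids WHERE id = 2")
total = count_matching(conn, [1, 2, 3])
```

The point of the design is that the IN lookup touches one row per
distinct id in id_counts instead of every matching row in the big table,
at the cost of extra work on each insert and delete.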

--
PT <wmoran(at)potentialtech(dot)com>
