Re: IN list processing performance (yet again)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dave Tenny <tenny(at)attbi(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: IN list processing performance (yet again)
Date: 2003-05-28 23:44:01
Message-ID: 9207.1054165441@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Dave Tenny <tenny(at)attbi(dot)com> writes:
> My application relies heavily on IN lists. The lists are primarily
> constant integers, so queries look like:
> SELECT val FROM table WHERE id IN (43, 49, 1001, 100002, ...)

> 1) PostgreSQL exhibits worse-than-linear performance behavior with
> respect to IN list size.

Yeah. There are a couple of places in the planner that have O(N^2)
behavior on sufficiently large WHERE clauses, due to building lists
in a naive way (repeated lappend() operations). The inner loop is
very tight, but nonetheless when you do it tens of millions of times
it adds up :-(

I have just committed some fixes into CVS tip for this --- I see about
a 10x speedup in planning time on test cases involving 10000-OR-item
WHERE clauses. We looked at this once before; the test cases I'm using
actually date back to Jan 2000. But it seems some slowness has crept
in due to subsequent planning improvements.

> 4) God help you if you haven't vacuum/analyzed that the newly loaded
> table.

Without knowledge that the id field is unique, the planner is likely to
tilt away from an indexscan with not too many IN items. I don't
consider this a bug.

> PostgreSQL craps out trying to process 8000 elements with the error:
> out of free buffers: time to abort!

This is a known bug in 7.3.2; it's fixed in 7.3.3.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dave Tenny 2003-05-29 01:19:56 Re: IN list processing performance (yet again)
Previous Message scott.marlowe 2003-05-28 22:46:26 Re: Wildcard searches & performance question