Skip site navigation (1) Skip section navigation (2)

Re: IN list processing performance (yet again)

From: Dave Tenny <tenny(at)attbi(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: IN list processing performance (yet again)
Date: 2003-05-29 01:19:56
Message-ID: 3ED5603C.2020800@attbi.com (view raw or flat)
Thread:
Lists: pgsql-performance
Tom Lane wrote:

>Dave Tenny <tenny(at)attbi(dot)com> writes:
>  
>
>>My application relies heavily on IN lists.  The lists are primarily 
>>constant integers, so queries look like:
>>SELECT val FROM table WHERE id IN (43, 49, 1001, 100002, ...)
>>    
>>
>
>  
>
>>1) PostgreSQL exhibits worse-than-linear performance behavior with 
>>respect to IN list size.
>>    
>>
>
>Yeah.  There are a couple of places in the planner that have O(N^2)
>behavior on sufficiently large WHERE clauses, due to building lists
>in a naive way (repeated lappend() operations).  The inner loop is
>very tight, but nonetheless when you do it tens of millions of times
>it adds up :-(
>
>  
>
It was also showing up very clearly in the timings I attached for just a 
couple hundred IN list entries too.
Does that mean that the time for the O(1) portions of the logic are 
perhaps also in need of a tuneup?
(I would think O(N^2) for trivial operations on a fast machine wouldn't 
manifest quite so much
for smallish N).

>I have just committed some fixes into CVS tip for this --- I see about
>a 10x speedup in planning time on test cases involving 10000-OR-item
>WHERE clauses.  We looked at this once before; the test cases I'm using
>actually date back to Jan 2000.  But it seems some slowness has crept
>in due to subsequent planning improvements.
>
>  
>
So what version might that equate to down the road, so I can be sure to 
check it out?
I'm afraid I'm not up to testing CVS tips.

>  
>
>>4)  God help you if you haven't vacuum/analyzed that the newly loaded 
>>table.
>>    
>>
>
>Without knowledge that the id field is unique, the planner is likely to
>tilt away from an indexscan with not too many IN items.  I don't
>consider this a bug.
>
>  
>
There is one very interesting thing in my test case though.  It 
certainly /seemed/ as if the
parameterized statements were successfully using the index of the 
freshly-created-but-unanalyzed table,
or else the times on those queries would have been terrible too.  It was 
only the IN list form
of query that wasn't making correct use of the index.  How can the 
planner recognize uniqueness for
one case but not the other? (Since I ran both forms of query on the same 
table with and without vacuuming).

>  
>
>>      PostgreSQL  craps out trying to process 8000 elements with the error:
>>      out of free buffers: time to abort!    
>>    
>>
>
>This is a known bug in 7.3.2; it's fixed in 7.3.3.
>  
>
Thanks, that's good to know.

Dave

In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2003-05-29 01:46:13
Subject: Re: IN list processing performance (yet again)
Previous:From: Tom LaneDate: 2003-05-28 23:44:01
Subject: Re: IN list processing performance (yet again)

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group