Re: IN list processing performance (yet again)

From: Dave Tenny <tenny(at)attbi(dot)com>
To: Andreas Pflug <Andreas(dot)Pflug(at)web(dot)de>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: IN list processing performance (yet again)
Date: 2003-05-28 18:08:02
Message-ID: 3ED4FB02.1010908@attbi.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Andreas Pflug wrote:

> Dave Tenny wrote:
>
>> Having grepped the web, it's clear that this isn't the first or last
>> time this issue will be raised.
>>
>> My application relies heavily on IN lists. The lists are primarily
>> constant integers, so queries look like:
>>
>> SELECT val FROM table WHERE id IN (43, 49, 1001, 100002, ...)
>>
>> Performance is critical, and the size of these lists depends a lot on
>> how the larger 3-tier applicaiton is used,
>> but it wouldn't be out of the question to retrieve 3000-10000 items.
>>
>> PostgreSQL 7.3.2 seems to have a lot of trouble with large lists.
>> I ran an experiment that ran queries on a table of two integers (ID,
>> VAL), where ID is a primary key and the subject
>> of IN list predicates. The test used a table with one million rows
>> ID is appropriately indexed,
>> and I have VACUUMED/analyzed the database after table load.
>>
>> I ran tests on in-lists from about 100 to 100,000 entries.
>
>
> Hi Dave,
>
> it sounds as if that IN-list is created by the application. I wonder
> if there are really so many variances and combinations of it or
> whether you could invent an additional column, which groups all those
> individual values. If possible, you could reduce your IN list to much
> fewer values, and probably would get better performance (using an
> index on that col, of course).

There are over 50 tables in the schema,
and dozens of client commands that manipulate the schema in a
persistent-checkout kind of way over time, as well as spurious reporting
requests
that require incredibly complex filtering and combination of data from
many tables.
I'm pretty much up to my keister data (and am resisting impulses for
denormalization), so this approach
probably isn't viable for me. Now I *could* create a temporary table
with the group of values, but I suspect the cost of that
substantially outweighs the negative performance of current IN lists or
parameterized statements.

I'm reminded to relay to the PostgreSQL devos that I might be able to do
more in the join or subquery department if
PostgreSQL had better performing MAX functions and a FIRST function for
selecting rows from groups.
("Performing" being the operative word here, since the extensible
architecture of PostgreSQL currently makes for poorly
performing MAX capabilities and presumably similar user defined
aggregate functions).

>
> Regards,
>
> Andreas
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andrew Sullivan 2003-05-28 18:14:03 Re: >24 hour restore
Previous Message Chad Thompson 2003-05-28 17:59:49 Re: >24 hour restore