Quick Links

Planning large IN lists

From:	Neil Conway <neilc(at)samurai(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Planning large IN lists
Date:	2007-05-10 18:20:26
Message-ID:	1178821226.6034.63.camel@goldbach
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

When planning queries with a large IN expression in the WHERE clause,
the planner transforms the IN list into a scalar array expression. In
clause_selectivity(), we estimate the selectivity of the ScalarArrayExpr
by calling scalararraysel(), which in turn estimates the selectivity of
*each* array element in order to determine the selectivity of the array
expression as a whole.

This is quite inefficient when the IN list is large. In a test case that
someone sent me privately, a simple query involving two cheap joins and
a ~1800 element IN list in the WHERE clause requires about 100ms to plan
but only ~10 ms to execute -- about 85% of the total runtime is spent in
scalararraysel(). (I'd include the profiling data, but KCacheGrind seems
stubbornly opposed to providing a textual summary of its results...)

Clearly, the current approach is fine when the array is small -- perhaps
for arrays above a certain number of elements, we could switch to
randomly sampling array elements, estimating their selectivities, and
then using that information to infer the estimated selectivity of the
entire array expression. That seems fairly crude, though: does anyone
have any better ideas?

-Neil

Responses

Re: Planning large IN lists at 2007-05-10 18:42:40 from Lukas Kahwe Smith
Re: Planning large IN lists at 2007-05-10 18:53:28 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	CK Tan	2007-05-10 18:27:00	Re: Seq scans roadmap
Previous Message	CK Tan	2007-05-10 18:12:23	Re: Seq scans roadmap