Re: [SQL] "SELECT IN" Still Broken in 7.4b

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: Dani Oderbolz <oderbolz(at)ecologic(dot)de>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [SQL] "SELECT IN" Still Broken in 7.4b
Date: 2003-08-21 20:42:20
Message-ID: 25062.1061498540@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:
> Within the scope of the new hashed IN stuff I believe so in at least some
> cases. I have a few million row table of integers where searching for
> values IN (~10000 values) takes longer than creating the temp table,
> copying into it and doing the in subquery.

I did some profiling and soon realized that the main problem is the
executor's trick for not returning the same row multiple times in a
multiple-indexscan plan node. The point is that given
WHERE a = 1 OR b = 1
you could create a plan that first indexscans on a, then indexscans on
b --- but you mustn't return any tuples in the second scan that you
already returned in the first. IndexNext solves this by evaluating the
prior-scan index conditions to see if they are true. Which is okay if
there aren't too many of them. But when you have an N-element IN list
this means you are going to do O(N^2) index expression evaluations.
In the 10000-element IN-list test case, ExecQual gets called almost
50 million times :-(

I'm toying with the notion of ripping out that logic and instead
building an in-memory hashtable of already-returned TIDs. This could
theoretically run out of memory if the multiple indexscan returns enough
tuples, but I think in practice that wouldn't happen because the planner
wouldn't choose an indexscan when very large numbers of tuples are being
selected.

Comments?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Edmund Dengler 2003-08-21 20:42:24 Re: Buglist
Previous Message Ian Barwick 2003-08-21 20:28:52 Re: [GENERAL] Need concrete "Why Postgres not MySQL" bullet list

Browse pgsql-sql by date

  From Date Subject
Next Message Stephan Szabo 2003-08-21 21:07:08 Re: [SQL] "SELECT IN" Still Broken in 7.4b
Previous Message Stephan Szabo 2003-08-21 17:28:53 Re: "SELECT IN" Still Broken in 7.4b