Re: Thinking about IN/EXISTS optimization

From: Mike Benoit <mikeb(at)netnation(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Thinking about IN/EXISTS optimization
Date: 2002-11-29 21:16:27
Message-ID: 1038604586.2919.21.camel@mikeb.staff.netnation.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom,

I'm just curious, will your proposed in/exists optimizations help for
queries like:

db=# explain delete from dns_expired_domains where domain_id in (select
domain_id from dns_expired_domains group by domain_id having count(*)=14
);
NOTICE: QUERY PLAN:

Seq Scan on dns_expired_domains (cost=0.00..55448724329.92 rows=324754
width=6)
SubPlan
-> Materialize (cost=85370.33..85370.33 rows=64951 width=4)
-> Aggregate (cost=82122.79..85370.33 rows=64951 width=4)
-> Group (cost=82122.79..83746.56 rows=649508 width=4)
-> Sort (cost=82122.79..82122.79 rows=649508
width=4)
-> Seq Scan on dns_expired_domains
(cost=0.00..10316.08 rows=649508 width=4)

EXPLAIN

I usually end up having to make a little script that runs the subquery,
splits the domain_id's up in to chunks of 1000 or so, then executes
several queries similar to:

delete from dns_expired_domains where domain_id in
(1,2,3,4,5,6,7,8,9,10...)

This method seems to work fairly well and executes in a reasonable
amount of time, unlike the original query with an estimated cost of
55,448,724,329.92. I attempted to use EXISTS in the same query but it
seemed it wanted to delete all the rows in the table, I wasn't able to
get it to delete only the ones that occured 14 times in the table. I may
have overlooked something though.

In any case, it would definately be nice if a query like this worked
efficiently.

Thanks, and congrats to all the people involved with the 7.3 release,
all your hardwork is greatly appreciated.

On Tue, 2002-10-22 at 16:18, Tom Lane wrote:
> I've been thinking about how to convert "x IN (subselect)" and EXISTS
> constructs into join-like processing, and I've run into a small problem
> in getting the planner to do it nicely. The issue is that I need to
> take the subselect and push it into the jointree -- essentially, make
> it look like a subselect-in-FROM -- so that the join planner can deal
> with it. Basically, I need to rearrange
>
> SELECT ... FROM ... WHERE ... AND x IN (SELECT y FROM ...)
>
> into
>
> SELECT ... FROM ..., (SELECT y FROM ...) ss
> WHERE ... AND x =* ss.y
>
> where =* represents some specially-marked RestrictInfo node. (NOT IN is the
> same except that the RestrictInfo node will be marked differently.)
>
> The difficulty is that there's no good place to do this in
> subquery_planner(). We should push the subselect into FROM before we
> run the pull_up_subqueries() and preprocess_jointree() operations;
> if we don't pull up the subselect into the main query then we won't have
> accomplished very much. But the WHERE clause isn't simplified into a
> form that makes it easy to spot top-level IN() expressions until after
> that. We can't simply switch the order of the subselect and
> WHERE-clause processing, because pulling up subqueries typically adds
> conditions to the WHERE clause.
>
> I haven't been able to think of a solution to this that doesn't involve
> wasting a lot of cycles by repeating some of these processing steps,
> or missing some optimization possibilities. (For example, if we pull up
> a subquery that came from a view, it might contain an IN where-clause,
> which ideally we'd want to be able to optimize. It almost seems like
> we need to be able to loop around the whole operation; but most of the
> time this will just waste cycles.)
>
> Anyone see a nice way to do this?
>
> regards, tom lane
>
> --(end of broadcast)--
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)

--
Best Regards,

Mike Benoit
NetNation Communication Inc.
Systems Engineer
Tel: 604-684-6892 or 888-983-6600
--

Disclaimer: Opinions expressed here are my own and not
necessarily those of my employer

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Scott Lamb 2002-11-29 21:19:47 Re: Postgres 7.3 announcement on postgresql.org
Previous Message Bruce Momjian 2002-11-29 21:04:54 Re: nested transactions