Re: Improving NOT IN

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving NOT IN
Date: 2007-01-30 22:55:29
Message-ID: 1170197729.3681.301.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2007-01-30 at 17:34 -0500, Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > First we need to show that the referenced table's PK values are a fully
> > continuous sequence of integers with no gaps.
>
> Since that is unlikely to be the case, I can't see that this is worth
> implementing...

Integers are typically used as keys...

> > I'll describe this using SQL statements, which execute as SeqScans of
> > the PK and then FK tables. There is no preparatory step - no building a
> > sort table or preparing a hash table, so these SQL statements always
> > execute faster than the fastest current plan.
>
> Except that when you fail to prove it, as you usually will, you have
> wasted a complete seqscan of the table, and still have to fall back on
> a regular plan. If the thing were amenable to falling out fairly
> quickly on proof failure, it would be better, but AFAICS you don't know
> anything until you've completed the whole scan.

Have some faith, please. It's fairly straightforward to make an estimate
of whether the number of rows is approximately correct to make the scan
worthwhile. On large queries it seems worth the risk; we might even
store the answer as part of stats, so we'd know not to bother with the
test in the future.

> BTW, your sketch fails in the presence of NULLs on the RHS ...

Certainly does, but the typical query has PK there, so no NULLs. One of
the main use cases is the ALTER TABLE ... ADD FK case. As I said, we
could just code that with altered SQL, or we could add a new plan.

Anyway, it seemed like the right time to log the thought anyhow.

> I think the NOT IN optimization that *would* be of use is to
> automatically transform the NOT IN representation to an
> outer-join-with-null-test type of operation, so as to give us a wider
> choice of join methods. However, I'm not sure about correct handling
> of NULLs on the RHS in such a scenario. The existing hashed-IN code
> has to jump through some really ugly hoops to give spec-compliant
> answers with NULLs.

Yeh, NOT IN with NULLs is..... bizarre.

What would be wrong with checking for a NOT NULL constraint? Thats how
other planners cope with it. Or are you thinking about lack of plan
invalidation?

ISTM straightforward to do a search for a ANDed set of IS NOT NULL
constraints. I've not found another server that does that, even though
it seems like a straightforward win.

Let me think on that.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-01-30 23:06:11 Re: Improving NOT IN
Previous Message David Fetter 2007-01-30 22:41:12 Re: Talks for OSCON? Only 5 days left!