Re: Allowing NOT IN to use ANTI joins

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allowing NOT IN to use ANTI joins
Date: 2014-06-26 09:31:34
Message-ID: CAApHDvra_691TU96ELAwtwHm+jLW7_HAwf7d58DkX2qx85wV7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 26, 2014 at 3:52 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > To be clearer, what I mean is we use only the direct proof approach,
> > for queries like this
>
> > SELECT * FROM a WHERE id NOT IN(SELECT unknown_col FROM b WHERE
> > unknown_col IS NOT NULL);
>
> > and we don't try to do it for queries like this
>
> > SELECT * FROM a WHERE id NOT IN(SELECT not_null_column FROM b);
>
> > because we don't know the full provenance of "not_null_column" in all
> > cases and that info is (currently) unreliable.
>
> FWIW, I think that would largely cripple the usefulness of the patch.
> If you can tell people to rewrite their queries, you might as well
> tell them to rewrite into NOT EXISTS. The real-world queries that
> we're trying to improve invariably look like the latter case not the
> former, because people who are running into this problem usually
> aren't even thinking about the possibility of NULLs.
>
>
At first I didn't quite agree with this, but after a bit more thought I'm
coming around to it.

I had been thinking that, quite often the subquery in the NOT IN would have
a WHERE clause and not just be accessing all rows of the table, but then
it's probably very likely that when the subquery *does* contain a WHERE
clause that it's for some completely different column.. It seems a bit
weird to write NOT IN(SELECT somecol FROM table WHERE somecol = 5) it's
probably more likely to be something like NOT IN(SELECT somecol FROM table
WHERE category = 5), i.e some column that's not in the target list, and in
this case we wouldn't know that somecol couldn't be NULL.

If there's no way to tell that the target entry comes from a left join,
then would it be a bit too quirky to just do the NOT NULL checking when
list_length(query->rtable) == 1 ? or maybe even loop over query->rtable and
abort if we find an outer join type? it seems a bit hackish, but perhaps it
would please more people than it would surprise.

Regards

David Rowley

> I would actually say that if we only have the bandwidth to get one of
> these cases done, it should be the second one not the first. It's
> unclear to me that checking for the first case would even be worth
> the planner cycles it'd cost.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip kumar 2014-06-26 09:35:28 Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Previous Message Ian Barwick 2014-06-26 09:27:34 Re: pgaudit - an auditing extension for PostgreSQL