Re: Allowing NOT IN to use ANTI joins

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allowing NOT IN to use ANTI joins
Date: 2014-07-16 09:26:07
Message-ID: CAApHDvrJrz-0xinyiqTiWs0mFX17GWD2Y8VZ+i92nuZsha8ocw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 16, 2014 at 9:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On 15 July 2014 12:58, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> >> I found that the call to is_NOTANY_compatible_with_antijoin adds about
> >> 0.2% and 2.3% to total planning time. Though the 2.3% was quite an
> >> extreme case, and the 0.2% was the most simple case I could think of.
>
> > Is there a way we can only run this extra test when we have a
> > reasonable idea that there is potential to save significant costs?
>
> Well, all of this cost occurs only when we find a NOT IN clause in a place
> where we could conceivably turn it into an antijoin. I think it's
> unquestionably a win to make that transformation if possible. My concern
> about it is mainly that the whole thing is a band-aid for naively written
> queries, and it seems likely to me that naively written queries aren't
> going to be amenable to making the proof we need. But we can't tell that
> for sure without doing exactly the work the patch does.
>
>
I do think Simon has a good point. Maybe it's not something for this
patch, but it could apply to other planner patches that optimise queries
which could have been executed more efficiently had they been written
another way.
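
For anyone skimming the thread, here is a minimal sketch of the kind of
query the transformation targets. The table names are invented for
illustration, and the NOT NULL constraints stand in for whatever proof
the planner needs that NOT IN's NULL semantics cannot change the answer:

    CREATE TABLE orders (id int NOT NULL);
    CREATE TABLE shipped (order_id int NOT NULL);

    -- The naively written form; without the patch this is typically
    -- executed as a (hashed) subplan rather than a join:
    SELECT id FROM orders
    WHERE id NOT IN (SELECT order_id FROM shipped);

    -- The semantically equivalent form that already gets an anti-join:
    SELECT o.id FROM orders o
    WHERE NOT EXISTS (SELECT 1 FROM shipped s WHERE s.order_id = o.id);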

Since planning time was added to EXPLAIN ANALYZE, I have noticed that for
many very simple queries the planning time is often longer than the
execution time, so I really do understand the concern about slowing the
planner down for these super fast queries. But on the other hand, if the
extra 1-2 microseconds for the NOT IN optimisation were frowned upon and
the patch rejected on that basis, then at the other end of the scale I
wouldn't think it impossible for those extra 2 microseconds to shave a
few hours off the execution time of a query in an OLAP type workload. If
that were the case, having skipped the 2 extra microseconds would seem
quite funny, but I guess there's no real way to tell unless we invented
something that skips the known costly optimisations on the first pass and
then re-plans the whole query with the planning strength knob turned to
maximum if the final plan costs more than N.
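
To illustrate that observation, a trivial query shows the shape of the
output; the timings below are invented, purely to show planning time
exceeding execution time:

    EXPLAIN ANALYZE SELECT 1;

                 QUERY PLAN (timings invented)
    ------------------------------------------------------
     Result  (cost=0.00..0.01 rows=1 width=0)
             (actual time=0.002..0.002 rows=1 loops=1)
     Planning time: 0.045 ms
     Execution time: 0.012 ms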

Of course, such an idea would make bad plans even harder to debug, so
it's far from perfect, but I'm not seeing any other option that gives the
best of both worlds.

Regards

David Rowley
