Re: [PATCH] Simplify EXISTS subqueries containing LIMIT

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Simplify EXISTS subqueries containing LIMIT
Date: 2014-10-22 10:37:03
Message-ID: CAApHDvqV-Mk_XMdqRavbdQm7bEo_8yx-LWf2Pbqjxp+VpAQ_Mw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 21, 2014 at 11:00 PM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:

> On Sun, Oct 19, 2014 at 1:22 PM, David Rowley <dgrowleyml(at)gmail(dot)com>
> wrote:
> > the argument for this would
> > have been much stronger if anti join support had just been added last
> week.
> > It's been quite a few years now and the argument for this must be getting
> > weaker with every release.
>
> I see your point, but I would put it another way: we've had this for a
> few years, but people haven't learned and are *still* using LIMIT 1.
>
>
I've had a bit of a look at this and here's a couple of things:

/*
+ * LIMIT clause can be removed if it's a positive constant or ALL,
to
+ * prevent it from being an optimization barrier. It's a common
meme to put
+ * LIMIT 1 within EXISTS subqueries.
+ */

I think this comment may be better explained along the lines of:

"A subquery which has a LIMIT clause with a positive value is effectively a
no-op in this scenario. With EXISTS we only care about the first row
anyway, so any positive limit value will have no behavioral change to the
query, so we'll simply remove the LIMIT clause. If we're unable to prove
that the LIMIT value is a positive number then we'd better not touch it."

+ /* Checking for negative values is done later; 0 is just silly */
+ if (! limit->constisnull && DatumGetInt64(limit->constvalue) <= 0)

I'm a bit confused by the comment here. You're saying that we'll check for
negatives later... but you're checking for <= 0 on the next line... Did I
miss something or is this a mistake?

This test:

+select * from int4_tbl o where exists (select 1 limit 0);
+ f1
+----
+(0 rows)

I guess here you're just testing to ensure that you've not broken this and
that the new code does not accidentally remove the LIMIT clause. I think it
also might be worth adding some tests to ensure that co-related EXISTS
clauses with a positive LIMIT value get properly changed into a SEMI JOIN
instead of remaining as a subplan. And also make sure that a LIMIT 0 or
LIMIT -1 gets left as a subplan. I'd say such a test would supersede the
above test, and I think it's also the case you were talking about
optimising anyway?

You can use EXPLAIN (COSTS OFF) to get a stable explain output.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2014-10-22 10:55:06 btree_gin and ranges
Previous Message Florian Pflug 2014-10-22 10:31:12 Re: Question about RI checks