Re: LIMIT causes SEQSCAN in subselect

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org, Mike Rylander <mrylander(at)gmail(dot)com>
Subject: Re: LIMIT causes SEQSCAN in subselect
Date: 2004-12-11 05:40:18
Message-ID: 200412102140.18456.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Mike,

> I'm using the fact that Postgres can elevate a subselect-join to a
> simple join when there are no aggregates involved and I think I
> remember there has been some work recently on elevating subselects
> that contain a LIMIT, so I went back and ran the plans without the
> LIMITs to see what would happen. Well, the limit killed the subselect
> elevation.

Actually, this makes sense. A LIMIT requires the data to be ordered first,
and then cut based on the order; it prevents collapsing the subselect into
the main query. Some sort of materializing is necessary, even in cases like
yours where the limit is inherently meaningless because you've neglected to
use an ORDER BY.

The fact that the estimator knows that the LIMIT is pointless because there
are less rows in the subselect than the LIMIT will return is not something we
want to count on; sometimes the estimator has innaccurate information. The
UNIQUE index makes this more certain, except that I'm not sure that the
planner distinguishes between actual UNIQUE indexes and columns which are
estimated unique (per the pg_stats). And I think you can see in your case
that there's quite a difference between a column we're CERTAIN is unique,
versus a column we THINK is unique.

> I realize this is a rather specialized case and not really great form.

Exactly. You've grasped the main issue: that this has not been optimized
because it's bizarre and not very sensible query writing. Someday we'll get
around to optimizing the really wierd queries, but there's still a lot of
work to be done on the common ones (like count(*) ...).

Keep in mind that the only reason we support LIMIT inside subqueries in the
first place is a workaround to slow aggregates, and a way to do RANK. It's
certainly not SQL-standard.

> Just a matter of
> defining result sets independently, and creating a simple wrapper to
> join them.

Well, if you think so, you know where to submit patches ...

--
Josh Berkus
Aglio Database Solutions
San Francisco

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2004-12-11 05:52:40 Re: Partitioned table performance
Previous Message Stephan Szabo 2004-12-11 02:28:38 Re: Query is not using index when it should