Re: LIMIT causes SEQSCAN in subselect

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Mike Rylander <mrylander(at)gmail(dot)com>
Subject: Re: LIMIT causes SEQSCAN in subselect
Date: 2004-12-11 18:06:23
Message-ID: 17528.1102788383@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> The fact that the estimator knows that the LIMIT is pointless because there
> are less rows in the subselect than the LIMIT will return is not something we
> want to count on; sometimes the estimator has innaccurate information.

However, when the estimator is basing that estimate on the existence of
a unique index for the column, the estimate could be trusted. There are
a couple of reasons that we don't perform that optimization at present,
though:

1. If the finished query plan doesn't actually *use* the index in
question, then dropping the index would not directly invalidate the
query plan, but nonetheless the query would be broken. You could
subsequently get silently-wrong answers.

2. For the particular point at hand, there's an implementation problem,
which is that decisions about whether to flatten subqueries are taken
before we do any rowcount estimation. So even if we discarded the LIMIT
clause once we realized it was redundant, it'd be too late to get the
optimal overall plan.

Point #1 is something I would like to fix whenever we get around to
implementing proper invalidation of cached plans. There would need to
be a way to list "indirect" as well as direct dependencies of a plan.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas =?iso-8859-1?q?Sk=E4re?= 2004-12-12 08:32:25 Re: [GENERAL] Query is not using index when it should
Previous Message Alvaro Nunes Melo 2004-12-11 17:01:24 Very different index usage on similar tables