Re: Query optimization using order by and limit

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Michael Viscuso <michael(dot)viscuso(at)getcarbonblack(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndQuadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query optimization using order by and limit
Date: 2011-09-22 14:53:19
Message-ID: 20110922145319.GJ12765@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

* Michael Viscuso (michael(dot)viscuso(at)getcarbonblack(dot)com) wrote:
> Adding the final condition hosts_guid = '2007075705813916178' is what
> ultimately kills it http://explain.depesz.com/s/8zy. By adding the
> host_guid, it spends considerably more time in the older tables than
> without this condition and I'm not sure why.

What I think is happening here is that PG is pushing down that filter
(not typically a bad thing..), but with that condition, it's going to
scan the index until it finds a match for that filter before returning
back up only to have that result cut out due to the limit. Having it as
numerics isn't helping here, but the bigger issue is having to check all
those tuples for a match to the filter.

Mike, the filter has to be applied before the order by/limit, since
those clauses come after the filter has been applied (you wouldn't want
a 'where x = 2 limit 10' to return early just because it found 10
records where x didn't equal 2).

What would be great is if PG would realize that the CHECK constraints
prevent earlier records from being in these earlier tables, so it
shouldn't need to consider them at all once the records from the
'latest' table has been found and the limit reached (reverse all this
for an 'ascending' query, of course), which we can do when there's no
order by. I don't believe we have that kind of logic or that
information available at this late stage- the CHECK constraints are used
to eliminate the impossible-to-match tables, but that's it.

One option, which isn't great of course, would be to implement your own
'nested loop' construct (something I typically despise..) in the
application which just walks backwards from the latest and pulls
whatever records it can from each day and then stops once it hits the
limit.

Thanks,

Stephen

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jonathan Bartlett 2011-09-22 16:40:46 Optimizing Trigram searches in PG 9.1
Previous Message Michael Viscuso 2011-09-22 13:55:28 Re: Query optimization using order by and limit