Re: Performance of count(*)

From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Steve Atkins <steve(at)blighty(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of count(*)
Date: 2007-03-22 19:18:24
Message-ID: 4602D680.2020106@modgraph-usa.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> "Craig A. James" <cjames(at)modgraph-usa(dot)com> writes:
>> Steve Atkins wrote:
>>> As long as you're ordering by some row in the table then you can do that in
>>> straight SQL.
>>>
>>> select a, b, ts from foo where (stuff) and foo > X order by foo limit 10
>>>
>>> Then, record the last value of foo you read, and plug it in as X the next
>>> time around.
>
>> We've been over this before in this forum: It doesn't work as advertised.
>> Look for postings by me regarding the fact that there is no way to tell
>> the optimizer the cost of executing a function. There's one, for example,
>> on Oct 18, 2006.
>
> You mean
> http://archives.postgresql.org/pgsql-performance/2006-10/msg00283.php
> ? I don't see anything there that bears on Steve's suggestion.
> (The complaint is obsolete as of CVS HEAD anyway.)

Mea culpa, it's October 8, not October 18:

http://archives.postgresql.org/pgsql-performance/2006-10/msg00143.php

The relevant part is this:

"My example, discussed previously in this forum, is a classic. I have a VERY expensive function (it's in the class of NP-complete problems, so there is no faster way to do it). There is no circumstance when my function should be used as a filter, and no circumstance when it should be done before a join. But PG has no way of knowing the cost of a function, and so the optimizer assigns the same cost to every function. Big disaster.

"The result? I can't use my function in any WHERE clause that involves any other conditions or joins. Only by itself. PG will occasionally decide to use my function as a filter instead of doing the join or the other WHERE conditions first, and I'm dead.

"The interesting thing is that PG works pretty well for me on big tables -- it does the join first, then applies my expensive functions. But with a SMALL (like 50K rows) table, it applies my function first, then does the join. A search that completes in 1 second on a 5,000,000 row database can take a minute or more on a 50,000 row database."

Craig

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2007-03-22 19:40:33 Re: Performance of count(*)
Previous Message Michael Stone 2007-03-22 18:46:15 Re: Parallel Vacuum