Re: Question on COUNT performance

From: Joshua Tolley <eggyknap(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Question on COUNT performance
Date: 2010-07-14 13:30:39
Message-ID: 4c3dbc01.0350e70a.18fa.388a@mx.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Wed, Jul 14, 2010 at 02:30:29PM +0200, Reinoud van Leeuwen wrote:
> On Wed, Jul 14, 2010 at 09:58:10PM +1000, Anders ??stergaard Jensen wrote:
> > SELECT count(*) AS count_all FROM "plan_events" WHERE (f_plan_event_acl(17,
> > plan_events.id))
> >
> > QUERY PLAN
> >
> > ------------------------------------------------------------------------------------------------------------------------
> > Aggregate (cost=2859.77..2859.78 rows=1 width=0) (actual
> > time=4641.720..4641.720 rows=1 loops=1)
> > -> Seq Scan on plan_events (cost=0.00..2851.44 rows=3331 width=0)
> > (actual time=32.821..4640.116 rows=2669 loops=1)
> > Filter: f_plan_event_acl(17, id)
> > Total runtime: 4641.753 ms
> > (4 rows)
> >
> >
> >
> > What can I do to improve the performance?
>
> Have you tried 'select count (1)..."?

If this helps at all, it's unlikely to help much. I remember having seen
discussion somewhere that there's an optimization such that count(*) and
count(1) do the same thing anyway, but I can't find it in the code
immediately. In any case, if your WHERE clause frequently includes this
function with 17 and id as arguments, and if f_plan_event_acl is immutable,
you can create an index:

CREATE INDEX foo_ix ON plan_events (f_plan_event_acl(17, id));

If PostgreSQL thinks that function will be true for a sufficiently small
proportion of the rows in the table, it will use the index instead of a
sequential scan, which might end up being faster. But the index won't help you
when you want to check values other than 17, e.g.

SELECT count(*) FROM plan_events WHERE f_plan_event_acl(42, id)

Another option might be to precalculate these data, if you have a relatively
small set of values you pass to f_plan_event_acl(). A table somewhere could
store the f_plan_events_acl() argument (17, as well as any other values you
want to precalculate), and a count of plan_events rows where
f_plan_events_acl() returns true with that argument. A set of triggers would
ensure that whenever someone INSERTs, UPDATEs, or DELETEs a plan_events row,
these counts are updated accordingly. Then you can refer to that table when
you need a count. If values in other tables can change the results of
f_plan_events_acl(), you'd need triggers there, too (and this method would
probably start to become unworkably complicated).

As an alternative to the precalculation option, you could also cache the
results of this query somewhere, and presumably invalidate that cache using a
trigger on the plan_events table.

Finally, you can try to improve performance of your function itself. If it's
taking 4.6 sec. to read and process 2669 rows, either you're reading awfully
slowly (VACUUM problems, perhaps?) or your f_plan_events_acl() function takes
a long time to run.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Reinoud van Leeuwen 2010-07-14 13:44:10 Re: Question on COUNT performance
Previous Message Jean-David Beyer 2010-07-14 12:37:06 Re: subtract two dates to get the number of days