Re: Question on COUNT performance

From: Lee Hachadoorian <lee(dot)hachadoorian(at)gmail(dot)com>
To: Anders Østergaard Jensen <aj(at)itersys(dot)dk>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Question on COUNT performance
Date: 2010-07-14 16:21:26
Message-ID: AANLkTikWZdVzUQlOLRnPvqI0aJ9pNdUtUfeSLiXGIjfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

The first statement of the function

> : select into user *
> from users where id = uid;

appears to be a useless drag, as I don't see the user record referred
to anywhere else in the function. There appears to be other
unnecessary statements. For example :

> : select into pcount_rel
> COUNT(*) as acl_count FROM project_plan_events ppe WHERE ppe.plan_event_id =
> plan_id;
>
> : if
> (pcount_rel.acl_count > 0) then
>
>
>
> : SELECT INTO
> project * FROM projects WHERE id IN (SELECT project_id FROM
> project_plan_events ppe2 WHERE ppe2.plan_event_id = plan_id) OFFSET 0 LIMIT
> 1;
>
> : return
> f_project_acl(uid, project.id);
>
>
> : end if;
>

This appears to check whether the plan_id exists in a link table, find
an associated project_id, and run some function on project_id.

This could instead be done as:

FOR project_record IN SELECT project_id FROM project_plan_events WHERE
plan_event_id = plan_id LIMIT 1 LOOP
return f_project_acl(uid, project_record.project_id)
END LOOP;

If plan_id doesn't exist in project_plan_events, the LOOP just gets skipped.

The same could be done for the next IF block in they query which
checks to see whether plan_id has a matching customer_id in a link
table.

Note that your LIMIT 1 (which I have retained) strongly implies a
1-to-1 relationship between project_id and plan_id. If not, this
function gets applied to an arbitrary project_id from among all
matching project_ids. (Same goes for customer_id.)

Assuming f_project_acl and f_customer_acl return TRUE if successful,
the whole thing (from the original SELECT COUNT(*) looks like it can
be summarized as:

Call a function with a plan_id
If a matching project_id exists
Do some function on the project_id
count +1
Else If a matching customer_id exists
Do some function on the customer_id
count +1
Else
count +1

Return count, which, since the function gets called once for each row
in plan_events, count should always equal the number of rows in plan
events.

I would be inclined to replace the whole thing with something like this:

SELECT newfunc(uid);

CREATE FUNCTION newfunc(uid int) RETURNS int AS $$
DECLARE
plan_record record;
i int := 0;
BEGIN
FOR plan_record IN SELECT DISTINCT plan_id, project_id, customer_id
FROM plan_events LEFT JOIN project_plan_events USING (plan_id) LEFT
JOIN customer_plan_events USING (plan_id) LOOP
IF plan_record.project_id IS NOT NULL THEN
PERFORM f_project_acl(uid, plan_record.project_id);
ELSEIF plan_record.customer_id IS NOT NULL THEN
PERFORM f_customer_acl(uid, plan_record.customer_id);
END IF;
i := i + 1;
END LOOP ;
RETURN i;
END;
$$ LANGUAGE plpgsql;

If I understand what's going on in your function, I *think* this would
reduce 9000-12,000 SELECT statements to 1 SELECT statement.

Obviously, not tested. Hope this is helpful.

--Lee

--
Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Lee Hachadoorian 2010-07-14 16:49:14 Re: Question on COUNT performance
Previous Message Thomas Kellerer 2010-07-14 16:06:22 Re: subtract two dates to get the number of days