Re: Query much slower when run from postgres function

From: Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much slower when run from postgres function
Date: 2009-03-09 19:13:01
Message-ID: 49B56A3D.8000100@megafon.hr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc pgsql-performance

Tom Lane wrote:
> Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr> writes:
>> Now I'm confused, why is 'sql' function much slower than 'direct' SELECT?
>
> Usually the reason for this is that the planner chooses a different plan
> when it has knowledge of the particular value you are searching for than
> when it does not. I suppose 'service_id' has a very skewed distribution
> and you are looking for an uncommon value?

I don't think so. Here is distribution for the messages_memo_displayed
table (joined with messages, just to show how many messages of each
service_id are there in messages_memo_displayed):

pulitzer2=# select service_id, count(*) from messages join
messages_memo_displayed on id = message_id group by service_id order by
service_id;
service_id | count
------------+-------
504 | 2
1790 | 1922
1814 | 1
1816 | 57
1818 | 3
(5 rows)

And the sizes of other tables involved:

pulitzer2=# select count(*) from messages_memo_displayed;
count
-------
1985
(1 row)

Time: 0.602 ms
pulitzer2=#

pulitzer2=# select count(*) from messages;
count
---------
1096388
(1 row)

Time: 345.267 ms
pulitzer2=# select count(*) from messages_memo;
count
--------
776238
(1 row)

Time: 133.942 ms
pulitzer2=#

As I've mentioned earlier, I have created an view, for the sake of this
posting:

CREATE OR REPLACE VIEW _v1 AS
SELECT messages.id, messages."from", messages."to",
messages.receiving_time, messages.raw_text, messages.keyword,
messages.destination_id, messages.vpn_id, messages.service_id,
messages.status, messages.gateway_message_id, messages.prize_id,
messages.tan, messages_memo.memo, messages_memo.state,
messages_memo.displayed, messages_memo_displayed.admin_id
FROM messages
JOIN messages_memo ON messages.id = messages_memo.message_id
LEFT JOIN messages_memo_displayed ON messages.id =
messages_memo_displayed.message_id
WHERE messages_memo.state::integer = 1 AND
messages_memo_displayed.admin_id IS NULL;

And then I created a function:

CREATE OR REPLACE FUNCTION
__new__get_memo_display_queue_size(a_service_id integer)
RETURNS integer AS
$BODY$
SELECT
COUNT(*)::int4
FROM
_v1
WHERE
service_id = $1
$BODY$
LANGUAGE 'sql' VOLATILE SECURITY DEFINER;

Now, here are the differences:
pulitzer2=# select count(*) from _v1 where service_id = 504;
count
-------
0
(1 row)

Time: 6.101 ms
pulitzer2=# select __new__get_memo_display_queue_size(504);
__new__get_memo_display_queue_size
------------------------------------
0
(1 row)

Time: 322.555 ms
pulitzer2=# select count(*) from _v1 where service_id = 1790;
count
-------
1
(1 row)

Time: 25.203 ms
pulitzer2=# select __new__get_memo_display_queue_size(1790);
__new__get_memo_display_queue_size
------------------------------------
1
(1 row)

Time: 225.763 ms
pulitzer2=# select count(*) from _v1 where service_id = 1814;
count
-------
2
(1 row)

Time: 13.662 ms
pulitzer2=# select __new__get_memo_display_queue_size(1814);
__new__get_memo_display_queue_size
------------------------------------
2
(1 row)

Time: 215.251 ms
pulitzer2=# select count(*) from _v1 where service_id = 1816;
count
-------
1
(1 row)

Time: 10.111 ms
pulitzer2=# select __new__get_memo_display_queue_size(1816);
__new__get_memo_display_queue_size
------------------------------------
1
(1 row)

Time: 220.457 ms
pulitzer2=# select count(*) from _v1 where service_id = 1829;
count
-------
13
(1 row)

Time: 2.023 ms
pulitzer2=# select __new__get_memo_display_queue_size(1829);
__new__get_memo_display_queue_size
------------------------------------
13
(1 row)

Time: 221.956 ms
pulitzer2=#

Is this difference normal? I tend to have the interface between the
database and the application trough functions, and I'd like not to
include 'SELECT COUNT(*)...' in my Java code (at least, if I don't have
to! - esp. because I'm not Java developer on the project).

Then, this is also interesting, I think! I'm telling the planer never to
use sequential scan:

pulitzer2=# set enable_seqscan to false;
SET
Time: 0.150 ms
pulitzer2=# select __new__get_memo_display_queue_size(1829);
__new__get_memo_display_queue_size
------------------------------------
13
(1 row)

Time: 2.412 ms
pulitzer2=# select count(*) from _v1 where service_id = 1829;
count
-------
13
(1 row)

Time: 2.092 ms
pulitzer2=# select __new__get_memo_display_queue_size(1816);
__new__get_memo_display_queue_size
------------------------------------
1
(1 row)

Time: 2.473 ms
pulitzer2=# select count(*) from _v1 where service_id = 1816;
count
-------
1
(1 row)

Time: 2.117 ms
pulitzer2=#

Now the the execution times are almost the same.

So, why this difference in the first place, and, what should I do to
have satisfying results when calling a postgres function?
I could rewrite the function from plain sql to plpgsql, and add 'SET
enable_seqscan TO false' before getting the count, and add 'SET
enable_seqscan TO true' after getting the count, but as I was explained
on pg-jdbc mailinglist that is not the way to go.

And I still don't understand why do I have excellent times when I force
planner not to use sequential scan inside the function, but when
'calling' the query from plain sql (SELECT COUNT(*) FROM _v1 WHERE),
execution time is always around 2-4ms, regardles of the value of
enable_seqscan parametar.

Mike

In response to

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Saleem EDAH-TALLY 2009-03-09 19:20:58 getGeneratedKeys
Previous Message Andreas Wenk 2009-03-09 17:42:16 Re: [PERFORM] Query much slower when run from postgres function

Browse pgsql-performance by date

  From Date Subject
Next Message Mario Splivalo 2009-03-09 19:26:05 Re: [PERFORM] Query much slower when run from postgres function
Previous Message Andreas Wenk 2009-03-09 17:42:16 Re: [PERFORM] Query much slower when run from postgres function