domain cast in parameterized vs. non-parameterized query

From: David Kamholz <lautgesetz(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: domain cast in parameterized vs. non-parameterized query
Date: 2017-12-20 06:03:38
Message-ID: CAKuxgJ51c3k4JsB2uK8tS5_sWfSvTXY8tHOkwJwD23HZtTb1Vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've recently come across a query that produces different plans depending
on whether it's parameterized or not. The parameterized query takes ~50ms
while the non-parameterized query takes ~4s. The issue seems to be that the
query contains a STABLE function (uid_langvar) whose parameter is a domain
over text (uid).

The parameterized query is able to use the return value of uid_langvar to
choose a better plan:

2017-12-19 23:13:21 GMT LOG: duration: 0.063 ms plan:
Query Text:
SELECT id FROM langvar WHERE uid(lang_code, var_code) = uid

Index Scan using langvar_uid_idx on langvar (cost=0.29..2.59
rows=1 width=4) (actual time=0.058..0.058 rows=1 loops=1)
Index Cond: ((((((lang_code)::text || '-'::text) ||
lpad((var_code)::text, 3, '0'::text)))::uid)::text = ($1)::text)
2017-12-19 23:13:21 GMT CONTEXT: SQL function "uid_langvar" statement 1
2017-12-19 23:13:21 GMT LOG: duration: 150.634 ms plan:
Query Text: select expr.id, expr.langvar, expr.txt, expr.txt_degr,
exprx.score as expr_score
from expr
inner join exprx on expr.id = exprx.id
where expr.langvar = uid_langvar($1)
order by exprx.score desc
limit 10

Limit (cost=0.88..426.76 rows=10 width=45) (actual
time=133.378..150.611 rows=10 loops=1)
-> Nested Loop (cost=0.88..23706962.21 rows=556656 width=45)
(actual time=133.376..150.598 rows=10 loops=1)
-> Index Scan Backward using exprx_score_langvar_idx on
exprx (cost=0.44..2973934.39 rows=25583602 width=8) (actual
time=0.052..13.479 rows=5589 loops=1)
-> Index Scan using expr_pkey on expr (cost=0.44..0.81
rows=1 width=41) (actual time=0.023..0.023 rows=0 loops=5589)
Index Cond: (id = exprx.id)
Filter: (langvar = uid_langvar('spa-000'::uid))
Rows Removed by Filter: 1

Note that "SELECT id FROM langvar..." is the body of the uid_langvar
function. Also note that in the filter condition, 'spa-000' is cast
directly to uid. However, the non-parameterized query, where 'spa-000' is
passed directly, produces a different plan:

2017-12-19 23:18:01 GMT LOG: duration: 0.066 ms plan:
Query Text:
SELECT id FROM langvar WHERE uid(lang_code, var_code) = uid

Index Scan using langvar_uid_idx on langvar (cost=0.29..2.59
rows=1 width=4) (actual time=0.062..0.062 rows=1 loops=1)
Index Cond: ((((((lang_code)::text || '-'::text) ||
lpad((var_code)::text, 3, '0'::text)))::uid)::text = ($1)::text)
2017-12-19 23:18:01 GMT CONTEXT: SQL function "uid_langvar" statement 1
2017-12-19 23:18:05 GMT LOG: duration: 3950.817 ms plan:
Query Text: select expr.id, expr.langvar, expr.txt, expr.txt_degr,
exprx.score as expr_score
from expr
inner join exprx on expr.id = exprx.id
where expr.langvar = uid_langvar('spa-000')
order by exprx.score desc
limit 10
;
Limit (cost=12842.48..12842.51 rows=10 width=45) (actual
time=3950.777..3950.793 rows=10 loops=1)
-> Sort (cost=12842.48..12850.44 rows=3182 width=45) (actual
time=3950.775..3950.780 rows=10 loops=1)
Sort Key: exprx.score DESC
Sort Method: top-N heapsort Memory: 26kB
-> Nested Loop (cost=1.13..12773.72 rows=3182 width=45)
(actual time=1.524..3541.873 rows=561076 loops=1)
-> Index Scan using expr_langvar_id_idx on expr
(cost=0.69..3823.68 rows=3183 width=41) (actual time=1.480..717.547
rows=561293 loops=1)
Index Cond: (langvar =
uid_langvar(('spa-000'::text)::uid))
-> Index Scan using exprx_id_idx on exprx
(cost=0.44..2.71 rows=1 width=8) (actual time=0.002..0.003 rows=1
loops=561293)
Index Cond: (id = expr.id)

Note that in the above plan, 'spa-000' is cast to text before it's cast to
uid. This is apparently connected to why postgresql can't choose the better
plan.

This difference between plans of parameterized and non-parameterized
queries seems strange to me. Is it actually expected/correct or is it a bug?

Here's the definition of the domain and the functions, in case it's
relevant:

CREATE DOMAIN uid AS text
CONSTRAINT uid_check CHECK ((VALUE ~ '^[a-z]{3}-\d{3}$'::text));

CREATE FUNCTION uid_langvar(uid uid) RETURNS integer
LANGUAGE sql STABLE PARALLEL SAFE
AS $$
SELECT id FROM langvar WHERE uid(lang_code, var_code) = uid
$$;

CREATE FUNCTION uid(lang_code alpha3, var_code smallint) RETURNS uid
LANGUAGE sql IMMUTABLE PARALLEL SAFE
AS $$select (lang_code || '-' || lpad(var_code::text, 3, '0'))::uid;$$;

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-12-20 06:20:57 Re: [HACKERS] Parallel Hash take II
Previous Message Ildar Musin 2017-12-20 05:58:18 General purpose hashing func in pgbench