Slow planning time for custom function

From: bk(at)e8s(dot)de
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Slow planning time for custom function
Date: 2018-03-23 20:28:22
Message-ID: 28422CFD-E3CB-4652-868D-0E6EA7CFB589@e8s.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I have a table api.issues that has a text column "body" with long texts (1000+ chars). I also wrote a custom function "normalizeBody" with plv8 that is a simple Text -> Text conversion. Now I created an index applying the function to the body column, so I can quickly run

SELECT * FROM api.issues WHERE normalizeBody(body) = normalizeBody($1)

The issue is, that the planning time is very slow (1.8 seconds). When I replace "normalizeBody" with "md5", however, I get a planning time of 0.5ms.

Please note that row level security is enabled on the api.issues and most other tables.

Thanks for your help,
Ben

Details below:
- Managed AWS Postgres with default settings, no replication
- PostgreSQL 10.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
- Table api.issues has approx. 40 000 rows.

```
explain (analyze, buffers) select 1 from api.issues
where normalizeunidiff(body) = normalizeunidiff('');

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Index Scan using rejectedissues_normalized on issues (cost=0.00..218.80 rows=217 width=4) (actual time=0.160..0.204 rows=3 loops=1)
Index Cond: (normalizeunidiff(body) = ''::text)
Buffers: shared hit=5
Planning time: 1878.783 ms
Execution time: 0.230 ms
(5 rows)
```

```
explain (analyze, buffers) select 1 from api.issues
where md5(body) = md5('');

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Index Scan using rejectedissues_md5 on issues (cost=0.00..218.80 rows=217 width=4) (actual time=0.016..0.016 rows=0 loops=1)
Index Cond: (md5(body) = 'd41d8cd98f00b204e9800998ecf8427e'::text)
Buffers: shared hit=2
Planning time: 0.565 ms
Execution time: 0.043 ms
(5 rows)
```

```
CREATE OR REPLACE FUNCTION public.normalizeunidiff(
unidiff text)
RETURNS text
LANGUAGE 'plv8'

COST 100
IMMUTABLE STRICT PARALLEL SAFE
AS $BODY$

return unidiff
.replace(/[\s\S]*@@/m, '') // remove header
.split('\n')
.map(function (line) { return line.trim() })
.filter(function (line) { return line.search(/^[+-]/) >= 0 })
.join('\n')
.trim()

$BODY$;
```

The indices are created this way where md5 is normalizeunidiff for the second one:
```
CREATE INDEX "rejectedissues_md5"
ON api.issues using hash
(md5(body));
```

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Peter 2018-03-24 01:27:47 functions: VOLATILE performs better than STABLE
Previous Message Peter 2018-03-23 17:36:58 Re: Should from_collapse be switched off? (queries 10 times faster)