Quick Links

Re: How to Optimize pg_trgm Performance

From:	"Ivan E(dot) Panchenko" <i(dot)panchenko(at)postgrespro(dot)ru>
To:	pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Re: How to Optimize pg_trgm Performance
Date:	2018-01-30 22:50:46
Message-ID:	bb31c511-5518-b405-f083-27e394358ad8@postgrespro.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hi Igal,

29.01.2018 02:42, Igal @ Lucee.org пишет:
>
> I want to use pg_trgm for auto-suggest functionality. I created a
> Materialized View with the information that I need, with the relevant
> columns being (keywords text, rank int). keywords is the column from
> which I build the tri-grams, and rank is some popularity factor so
> that popular results will show up higher than less popular results
> given the same tri-gram distance.
>
> I want to return results in the order of [distance], [distance_word],
> [rank]. The input comes from the user and is not known in advance.
> My query is as follows:
>
>     SELECT title
>         ,id
>         ,(input <-> keywords) AS distance
>         ,(input <<-> keywords) AS distance_word
>         ,rank
>     FROM (VALUES (cast('red pill' AS text))) consts(input)
>         ,mv_autosuggest
>     ORDER BY 3, 4, 5
>     LIMIT 20;
>
> This gives me pretty good results, but it takes too long and is not
> likely to scale well.
>
> I have created two indexes but neither seem to be used:
>
> CREATE INDEX mv_autosuggest_keywords_tgrm_gist ON
> staging.mv_autosuggest USING gist (keywords gist_trgm_ops);
>
> CREATE INDEX mv_autosuggest_keywords_tgrm_gin ON
> staging.mv_autosuggest USING gin (keywords gin_trgm_ops);
>
> This is the result of explain analyze:
>
> QUERY PLAN |
> -------------------------------------------------------------------------------------------------------------------------------------------|
> Limit (cost=356.41..356.46 rows=20 width=51) (actual
> time=163.132..163.135 rows=20
> loops=1)                                               |
> -> Sort (cost=356.41..372.96 rows=6619 width=51) (actual
> time=163.130..163.131 rows=20
> loops=1)                                        |
>         Sort Key: (('red pill'::text <-> mv_autosuggest.keywords)),
> (('red pill'::text <<-> mv_autosuggest.keywords)), mv_autosuggest.rank |
>         Sort Method: top-N heapsort Memory: 28kB |
>         -> Seq Scan on mv_autosuggest (cost=0.00..180.29 rows=6619
> width=51) (actual time=0.263..161.289 rows=6619 loops=1)              |
> Planning time: 0.139 ms |
> Execution time: 163.174 ms |
>
> How can I improve the performance here?
>
This can be improved if you use sort only by distances
(try ORDER BY 3,4
or ORDER BY 3
or ORDER BY 4

Than you should get plan like

Index Scan using mv_autosuggest_keywords_tgrm_gist on mv_autosuggest
Order By: ((keywords <-> 'red pill'::text) AND (keywords <->>
'red pill'::text))

Which means that KNN index search is enabled :
http://www.sai.msu.su/~megera/postgres/talks/pgcon-2010-1.pdf

If you want to sort it also by rank, you can make a two-level
construction like:

SELECT * FROM (
SELECT .... ORDER BY 3,4 LIMIT .... /* make some empirical
redundant limit here */
) foo ORDER BY 5 LIMIT ....

Regards,
Ivan Panchenko

> Thank you,
>
> Igal Sapir
> Lucee Core Developer
> Lucee.org <http://lucee.org/>
>

In response to

How to Optimize pg_trgm Performance at 2018-01-28 23:42:17 from Igal @ Lucee.org

Browse pgsql-general by date

	From	Date	Subject
Next Message	Peter Eisentraut	2018-01-31 04:15:15	Re: pg10 logical replication set schema
Previous Message	David G. Johnston	2018-01-30 22:14:33	Re: Working with JSONB data having node lists