similarity and operator '%'

From: Volker Boehm <volker(at)vboehm(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: similarity and operator '%'
Date: 2016-05-30 17:53:59
Message-ID: 574C7E37.7080102@vboehm.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,
I'm trying to find persons in an address database where I have built
trgm-indexes on name, street, zip and city.

When I search for all four parts of the address (name, street, zip and city)

select name, street, zip, city
from addresses
where name % $1
and street % $2
and (zip % $3 or city % $4)

everything works fine: It takes less than a second to get some (5 - 500)
proposed addresses out of 500,000 addresses and the query plan shows

Bitmap Heap Scan on addresses (cost=168.31..1993.38 rows=524 ...
Recheck Cond: ...
-> Bitmap Index Scan on ...
Index Cond: ...

The same happens when I search only by name with

select name, street, zip, city
from addresses
where name % $1

But when I rewrite this query to

select name, street, zip, city
from addresses
where similarity(name, $1) > 0.3

which means exactly then same as the second example, the query plan
changes to

Seq Scan on addresses (cost=0.00..149714.42 rows=174675 width=60)
Filter: ...

and the query lasts about a minute.

The reason for using the similarity function in place of the
'%'-operator is that I want to use different similarity values in one query:

select name, street, zip, city
from addresses
where name % $1
and street % $2
and (zip % $3 or city % $4)
or similarity(name, $1) > 0.8

which means: take all addresses where name, street, zip and city have
little similarity _plus_ all addresses where the name matches very good.

The only way I found, was to create a temporary table from the first
query, change the similarity value with set_limit() and then select the
second query UNION the temporary table.

Is there a more elegant and straight forward way to achieve this result?

regards Volker

--
Volker Böhm Tel.: +49 4141 981155
Voßkuhl 5 mailto:volker(at)vboehm(dot)de
21682 Stade http://www.vboehm.de

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message David G. Johnston 2016-05-30 18:20:33 Re: similarity and operator '%'
Previous Message Vladimir Borodin 2016-05-30 17:03:13 Re: 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6