Re: similarity and operator '%'

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Volker Boehm <volker(at)vboehm(dot)de>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: similarity and operator '%'
Date: 2016-05-30 20:05:41
Message-ID: CAMkU=1wtKJpkjBoL7ubjbZS=rOMAsNKum-BXZUQkpW70gntzSQ@mail.gmail.com

On Mon, May 30, 2016 at 10:53 AM, Volker Boehm <volker(at)vboehm(dot)de> wrote:

> The reason for using the similarity function in place of the '%'-operator is
> that I want to use different similarity values in one query:
>
> select name, street, zip, city
> from addresses
> where name % $1
> and street % $2
> and (zip % $3 or city % $4)
> or similarity(name, $1) > 0.8

I think the best you can do through query rewriting is to use the most
lenient threshold in all places, and then refilter to apply the stricter
cutoff:

select name, street, zip, city
from addresses
where name % $1
and street % $2
and (zip % $3 or city % $4)
or (name % $1 and similarity(name, $1) > 0.8)
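
The threshold that the '%' operator applies comes from pg_trgm's
set_limit() (show_limit() returns the current value), so the idea above
is to set it once per session to the most lenient value needed anywhere
in the query and let the explicit similarity() comparison enforce the
stricter cutoff on name. A minimal sketch; the 0.3 value is only
illustrative:

select set_limit(0.3);   -- '%' now means similarity >= 0.3 in this session
select show_limit();     -- check the threshold currently in effect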

If it were really important to me to get maximum performance, what I
would do is alter/fork the pg_trgm extension so that it has another
operator, say %%%, with a hard-coded cutoff that pays no attention to
set_limit(). I'm not really sure how the planner would deal with that,
though.
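
For illustration only, a rough sketch of what the forked extension's SQL
script might add; the function name, the %%% operator, the strategy
number, and the C support function are all hypothetical and would still
have to be implemented in the fork:

-- hypothetical fork: similarity_strict_op and %%% are not in stock pg_trgm
create function similarity_strict_op(text, text) returns boolean
    as 'MODULE_PATHNAME', 'similarity_strict_op'
    language c stable strict;

create operator %%% (
    leftarg = text,
    rightarg = text,
    procedure = similarity_strict_op,
    commutator = '%%%',
    restrict = contsel,
    join = contjoinsel
);

-- the GiST/GIN opclasses would also need to recognize a new strategy
-- number for %%%, which is why this takes a fork of the C code rather
-- than just extra SQL:
alter operator family gist_trgm_ops using gist
    add operator 9 %%% (text, text);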

Cheers,

Jeff
