Re: Supporting = operator in gin/gist_trgm_ops

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Supporting = operator in gin/gist_trgm_ops
Date: 2020-10-26 17:50:05
Message-ID: CAPpHfduP3JPH-iVUTJRvPTP2mY4EPKEz4iJBcRHr7Z+RCCxFKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 26, 2020 at 7:38 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> Ah, yes this might lead to bad performance if the "fake wildcard"
> matches too many rows, but this shouldn't be a very common use case,
> and the only alternative for that might be to create trigrams for non
> alphanumerics characters. I didn't try to do that because it would
> mean meaningful overhead for mainstream usage of pg_trgm, and would
> also mean on-disk format break. In my opinion supporting = should be
> a best effort, especially for such corner cases.

It would be more efficient to generate trigrams for equal operator
using generate_trgm() instead of generate_wildcard_trgm(). It some
cases it would generate more trigrams. For instance generate_trgm()
would generate '__a', '_ab', 'ab_' for '%ab%' while
generate_wildcard_trgm() would generate nothing.

Also I wonder how our costing would work if there are multiple indices
of the same column. We should clearly prefer btree than pg_trgm
gist/gin, and I believe our costing provides this. But we also should
prefer btree_gist/btree_gin than pg_trgm gist/gin, and I'm not sure
our costing provides this especially for gist.

------
Regards,
Alexander Korotkov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anastasia Lubennikova 2020-10-26 18:09:17 Commitfest 2020-11
Previous Message Tom Lane 2020-10-26 16:56:40 Re: PATCH: Report libpq version and configuration