From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Supporting = operator in gin/gist_trgm_ops |
Date: | 2020-10-26 17:50:05 |
Message-ID: | CAPpHfduP3JPH-iVUTJRvPTP2mY4EPKEz4iJBcRHr7Z+RCCxFKQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Oct 26, 2020 at 7:38 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> Ah, yes this might lead to bad performance if the "fake wildcard"
> matches too many rows, but this shouldn't be a very common use case,
> and the only alternative for that might be to create trigrams for non
> alphanumerics characters. I didn't try to do that because it would
> mean meaningful overhead for mainstream usage of pg_trgm, and would
> also mean on-disk format break. In my opinion supporting = should be
> a best effort, especially for such corner cases.
It would be more efficient to generate trigrams for equal operator
using generate_trgm() instead of generate_wildcard_trgm(). It some
cases it would generate more trigrams. For instance generate_trgm()
would generate '__a', '_ab', 'ab_' for '%ab%' while
generate_wildcard_trgm() would generate nothing.
Also I wonder how our costing would work if there are multiple indices
of the same column. We should clearly prefer btree than pg_trgm
gist/gin, and I believe our costing provides this. But we also should
prefer btree_gist/btree_gin than pg_trgm gist/gin, and I'm not sure
our costing provides this especially for gist.
------
Regards,
Alexander Korotkov
From | Date | Subject | |
---|---|---|---|
Next Message | Anastasia Lubennikova | 2020-10-26 18:09:17 | Commitfest 2020-11 |
Previous Message | Tom Lane | 2020-10-26 16:56:40 | Re: PATCH: Report libpq version and configuration |