From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
Cc: | Janek Sendrowski <janek12(at)web(dot)de>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Fastest Index/Algorithm to find similar sentences |
Date: | 2013-08-20 23:18:09 |
Message-ID: | CAHyXU0zKSRpFVTd3x9uKNf-nK-Dr96+Ot=7_0TiR47_-q0oTRg@mail.gmail.com |
Lists: | pgsql-general |
On Fri, Aug 2, 2013 at 10:25 AM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> Janek Sendrowski <janek12(at)web(dot)de> wrote:
>
>> I also tried pg_trgm module, which works with tri-grams, but it's
>> also very slow with 100.000+ rows.
>
> Hmm. I found the pg_trgm module very fast for name searches with
> millions of rows *as long as I used KNN-GiST techniques*. Were you
> careful to do so? Check out the "Index Support" section of this
> page:
>
> http://www.postgresql.org/docs/current/static/pgtrgm.html
>
> While I have not tested this technique with a column containing
> sentences, I would expect it to work well. As a quick
> confirmation, I imported the text form of War and Peace into a
> table, with one row per *line* (because that was easier than
> parsing sentence boundaries for a quick test). That was over
> 65,000 rows.
+1 to this. pg_trgm is black magic. Search time (when using the index)
depends mostly on the number of trigrams in the search string versus the
average number of trigrams in the database.
merlin
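
To make the point about trigram counts concrete, here is a minimal Python sketch that approximates how pg_trgm breaks a string into trigrams and scores similarity. It is a simplified illustration, not the extension's actual code: the function names `trigrams` and `similarity` are chosen here for convenience, and the handling of case, locale, and non-alphanumeric characters is an assumption that only roughly mirrors the documented behaviour (each word is treated as having two blanks prefixed and one suffixed).

```python
import re


def trigrams(text: str) -> set[str]:
    """Return an approximate pg_trgm-style set of trigrams for `text`."""
    result: set[str] = set()
    # Lowercase, then treat every run of alphanumeric characters as a word.
    for word in re.findall(r"[^\W_]+", text.lower()):
        # Pad with two leading blanks and one trailing blank, as the
        # pg_trgm documentation describes.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    query = "Fastest index to find similar sentences"
    print(len(trigrams("cat")), "trigrams in a short word")
    print(len(trigrams(query)), "trigrams in a sentence-length query")
    print(similarity("word", "two words"))
```

A sentence-length query yields many more trigrams than a single name, which is one plausible reading of why search time tracks the number of trigrams in the search string.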