Re: Index for Levenshtein distance (better format)

From: Giuseppe Broccolo <giuseppe(dot)broccolo(at)2ndquadrant(dot)it>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Index for Levenshtein distance (better format)
Date: 2013-07-22 16:47:31
Message-ID: 51ED6223.7080607@2ndquadrant.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Janek,

Il 21/07/2013 16:46, Janek Sendrowski ha scritto:
> Hi,
>
> Im searching for a suitable Index for my query witch compares Strings with the Levenshtein distance.
>
> I read that a prefix index would fit, but I dont know how to build it. I only know that its supported by Gist.
>
> I couldn't find an instructions so I hope someone can help me.

Consider this simple example: suppose you have a table "table1" with a
column "string" of type TEXT, and you are interested to find "string" in
your table equal to 'ciao' basing your query in Levenshtein distance.
Then, improve your query creating an index based on gist.

First of all, create extensions to use levenshtein() function, and gist
indexing for types different from PostGIS objects:

CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION btree_gist;

You can check that all works fine trying a query like:

SELECT string FROM table1 WHERE levenshtein(string,'ciao') = 0;

which finds also "string" equal to 'ciao'. Create now the index to
improve this query:

CREATE INDEX lev_idx ON table1 USING GIST(levenshtein(string,'ciao'));

And relaunch the same query. I made a simple check with a very simple
table1 example with ten raws using EXPLAIN ANALYSE to exploit the
performances: query time changes from 1.077 ms to 0.677 ms after gist
index creation.

Hope it helps,

Giuseppe.

--
Giuseppe Broccolo - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
giuseppe(dot)broccolo(at)2ndQuadrant(dot)it |www.2ndQuadrant.it

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jeff Janes 2013-07-22 16:48:58 Re: odd locking behaviour
Previous Message Pavel Stehule 2013-07-22 16:37:17 Re: Different transaction log for database/schema