Re: multibyte charater set in levenshtein function

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multibyte charater set in levenshtein function
Date: 2010-07-21 21:59:20
Message-ID: AANLkTikvz=wp7r72Jd8iWrdAegawijzZ6ejm08WXEzpW@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 21, 2010 at 2:47 PM, Alexander Korotkov
<aekorotkov(at)gmail(dot)com> wrote:
> On Wed, Jul 21, 2010 at 10:25 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> *scratches head*  Aren't you just moving the same call to a different
>> place?
>
> So, where you can find this different place? :) In this patch
> null-terminated strings are not used at all.

I can't. You win. :-)

Actually, I wonder if there's enough performance improvement there
that we might think about extracting that part of the patch and apply
it separately. Then we could continue trying to figure out what to do
with the rest. Sometimes it's simpler to deal with one change at a
time.

> I tested it with american-english dictionary with 98569 words.
>
> test=# select sum(levenshtein(word, 'qwerqwerqwer')) from words;
>    sum
> ---------
>  1074376
> (1 row)
>
> Time: 131,435 ms
> test=# select sum(levenshtein_less_equal(word, 'qwerqwerqwer',100)) from
> words;
>    sum
> ---------
>  1074376
> (1 row)
>
> Time: 221,078 ms
> test=# select sum(levenshtein_less_equal(word, 'qwerqwerqwer',-1)) from
> words;
>    sum
> ---------
>  1074376
> (1 row)
>
> Time: 254,819 ms
>
> The function with negative value of max_d didn't become faster than with
> just big value of max_d.

Ah, I see. That's pretty compelling, I guess. Although it still
seems like a lot of code...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-07-21 22:11:38 Re: dynamically allocating chunks from shared memory
Previous Message Alvaro Herrera 2010-07-21 21:38:06 Re: git config user.email