Re: like/ilike improvements

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: like/ilike improvements
Date: 2007-09-19 18:39:29
Message-ID: 46F16CE1.7040409@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Guillaume Smet wrote:
> Andrew, All,
>
>
>> On 5/22/07, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>
>>> But before I commit this I'd appreciate seeing some more testing, both
>>> for correctness and performance.
>>>
>
> I finally found some time to test this patch on our data. As our
> production database is still using 8.1, I made my tests with 8.1.10
> and 8.3devel. As I had very weird results, I tested also 8.2.5.
>
> The patch seems to work as expected in my locale. I didn't notice
> problems during the tests I made except for the performance problem I
> describe below.
>
> The box is a recent dual core box using CentOS 5. It's a test box
> installed specifically to test PostgreSQL 8.3. Every version is
> compiled with the same compiler. Locale is fr_FR.UTF-8 and database is
> UTF-8 too.
> The table used to make the tests fits entirely in RAM.
>
> I tested a simple ILIKE query on our data with 8.3devel and it was far
> slower than with 8.1.10 (2 times slower). It was obviously not the
> expected result as it should have been faster considering your work.
> So I decided to test also with 8.2.5 and it seems a performance
> regression was introduced in 8.2 (and not in 8.3 which is in fact a
> bit faster than 8.2).
>
> I saw this item in 8.2 release notes:
> Allow ILIKE to work for multi-byte encodings (Tom)
> Internally, ILIKE now calls lower() and then uses LIKE.
> Locale-specific regular expression patterns still do not work in these
> encodings.
>
> Could it be responsible of such a slow down?
>
> I attached the results of my tests. If anyone needs more information,
> I'll be glad to provide them.
>
>
>

Ugh.

It's at least good to see that the LIKE case has some useful speedup in
8.3.

Can you run the same set of tests in a single byte encoding like latin1?

We might have to look at doing on-demand lowering, but in a case like
yours it looks like we'd still end up lowering almost every character
anyway, so I'm not quite sure what to do. Note that the 8.2 change was a
bug fix, so we can't just revert it. Maybe we need to look closely at
the efficiency of lower().

cheers

andrew

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark 2007-09-19 19:02:09 Re: curious regression failures
Previous Message Pedro Belmino 2007-09-19 18:34:32 Debugger

Browse pgsql-patches by date

  From Date Subject
Next Message Gregory Stark 2007-09-19 19:02:09 Re: curious regression failures
Previous Message Andrew Dunstan 2007-09-19 18:14:40 Re: curious regression failures