Re: POSIX regex performance bug in 7.3 Vs. 7.2

From: wade <wade(at)wavefire(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: POSIX regex performance bug in 7.3 Vs. 7.2
Date: 2003-02-04 16:24:47
Message-ID: 3.0.32.20030204082447.020a2aa0@mail.wavefire.com
Lists: pgsql-hackers

OK,
I redid my trials with the same data set on 7.2.3 built --with-multibyte, and I
get the same brutal performance hit, so it is definitely a
multibyte-specific problem.
WRT the distribution of the data in the table, I used the following:
all g-words in /usr/share/dict, with different processing applied:
  no processing
  initial caps
  word || row_id
  etc...

There are only about 1000 words that appear more than once (2 or 3 times)
in 27k rows.
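
A rough sketch of how a table like that could be generated, assuming "g-words" means dictionary words starting with g and covering only the three variants named above. The function name build_rows and the one-pass-per-variant layout are hypothetical, not the script actually used:

```python
def build_rows(words):
    """Return (row_id, value) pairs built from the words that start with 'g'.

    Hypothetical reconstruction: one pass over the g-words per variant,
    with row_id increasing across the whole table.
    """
    g_words = [w for w in words if w[:1].lower() == "g"]

    variants = [
        lambda w, row_id: w,                       # no processing
        lambda w, row_id: w[:1].upper() + w[1:],   # initial caps
        lambda w, row_id: f"{w}{row_id}",          # word || row_id
    ]

    rows = []
    row_id = 1
    for variant in variants:
        for w in g_words:
            rows.append((row_id, variant(w, row_id)))
            row_id += 1
    return rows
```

A word that is already capitalized in the dictionary comes out identical under the first two variants, which is one way a small number of duplicate values could arise.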
-Wade Klaver

At 11:08 PM 2/3/03 -0500, Tom Lane wrote:
>Next question: may I guess that you weren't using MULTIBYTE in 7.2?
>
>After still more digging, I'm coming round to the opinion that the
>problem is that MULTIBYTE is forced on in 7.3, and this imposes a
>factor-of-256 overhead in a bunch of the operations in regcomp.c.
>In particular, compiling a case-independent regex is now hugely
>more expensive than it used to be.
>
>The parties who wanted to force MULTIBYTE on promised that there
>would be no such penalties :-(
>
> regards, tom lane
>
>
