Re: Patch: add conversion from pg_wchar to multibyte

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch: add conversion from pg_wchar to multibyte
Date: 2012-05-01 21:45:57
Message-ID: CAPpHfduU=dm8hJFWrYUeSG-H6YYPGu1pjQD-CDU9D10_4Bwn_w@mail.gmail.com
Lists: pgsql-hackers

Hi Erik

On Sun, Apr 29, 2012 at 4:12 PM, Erik Rijkers <er(at)xs4all(dot)nl> wrote:

> Perhaps I'm too early with these tests, but FWIW I reran my earlier test
> program against three instances. (the patches compiled fine, and make
> check was without problem).
>
> -- 3 instances:
> HEAD                 port 6542
> trgm_regex           port 6547  HEAD + trgm-regexp patch (22 Nov 2011) [1]
> trgm_regex_wchar2mb  port 6549  HEAD + trgm-regexp + wchar2mb patch (23 Apr 2012) [2]
>

Actually, the wchar2mb patch doesn't affect the behaviour of trgm-regexp. It
provides a correct way to do some of the encoding conversion work that the
last published version of trgm-regexp does internally. So "HEAD + trgm-regexp
patch" and "HEAD + trgm-regexp + wchar2mb patch" should behave the same.

> [1] http://archives.postgresql.org/pgsql-hackers/2011-11/msg01297.php
> [2] http://archives.postgresql.org/pgsql-hackers/2012-04/msg01095.php
>
> -- table sizes:
> azjunk4 10^4 rows 1 MB
> azjunk5 10^5 rows 11 MB
> azjunk6 10^6 rows 112 MB
> azjunk7 10^7 rows 1116 MB
>
> for table creation/structure, see:
> [3] http://archives.postgresql.org/pgsql-hackers/2012-01/msg01094.php
>
> Results for three instances with 4 repetitions per instance are attached.
>
> Although the regexes I chose are somewhat arbitrary, it does show some of
> the good, the bad and the ugly of the patch(es). (Also: I've limited the
> tests to a range of 'workable' regexps, i.e. avoiding unbounded regexps)
>

Thank you for testing!
Such synthetic tests are very valuable for finding corner cases of the
patch, bugs, etc.
But it would also be nice to run some tests on real-life datasets with
real-life regexps, in order to see the real benefit of this indexing
approach and to compare it with other approaches. Maybe you or somebody else
could obtain such datasets?

Also, I've made some optimizations to the algorithm. The automaton analysis
stage should now consume less CPU and memory. I'll publish a new version soon.

------
With best regards,
Alexander Korotkov.
