Re: Patch: add conversion from pg_wchar to multibyte

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch: add conversion from pg_wchar to multibyte
Date: 2012-05-01 21:45:57
Message-ID: CAPpHfduU=dm8hJFWrYUeSG-H6YYPGu1pjQD-CDU9D10_4Bwn_w@mail.gmail.com
Lists: pgsql-hackers
Hi Erik


On Sun, Apr 29, 2012 at 4:12 PM, Erik Rijkers <er(at)xs4all(dot)nl> wrote:

> Perhaps I'm too early with these tests, but FWIW I reran my earlier test
> program against three instances.  (the patches compiled fine, and make
> check was without problem).
>
> -- 3 instances:
> HEAD                 port 6542
> trgm_regex           port 6547  HEAD + trgm-regexp patch (22 Nov 2011) [1]
> trgm_regex_wchar2mb  port 6549  HEAD + trgm-regexp + wchar2mb patch (23 Apr 2012) [2]
>

Actually, the wchar2mb patch doesn't affect the behaviour of trgm-regexp. It
provides a correct way to do some of the encoding conversion work that the
last published version of trgm-regexp does internally. So "HEAD + trgm-regexp
patch" and "HEAD + trgm-regexp + wchar2mb patch" should behave similarly.


> [1] http://archives.postgresql.org/pgsql-hackers/2011-11/msg01297.php
> [2] http://archives.postgresql.org/pgsql-hackers/2012-04/msg01095.php
>
> -- table sizes:
>  azjunk4  10^4 rows     1 MB
>  azjunk5  10^5 rows    11 MB
>  azjunk6  10^6 rows   112 MB
>  azjunk7  10^7 rows  1116 MB
>
> for table creation/structure, see:
> [3] http://archives.postgresql.org/pgsql-hackers/2012-01/msg01094.php
>
> Results for three instances with 4 repetitions per instance are attached.
>
> Although the regexes I chose are somewhat arbitrary, it does show some of
> the good, the bad and the ugly of the patch(es).  (Also: I've limited the
> tests to a range of 'workable' regexps, i.e. avoiding unbounded regexps)
>

Thank you for testing!
Such synthetic tests are very valuable for finding corner cases of the
patch, bugs, etc.
It would also be nice to run some tests on real-life datasets with
real-life regexps, in order to see the real benefit of this indexing
approach and to compare it with other approaches. Maybe you or somebody
else could obtain such datasets?

Also, I made some optimizations to the algorithm. The automaton analysis
stage should now consume less CPU and memory. I'll publish a new version soon.

------
With best regards,
Alexander Korotkov.
