Re: [v9.2] make_greater_string() does not return a string in some cases

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>
To:
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [v9.2] make_greater_string() does not return a string in some cases
Date: 2011-09-22 11:30:29
Message-ID: 20110922.203029.185577421.horiguchi.kyotaro@oss.ntt.co.jp
Lists: pgsql-bugs pgsql-hackers

Thank you for your understanding on that point.

At Wed, 21 Sep 2011 20:35:02 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote
> ...while Kyotaro Horiguchi clearly feels otherwise, citing the
> statistic that about 100 out of 7000 Japanese characters fail to work
> properly:
>
> http://archives.postgresql.org/pgsql-bugs/2011-07/msg00064.php
>
> That statistic seems to justify some action, but what? Ideas:

To add to those figures: they are based on all the characters
defined in JIS X 0208, the character set traditionally used for
information exchange in Japan (though it is becoming history
now). Narrowing to the commonly used characters (called
`Jouyou-Kanji' in Japanese, the ones to be learned by high
school graduates in Japan), 35 out of 2100 fail.

# On the other hand, if we widen to JIS X 0213, which is roughly
# compatible with Unicode and defines more than 12K characters,
# I have not counted, but the additional 5K characters can be
# assumed to be less likely to fail than the characters in JIS
# X 0208.

> 1. Adopt the patch as proposed, or something like it.
> 2. Instead of installing encoding-specific character incrementing
> functions, we could try to come up with a more reliable generic
> algorithm. Not sure exactly what, though.
> 3. Come up with some way to avoid needing to do this in the first place.
>
> One random idea I have is - instead of generating > and < clauses,
> could we define a "prefix match" operator - i.e. a ### b iff substr(a,
> 1, length(b)) = b? We'd need to do something about the selectivity,
> but I don't see why that would be a problem.
>
> Thoughts?

I am new to PostgreSQL, but from a general point of view, I think
the most radical and clean way to fix this behavior is to give
indexes their own forward-matching (prefix-matching) function for
strings, ignoring possible overheads that I am not aware of. This
could cover all the failures that this patch leaves unfixed,
assuming that the `greater string' does not need to be a `valid
string' when it is used only for searching the btree.
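
To make that assumption concrete, here is a minimal sketch of an
encoding-independent `greater string'. It is not part of the patch:
bytewise_greater_string is a hypothetical name, and the sketch
assumes the index compares strings bytewise, as memcmp() does (as in
the C locale). The result may well be an invalid string in the
database encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch.
 *
 * Rewrites the prefix in buf[0 .. *len) in place into a byte string that
 * sorts after every string starting with that prefix under memcmp()
 * ordering.  The result is not guaranteed to be valid in any encoding.
 *
 * Returns false if no such string exists (empty or all-0xFF prefix).
 */
bool
bytewise_greater_string(unsigned char *buf, size_t *len)
{
    assert(buf != NULL && len != NULL);

    while (*len > 0)
    {
        if (buf[*len - 1] < 0xFF)
        {
            /* Room to increment the last byte: done. */
            buf[*len - 1]++;
            return true;
        }
        /* Last byte is 0xFF: drop it and carry into the previous byte. */
        (*len)--;
    }
    return false;
}
```

The only failing case left is a prefix made entirely of 0xFF bytes
(or an empty one), which has no upper bound at all.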

Another idea I can think of is to add a new operator that means
"check whether the string value is smaller than the `greater
string' of the parameter". Such an operator could also defer
building the `greater string' until just before searching the
btree, summing up histogram entries, or comparing with column
values. If the assumption above holds, the "make greater string"
operation can be done regardless of character encoding. This
seems to have a smaller impact than a "prefix match" operator.
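
As a rough illustration of such an operator, the following sketch
evaluates "value is smaller than the `greater string' of the
parameter" without ever building the greater string. The name
below_greater_string is hypothetical, and the sketch assumes plain
bytewise (memcmp) ordering, so it says nothing about locale-aware
collations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical predicate, not an existing PostgreSQL operator.
 *
 * True iff `val` sorts below the (never materialized) `greater string'
 * of `prefix` under memcmp() ordering.  Only the first min(vlen, plen)
 * bytes matter: if they compare <= the prefix, val is either below the
 * prefix or starts with it, and in both cases it is below the bound.
 */
bool
below_greater_string(const char *val, size_t vlen,
                     const char *prefix, size_t plen)
{
    size_t      n = (vlen < plen) ? vlen : plen;

    assert(val != NULL && prefix != NULL);

    return memcmp(val, prefix, n) <= 0;
}
```

Combined with the usual lower bound (value >= prefix), this gives
the same range as the generated > and < clauses, but no character
ever has to be incremented.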

# But, hmm, the more I look into it, the less difference there
# seems to be between the two... Anyway, this is beyond my
# knowledge for now. I need to study more.

On the other hand, if no additional encoding-specific `character
increment functions' are expected to appear, the modification of
pg_wchar_table can be dropped, and make_greater_string can select
the `character increment function' with something like 'switch
(GetDatabaseEncoding()) { case PG_UTF8:.. }'. This also gets rid
of the pg_generic_charinc tweak for libpq.
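
The switch-based selection could look roughly like the sketch below.
It is self-contained, so every name in it (the sketch_* enum,
typedef and functions) is a hypothetical stand-in, not the real
pg_wchar.h definitions or the functions in the patch. The UTF-8
incrementer is deliberately simplified: it only bumps the last byte
and gives up where a real one would carry into earlier bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the database encoding IDs (hypothetical). */
typedef enum
{
    SKETCH_SQL_ASCII,
    SKETCH_UTF8,
    SKETCH_EUC_JP
} sketch_encoding;

/* Increments one character of `len` bytes in place; false on failure. */
typedef bool (*sketch_charinc_fn) (unsigned char *ch, int len);

/* Generic fallback: increment the last byte, whatever it means. */
bool
sketch_generic_charinc(unsigned char *ch, int len)
{
    assert(ch != NULL && len > 0);
    if (ch[len - 1] == 0xFF)
        return false;
    ch[len - 1]++;
    return true;
}

/*
 * Simplified UTF-8 variant: keep the last byte inside its legal range
 * (0x00-0x7F for a single byte, 0x80-0xBF for a continuation byte).
 */
bool
sketch_utf8_charinc(unsigned char *ch, int len)
{
    unsigned char limit;

    assert(ch != NULL && len > 0);
    limit = (len == 1) ? 0x7F : 0xBF;
    if (ch[len - 1] >= limit)
        return false;
    ch[len - 1]++;
    return true;
}

/* Select the incrementer by encoding, with no table lookup. */
sketch_charinc_fn
sketch_select_charinc(sketch_encoding enc)
{
    switch (enc)
    {
        case SKETCH_UTF8:
            return sketch_utf8_charinc;
        default:
            return sketch_generic_charinc;
    }
}
```

With this shape, the selection logic lives entirely on the backend
side, next to its only caller.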

At Wed, 21 Sep 2011 21:49:27 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote
> detail work; for instance, I noted an unconstrained memcpy into a 4-byte
> local buffer, as well as lots and lots of violations of PG house style.
> That's certainly all fixable but somebody will have to go through it.

Sorry for the style violations in the patch. I will go through it
and check.

Regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
