Re: Win32 unicode vs ICU

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Magnus Hagander" <mha(at)sollentuna(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org, "Palle Girgensohn" <girgen(at)pingpong(dot)net>
Subject: Re: Win32 unicode vs ICU
Date: 2005-08-23 13:48:25
Message-ID: 1890.1124804905@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

"Magnus Hagander" <mha(at)sollentuna(dot)net> writes:
>> There is a strxfrm() call in
>> src/backend/utils/adt/selfuncs.c, which probably needs to be
>> looked at too.

> Ok. Will look into that. Do you have a hint as to how to test that?

Any problems would manifest as a bogus interpolation between histogram
elements for a scalar-inequality selectivity estimate in a text column.
For instance, if you insert all 676 2-letter combinations AA, AB, AC,
..., ZY, ZZ into a text column, ANALYZE, and then try cases like
"EXPLAIN SELECT * FROM tab WHERE col < 'QW'", ideally the row estimate
should be pretty nearly dead on. Being pure ASCII, this test would
probably still work in a broken Unicode context, but if you did a
similar experiment with 26 non-ASCII characters it would be likely to
come out with silly results. You could increase the obviousness of the
bad result by reducing the statistics target, since the silliness will
be bounded by the histogram bin size.
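
A minimal sketch of the test described above, written in Python: it generates the 676 two-letter values, emits SQL to load and analyze them, and computes the exact row count that the EXPLAIN estimate can be compared against. The names `tab` and `col` come from the example query; the helper function names, the CREATE TABLE statement, and the use of plain code-point comparison (which matches the database only when the locale sorts these strings the same way) are assumptions made for illustration.

```python
import string
from itertools import product


def two_letter_values(alphabet=string.ascii_uppercase):
    """Return every 2-character combination of `alphabet`, in alphabet order."""
    return [a + b for a, b in product(alphabet, repeat=2)]


def exact_rows_below(values, bound):
    """Count values that sort strictly below `bound`.

    Assumption: Python's code-point comparison stands in for the database's
    ordering. That holds for A-Z in most locales, but not in general.
    """
    return sum(1 for v in values if v < bound)


def build_test_sql(values, bound, table="tab", column="col"):
    """Build a script that loads the values, runs ANALYZE, then EXPLAIN."""
    def quote(s):
        # Double any single quotes so the literal stays valid SQL.
        return "'" + s.replace("'", "''") + "'"

    lines = [f"CREATE TABLE {table} ({column} text);"]
    lines += [f"INSERT INTO {table} VALUES ({quote(v)});" for v in values]
    lines.append(f"ANALYZE {table};")
    lines.append(
        f"EXPLAIN SELECT * FROM {table} WHERE {column} < {quote(bound)};"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    values = two_letter_values()
    print(build_test_sql(values, "QW"))
    print(f"-- exact row count for comparison: {exact_rows_below(values, 'QW')}")
```

For the non-ASCII variant, the same helpers can be called with a 26-character alphabet of non-ASCII letters; which letters to use, and what order the server's locale gives them, is left open here. The statistics target can be lowered per column with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS before running ANALYZE.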

(Just looking at it again, the code in convert_string_to_scalar is
pretty bogus for multibyte encodings in any case. Possibly we need to
rethink the whole approach.)
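
To illustrate why a byte-at-a-time mapping can mislead for multibyte encodings, here is a simplified Python sketch. It is not the actual convert_string_to_scalar code: the function names, the fixed 0..255 byte range, and the choice of UTF-8 are assumptions, and code-point order is used only as a rough stand-in for a real, locale-dependent ordering.

```python
def bytes_to_fraction(data: bytes, lo: int = 0, hi: int = 255) -> float:
    """Treat a byte string as a base-(hi-lo+1) fraction in [0, 1)."""
    base = hi - lo + 1
    num, denom = 0.0, float(base)
    for b in data:
        b = min(max(b, lo), hi)      # clamp into the assumed range
        num += (b - lo) / denom
        denom *= base
    return num


def interpolate(value: str, low: str, high: str, encoding: str = "utf-8") -> float:
    """Estimate where `value` falls between `low` and `high`, byte-wise."""
    v, l, h = (bytes_to_fraction(s.encode(encoding)) for s in (value, low, high))
    return (v - l) / (h - l)


if __name__ == "__main__":
    # ASCII: 'B' sits halfway between 'A' and 'C', and the bytes agree.
    print(interpolate("B", "A", "C"))
    # U+00FF, U+0100, U+0101 are consecutive code points, but in UTF-8 they
    # encode as C3 BF, C4 80, C4 81, so the byte-wise position of the middle
    # one is pushed almost to the upper bound.
    print(interpolate("\u0100", "\u00ff", "\u0101"))
```

In this sketch the ASCII case comes out at exactly one half, while the three consecutive non-ASCII code points place the middle value at 193/194 of the interval, because the lead byte changes between the first and second character.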

> Which brings up another point - there are clearly no regression tests
> for this (considering we missed the unicode stuff early in the 8.0
> cycle).

src/test/locale? src/test/mb? I've never used either, but they're
there ...

regards, tom lane
