Hi,

my database has UTF8 encoding and Finnish locale; the client_encoding
and the console are set to WIN1252. I created a table with a single
NUMERIC(5,2) column and inserted a few values. Running the query

SELECT to_char(money, '999D99L') FROM table

through psql gives the following error message:

ERROR: invalid byte sequence for encoding "UTF8": 0x80
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

The graphical Query tool returns a set of empty rows. The query works
OK without the 'L'.

Thanks in advance,
Mikko
Mikko wrote:
> my database has UTF8 encoding and Finnish locale, the client_encoding
> and the console is set to WIN1252. I created a table with a single
> NUMERIC(5,2) column and inserted a few values. Running a query 'SELECT
> to_char(money, '999D99L') FROM table' through psql gives the following
> error message:
>
> ERROR: invalid byte sequence for encoding "UTF8": 0x80
> HINT: This error can also happen if the byte sequence does not match
> the encoding expected by the server, which is controlled by
> "client_encoding".
>
> The graphical Query tool returns a set of empty rows. The query works
> ok without the 'L'.

That is strange.

What is your psql version?
What is the output of the following commands:

SHOW server_version;
SHOW server_encoding;
SHOW client_encoding;
SHOW lc_numeric;
SHOW lc_monetary;
SELECT to_char(3.1415::numeric(5,2), '999D99L');

Yours,
Laurenz Albe
psql (PostgreSQL) 8.3.7

server_version 8.3.7
server_encoding UTF8
client_encoding win1252
lc_numeric Finnish, Finland
lc_monetary Finnish, Finland

testdb=# SELECT to_char(3.1415::numeric(5,2), '999D99L');

ERROR: invalid byte sequence for encoding "UTF8": 0x80
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

When connected to the postgres database, the query returns 3,14.

Mikko
Mikko wrote:
> psql (PostgreSQL) 8.3.7
>
> server_version 8.3.7
> server_encoding UTF8
> client_encoding win1252
> lc_numeric Finnish, Finland
> lc_monetary Finnish, Finland
>
> testdb=# SELECT to_char(3.1415::numeric(5,2), '999D99L');
>
> ERROR: invalid byte sequence for encoding "UTF8": 0x80
> HINT: This error can also happen if the byte sequence does not match
> the encoding expected by the server, which is controlled by
> "client_encoding".

FWIW 0x80 is the Euro symbol in Win1252 according to
http://en.wikipedia.org/wiki/Windows-1252

Maybe the problem here is that the chosen locales are not UTF8. Does it
work if you set lc_numeric and lc_monetary to "Finnish_Finland.65001"
instead? Those should match the server_encoding.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
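To illustrate why the server complains about 0x80 specifically, here is a
minimal sketch in plain C. It is not PostgreSQL's own validation code; the
function name utf8_first_bad_byte is made up for this example, and the check
is deliberately simplified (it looks only at lead/continuation byte patterns,
not at overlong forms or surrogates).

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): return the index of the
 * first byte that breaks the UTF-8 lead/continuation structure, or -1 if
 * the buffer passes this simplified check.
 */
long utf8_first_bad_byte(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char b = s[i];
        size_t need;    /* number of continuation bytes expected */
        size_t k;

        if (b < 0x80)
            need = 0;                   /* plain ASCII */
        else if (b >= 0xC2 && b <= 0xDF)
            need = 1;                   /* 2-byte sequence */
        else if (b >= 0xE0 && b <= 0xEF)
            need = 2;                   /* 3-byte sequence */
        else if (b >= 0xF0 && b <= 0xF4)
            need = 3;                   /* 4-byte sequence */
        else
            return (long) i;            /* 0x80..0xC1, 0xF5..0xFF cannot start one */

        /* every following byte must look like 10xxxxxx */
        for (k = 1; k <= need; k++)
        {
            if (i + k >= len || (s[i + k] & 0xC0) != 0x80)
                return (long) i;
        }
        i += need + 1;
    }
    return -1;
}
```

With the bytes '3' ',' '1' '4' 0x80 (the Win1252 form of "3,14" followed by the
Euro sign) the function reports index 4, whereas the UTF-8 form of the Euro
sign, 0xE2 0x82 0xAC, passes.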
On Tue, Apr 21, 2009 at 8:13 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Maybe the problem here is that the chosen locales are not UTF8. Does it
> work if you set lc_numeric and lc_monetary to "Finnish_Finland.65001"
> instead? Those should match the server_encoding.
alter database testdb set lc_monetary(or numeric) to
'Finnish_Finland.65001' returns:
ERROR: invalid value for parameter "lc_monetary": "Finnish_Finland.65001"
However, I noticed that both lc_collate and lc_ctype are set to
Finnish_Finland.1252 by the installer. Should I have just run initdb
with --locale fi_FI.UTF8 at the very start? The to_char('L') works
fine with a database with win1252 encoding.
Mikko
Mikko wrote:
> On Tue, Apr 21, 2009 at 8:13 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
> > Maybe the problem here is that the chosen locales are not UTF8. Does it
> > work if you set lc_numeric and lc_monetary to "Finnish_Finland.65001"
> > instead? Those should match the server_encoding.
>
> alter database testdb set lc_monetary(or numeric) to
> 'Finnish_Finland.65001' returns:
> ERROR: invalid value for parameter "lc_monetary": "Finnish_Finland.65001"
Ouch ... I thought that was the way that Windows designated UTF8
locales, but maybe I am wrong.
> However, I noticed that both lc_collate and lc_ctype are set to
> Finnish_Finland.1252 by the installer. Should I have just run initdb
> with --locale fi_FI.UTF8 at the very start? The to_char('L') works
> fine with a database with win1252 encoding.
Hmm, it should have disallowed the creation of a UTF8 database then.
Maybe that part is what is broken here.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Wed, Apr 22, 2009 at 2:13 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Ouch ... I thought that was the way that Windows designated UTF8
> locales, but maybe I am wrong.

Ok, now I found out that Windows doesn't support locales with encoding
using more than two bytes per character and initdb falls back to 1252.

http://msdn.microsoft.com/en-us/library/x99tb11d.aspx

I guess I'll have to manage with win1252 encoded dbs for the moment.

Thanks for the answers!
Mikko
Mikko wrote:
> On Wed, Apr 22, 2009 at 2:13 AM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
> > Ouch ... I thought that was the way that Windows designated UTF8
> > locales, but maybe I am wrong.
>
> Ok, now I found out that Windows doesn't support locales with encoding
> using more than two bytes per character and initdb falls back to 1252.
>
> http://msdn.microsoft.com/en-us/library/x99tb11d.aspx

Hmm. Does this imply that we shouldn't allow UTF8 database on Windows
at all?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Does this imply that we shouldn't allow UTF8 database on Windows at all?

That would be pretty unfortunate :-(

I think what this suggests is that there probably needs to be some
encoding conversion logic near the places we examine localeconv()
output.

regards, tom lane
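For readers unfamiliar with localeconv(), a minimal sketch of the call site in
question, using only standard C. The helper name copy_decimal_point is
hypothetical and this is not the code in pg_locale.c; it only shows where a
conversion step would have to sit, namely between reading the struct lconv
fields and handing the copied string to the rest of the program.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'ed copy of the decimal point used by
 * the given LC_NUMERIC locale, or NULL on failure.  The caller frees it.
 */
char *copy_decimal_point(const char *numeric_locale)
{
    const char *cur = setlocale(LC_NUMERIC, NULL);
    char *saved;
    char *result = NULL;

    if (cur == NULL)
        return NULL;

    /* setlocale() may reuse its return buffer, so keep our own copy */
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return NULL;
    strcpy(saved, cur);

    if (setlocale(LC_NUMERIC, numeric_locale) != NULL)
    {
        struct lconv *lc = localeconv();

        /*
         * The strings in *lc belong to the C library and may be overwritten
         * by later setlocale()/localeconv() calls, so copy them now.  An
         * encoding conversion to the database encoding would go here.
         */
        result = malloc(strlen(lc->decimal_point) + 1);
        if (result != NULL)
            strcpy(result, lc->decimal_point);
    }

    setlocale(LC_NUMERIC, saved);   /* restore the previous setting */
    free(saved);
    return result;
}
```

In the "C" locale this yields ".", which needs no conversion; the trouble in
this thread arises only when a field such as currency_symbol contains
non-ASCII bytes in an encoding other than the database's.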
Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Does this imply that we shouldn't allow UTF8 database on Windows at all?
>
> That would be pretty unfortunate :-(
>
> I think what this suggests is that there probably needs to be some
> encoding conversion logic near the places we examine localeconv()
> output.

Attached is a patch to the current CVS.
It uses a similar way like LC_TIME stuff does.

regards,
Hiroshi Inoue
Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> writes:
> Tom Lane wrote:
>> I think what this suggests is that there probably needs to be some
>> encoding conversion logic near the places we examine localeconv()
>> output.

> Attached is a patch to the current CVS.
> It uses a similar way like LC_TIME stuff does.

I'm not really in a position to test/commit this, since I don't have a
Windows machine. However, since no one else is stepping up to deal with
it, here's a quick review:

* This seems to be assuming that the user has set LC_MONETARY and
LC_NUMERIC the same. What if they're different?

* What if the selected locale corresponds to Unicode (ie UTF16)
encoding?

* #define'ing strdup() to do something rather different from strdup
seems pretty horrid from the standpoint of code readability and
maintainability, especially with nary a comment explaining it.

* Code will dump core on malloc failure.

* Since this code is surely not performance critical, I wouldn't bother
with trying to optimize it; hence drop the special case for all-ASCII.

* Surely we already have a symbol somewhere that can be used in
place of this:
#define MAX_BYTES_PER_CHARACTER 4

regards, tom lane
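Two of the review points above (the redefined strdup() and the unchecked
malloc()) can be illustrated with a small sketch. This is an assumption about
what a cleaner shape might look like, not the code that was eventually
committed; the names convert_fn and dup_or_convert are invented for the
example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: returns a malloc'ed converted copy, or NULL. */
typedef char *(*convert_fn)(const char *src);

/*
 * Hypothetical replacement for the "#define strdup(...)" trick: an explicitly
 * named function that either converts the string or duplicates it, and that
 * reports allocation failure to the caller instead of dereferencing NULL.
 */
char *dup_or_convert(const char *str, convert_fn convert)
{
    size_t len;
    char *copy;

    if (str == NULL)
        return NULL;

    if (convert != NULL)
        return convert(str);    /* conversion path, e.g. on Windows */

    len = strlen(str) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return NULL;            /* caller must raise an out-of-memory error */

    memcpy(copy, str, len);
    return copy;
}
```

Because the function has its own name, a reader of the call site can see that
something other than a plain strdup() may happen, which addresses the
readability complaint without any macro.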
Tom Lane wrote:
> Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> writes:
>> Tom Lane wrote:
>>> I think what this suggests is that there probably needs to be some
>>> encoding conversion logic near the places we examine localeconv()
>>> output.
>
>> Attached is a patch to the current CVS.
>> It uses a similar way like LC_TIME stuff does.
>
> I'm not really in a position to test/commit this, since I don't have a
> Windows machine. However, since no one else is stepping up to deal with
> it, here's a quick review:

Thanks for the review.
I had forgotten about the patch because Japanese doesn't have trouble
with this issue (the currency symbol is ascii \).
If this is really expected to be fixed, I will update the patch
according to your suggestions.

> * This seems to be assuming that the user has set LC_MONETARY and
> LC_NUMERIC the same. What if they're different?

Strictly speaking they should be handled individually.

> * What if the selected locale corresponds to Unicode (ie UTF16)
> encoding?

As far as I tested, setlocale(LC_MONETARY, xxx.65001) causes an error.

> * #define'ing strdup() to do something rather different from strdup
> seems pretty horrid from the standpoint of code readability and
> maintainability, especially with nary a comment explaining it.

Maybe using a function instead of strdup(), one which calls
dbstr_win32() in the Windows case, would be better.
BTW grouping and money_grouping seem to be outside the encoding
conversion. Are they guaranteed to be null terminated?

> * Code will dump core on malloc failure.

I can take care of it.

> * Since this code is surely not performance critical, I wouldn't bother
> with trying to optimize it; hence drop the special case for all-ASCII.

I can take care of it.

> * Surely we already have a symbol somewhere that can be used in
> place of this:
> #define MAX_BYTES_PER_CHARACTER 4

I can't find it.
max(pg_encoding_max_length(encoding), pg_encoding_max_length(PG_UTF8))
may be better.

regards,
Hiroshi Inoue
Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> writes:
> Tom Lane wrote:
>> * This seems to be assuming that the user has set LC_MONETARY and
>> LC_NUMERIC the same. What if they're different?

> Strictly speaking they should be handled individually.

I thought about this some more, and I wonder why you did it like this at
all. The patch claimed to be copying the LC_TIME code, but the LC_TIME
code isn't trying to temporarily change any locale settings. What we
are doing in that code is assuming that the system will give us back
the localized strings in the encoding identified by CP_ACP; so all we
have to do is convert CP_ACP to wide chars and then to UTF8. Can't we
use a similar approach for the output of localeconv?

regards, tom lane
Tom Lane wrote:
> Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> writes:
>> Tom Lane wrote:
>>> * This seems to be assuming that the user has set LC_MONETARY and
>>> LC_NUMERIC the same. What if they're different?
>
>> Strictly speaking they should be handled individually.
>
> I thought about this some more, and I wonder why you did it like this at
> all. The patch claimed to be copying the LC_TIME code, but the LC_TIME
> code isn't trying to temporarily change any locale settings.

The LC_TIME and LC_CTYPE (on Windows) settings are changed temporarily
in cache_locale_time() in pg_locale.c.

> What we
> are doing in that code is assuming that the system will give us back
> the localized strings in the encoding identified by CP_ACP;

AFAIK it's not right. LC_TIME, LC_MONETARY or LC_NUMERIC related
output is encoded using the LC_CTYPE setting.

> so all we
> have to do is convert CP_ACP to wide chars and then to UTF8. Can't we
> use a similar approach for the output of localeconv?

What the LC_TIME code and my patch intend is to set LC_CTYPE to an
appropriate value so that the related output is converted correctly.
If we could set LC_CTYPE to xxx_xxx.65001 (UTF8), we could eliminate
two steps, but that causes an error on Windows.

regards,
Hiroshi Inoue
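The "wide chars, then UTF8" step that both messages refer to can be shown
without any Windows API. The sketch below encodes a single Unicode code point
as UTF-8 in portable C; on Windows the patch relies on mbstowcs() and
WideCharToMultiByte() for this, so utf8_encode is purely an illustrative
stand-in with an invented name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative stand-in: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes into out and returns the byte count, or 0 if the
 * code point is a surrogate or lies beyond U+10FFFF.
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;           /* surrogates are not valid scalar values */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF)
    {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For the Euro sign, U+20AC, this produces the three bytes 0xE2 0x82 0xAC, which
is what a UTF8 database expects in place of the single Win1252 byte 0x80.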
Where are we on this issue?
---------------------------------------------------------------------------
Hiroshi Inoue wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> >> Does this imply that we shouldn't allow UTF8 database on Windows at all?
> >
> > That would be pretty unfortunate :-(
> >
> > I think what this suggests is that there probably needs to be some
> > encoding conversion logic near the places we examine localeconv()
> > output.
>
> Attached is a patch to the current CVS.
> It uses a similar way like LC_TIME stuff does.
>
> regards,
> Hiroshi Inoue
> Index: pg_locale.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/pg_locale.c,v
> retrieving revision 1.49
> diff -c -c -r1.49 pg_locale.c
> *** pg_locale.c 1 Apr 2009 09:17:32 -0000 1.49
> --- pg_locale.c 22 Apr 2009 21:08:33 -0000
> ***************
> *** 386,391 ****
> --- 386,449 ----
> free(s->positive_sign);
> }
>
> + #ifdef WIN32
> + #define MAX_BYTES_PER_CHARACTER 4
> + static char *dbstr_win32(bool matchenc, const char *str)
> + {
> + int encoding = GetDatabaseEncoding();
> + bool is_ascii = true;
> + size_t len, ilen, wclen, dstlen;
> + wchar_t *wbuf;
> + char *dst, *ibuf;
> +
> + if (matchenc)
> + return strdup(str);
> + /* Is the str an ascii string ? */
> + for (ibuf = str; *ibuf; ibuf++)
> + {
> + if (!isascii(*ibuf))
> + {
> + is_ascii = false;
> + break;
> + }
> + }
> + /* Simply returns the strdup()ed ascii string */
> + if (is_ascii)
> + return strdup(str);
> +
> + ilen = strlen(str) + 1;
> + wclen = ilen * sizeof(wchar_t);
> + wbuf = (wchar_t *) palloc(wclen);
> + len = mbstowcs(wbuf, str, ilen);
> + if (len == -1)
> + elog(ERROR,
> + "could not convert string to Wide characters:error %lu", GetLastError());
> +
> + dstlen = len * MAX_BYTES_PER_CHARACTER + 1;
> + dst = malloc(dstlen);
> +
> + len = WideCharToMultiByte(CP_UTF8, 0, wbuf, len, dst, dstlen, NULL, NULL);
> + pfree(wbuf);
> + if (len == 0)
> + elog(ERROR,
> + "could not convert string to UTF-8:error %lu", GetLastError());
> +
> + dst[len] = '\0';
> + if (encoding != PG_UTF8)
> + {
> + char *convstr = pg_do_encoding_conversion(dst, len, PG_UTF8, encoding);
> + if (dst != convstr)
> + {
> + strlcpy(dst, convstr, dstlen);
> + pfree(convstr);
> + }
> + }
> +
> + return dst;
> + }
> +
> + #define strdup(str) dbstr_win32(is_encoding_match, str)
> + #endif /* WIN32 */
>
> /*
> * Return the POSIX lconv struct (contains number/money formatting
> ***************
> *** 398,403 ****
> --- 456,466 ----
> struct lconv *extlconv;
> char *save_lc_monetary;
> char *save_lc_numeric;
> + #ifdef WIN32
> + char *save_lc_ctype = NULL;
> + bool lc_ctype_change = false, is_encoding_match;
> + #endif /* WIN32 */
> +
>
> /* Did we do it already? */
> if (CurrentLocaleConvValid)
> ***************
> *** 413,418 ****
> --- 476,492 ----
> if (save_lc_numeric)
> save_lc_numeric = pstrdup(save_lc_numeric);
>
> + #ifdef WIN32
> + save_lc_ctype = setlocale(LC_CTYPE, NULL);
> + if (save_lc_ctype && stricmp(locale_monetary, save_lc_ctype) != 0)
> + {
> + lc_ctype_change = true;
> + save_lc_ctype = pstrdup(save_lc_ctype);
> + setlocale(LC_CTYPE, locale_monetary);
> + }
> + is_encoding_match = (pg_get_encoding_from_locale(locale_monetary) == GetDatabaseEncoding());
> + #endif
> +
> setlocale(LC_MONETARY, locale_monetary);
> setlocale(LC_NUMERIC, locale_numeric);
>
> ***************
> *** 437,442 ****
> --- 511,524 ----
> CurrentLocaleConv.n_sign_posn = extlconv->n_sign_posn;
>
> /* Try to restore internal settings */
> + #ifdef WIN32
> + #undef strdup
> + if (lc_ctype_change)
> + {
> + setlocale(LC_CTYPE, save_lc_ctype);
> + pfree(save_lc_ctype);
> + }
> + #endif /* WIN32 */
> if (save_lc_monetary)
> {
> setlocale(LC_MONETARY, save_lc_monetary);
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Where are we on this issue?

According to my files, I complained about the extreme ugliness of the
patch (redefining strdup for pete's sake) and the fact that it did not
actually do things anything like the LC_TIME code as was claimed.
Hiroshi rejected those criticisms. I don't know where we are, but
I don't want to see this patch applied in this form.

regards, tom lane
Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Where are we on this issue?
>
> According to my files, I complained about the extreme ugliness of the
> patch (redefining strdup for pete's sake) and the fact that it did not
> actually do things anything like the LC_TIME code as was claimed.
> Hiroshi rejected those criticisms. I don't know where we are, but
> I don't want to see this patch applied in this form.

Right, but you are saying it is still an open issue, which says we
should look at it.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Right, but you are saying it is still an open issue, which says we
> should look at it.

Sure. Maybe put it on TODO?

regards, tom lane
Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Right, but you are saying it is still an open issue, which says we
> > should look at it.
>
> Sure. Maybe put it on TODO?

OK, TODO is:

Fix locale-aware handling (e.g. monetary) for specific server/client
encoding combinations

* http://archives.postgresql.org/pgsql-general/2009-04/msg00799.php

If someone wants to work on it, go ahead.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
Bruce Momjian wrote:
> Where are we on this issue?

Oops, I forgot it completely.
I have a slightly improved version and will post it tonight.

regards,
Hiroshi Inoue
Hiroshi Inoue wrote:
> Bruce Momjian wrote:
> > Where are we on this issue?
>
> Oops I forgot it completely.
> I have a little improved version and would post it tonight.

Ah, very good. Thanks.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
Bruce Momjian wrote:
> Hiroshi Inoue wrote:
>> Bruce Momjian wrote:
>>> Where are we on this issue?
>> Oops I forgot it completely.
>> I have a little improved version and would post it tonight.
>
> Ah, very good. Thanks.

Attached is an improved version.

regards,
Hiroshi Inoue
Hiroshi Inoue wrote:
> Bruce Momjian wrote:
> > Hiroshi Inoue wrote:
> >> Bruce Momjian wrote:
> >>> Where are we on this issue?
> >> Oops I forgot it completely.
> >> I have a little improved version and would post it tonight.
> >
> > Ah, very good. Thanks.
>
> Attached is an improved version.

FYI, I am working on this patch now and will post an updated version.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
Hiroshi Inoue wrote:
> Bruce Momjian wrote:
> > Hiroshi Inoue wrote:
> >> Bruce Momjian wrote:
> >>> Where are we on this issue?
> >> Oops I forgot it completely.
> >> I have a little improved version and would post it tonight.
> >
> > Ah, very good. Thanks.
>
> Attached is an improved version.

I spent many hours on this patch and am attaching an updated version.
I have restructured the code and added many comments, but this is the
main one:

* Ideally, the server encoding and locale settings would
* always match. Unfortunately, WIN32 does not support UTF-8
* values for setlocale(), even though PostgreSQL runs fine with
* a UTF-8 encoding on Windows:
*
* http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
*
* Therefore, we must set LC_CTYPE to match LC_NUMERIC and
* LC_MONETARY, call localeconv(), and use mbstowcs() to
* convert the locale-aware string, e.g. Euro symbol, which
* is not in UTF-8 to the server encoding.

I need someone with WIN32 experience to review and test this patch.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
Bruce Momjian wrote:
> Hiroshi Inoue wrote:
> > Attached is an improved version.
>
> I spent many hours on this patch and am attaching an updated version.
> I have restructured the code and added many comments, but this is the
> main one:
>
> * Ideally, the server encoding and locale settings would
> * always match. Unfortunately, WIN32 does not support UTF-8
> * values for setlocale(), even though PostgreSQL runs fine with
> * a UTF-8 encoding on Windows:
> *
> * http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
> *
> * Therefore, we must set LC_CTYPE to match LC_NUMERIC and
> * LC_MONETARY, call localeconv(), and use mbstowcs() to
> * convert the locale-aware string, e.g. Euro symbol, which
> * is not in UTF-8 to the server encoding.
>
> I need someone with WIN32 experience to review and test this patch.

I don't understand why cache_locale_time() works on Windows. It sets
the LC_CTYPE but does not do any encoding conversion. Do month and
day-of-week names not work either, or do they work and the encoding
conversion for numeric/money, e.g. Euro, is not necessary?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
Bruce Momjian wrote:
> Bruce Momjian wrote:
>> I need someone with WIN32 experience to review and test this patch.
>
> I don't understand why cache_locale_time() works on Windows. It sets
> the LC_CTYPE but does not do any encoding conversion.

Doesn't strftime_win32 do the conversion?

> Do month and
> day-of-week names not work either, or do they work and the encoding
> conversion for numeric/money, e.g. Euro, is not necessary?

db_strdup does the conversion.

regards,
Hiroshi Inoue
Hiroshi Inoue wrote:
> Bruce Momjian wrote:
> > I don't understand why cache_locale_time() works on Windows. It sets
> > the LC_CTYPE but does not do any encoding conversion.
>
> Doesn't strftime_win32 do the conversion?

Oh, I now see strftime is redefined as a macro in that C file. Thanks.

> > Do month and
> > day-of-week names not work either, or do they work and the encoding
> > conversion for numeric/money, e.g. Euro, is not necessary?
>
> db_strdup does the conversion.

Should we pull the encoding conversion into a separate function and have
strftime_win32() and db_strdup() both call it?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
Bruce Momjian wrote:
> Hiroshi Inoue wrote:
>> Bruce Momjian wrote:
>>> I don't understand why cache_locale_time() works on Windows. It sets
>>> the LC_CTYPE but does not do any encoding conversion.
>> Doesn't strftime_win32 do the conversion?
>
> Oh, I now see strftime is redefined as a macro in that C file. Thanks.
>
>>> Do month and
>>> day-of-week names not work either, or do they work and the encoding
>>> conversion for numeric/money, e.g. Euro, is not necessary?
>> db_strdup does the conversion.
>
> Should we pull the encoding conversion into a separate function and have
> strftime_win32() and db_strdup() both call it?

We may be able to pull the conversion WideChars => UTF8 =>
a PG encoding into a function.

BTW both PGLC_localeconv() and cache_locale_time() save the current
LC_CTYPE first and restore it just before returning.
I'm not sure that is safe if an error occurs in the middle of these
functions.

regards,
Hiroshi Inoue
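The save/restore concern can be sketched in plain C as follows. This is only
the shape of the idea, under the assumption that the work done while LC_CTYPE
is changed reports failure through a return value; in the backend an
elog(ERROR) performs a longjmp, so real code would presumably need something
like PG_TRY/PG_CATCH around the changed-locale region. The names
locale_work_fn, with_temporary_ctype and sample_work_that_fails are invented
for this example.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: does the locale-dependent work, nonzero on failure. */
typedef int (*locale_work_fn)(void *arg);

/*
 * Hypothetical wrapper: run work(arg) with LC_CTYPE temporarily set to
 * tmp_ctype.  LC_CTYPE is restored on every path that reaches the end of
 * this function.  Returns work's result, or -1 if the locale could not be
 * saved or switched.
 */
int with_temporary_ctype(const char *tmp_ctype, locale_work_fn work, void *arg)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved;
    int rc = -1;

    if (cur == NULL)
        return -1;

    /* copy the name: the buffer returned by setlocale() may be reused */
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_CTYPE, tmp_ctype) != NULL)
        rc = work(arg);         /* failure comes back as a value, not a jump */

    setlocale(LC_CTYPE, saved); /* single exit path: always restore */
    free(saved);
    return rc;
}

/* Hypothetical worker used to show that a failure still restores LC_CTYPE. */
int sample_work_that_fails(void *arg)
{
    (void) arg;
    return 1;
}
```

Calling with_temporary_ctype("C", sample_work_that_fails, NULL) returns 1 and
leaves LC_CTYPE as it was before the call.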
Hiroshi Inoue wrote:
> >>>> I need someone with WIN32 experience to review and test this patch.
> >>> I don't understand why cache_locale_time() works on Windows. It sets
> >>> the LC_CTYPE but does not do any encoding conversion.
> >> Doesn't strftime_win32 do the conversion?
> >
> > Oh, I now see strftime is redefined as a macro in that C file. Thanks.
> >
> >>> Do month and
> >>> day-of-week names not work either, or do they work and the encoding
> >>> conversion for numeric/money, e.g. Euro, is not necessary?
> >> db_strdup does the conversion.
> >
> > Should we pull the encoding conversion into a separate function and have
> > strftime_win32() and db_strdup() both call it?
>
> We may be able to pull the conversion WideChars => UTF8 =>
> a PG encoding into a function.

OK, I have created a new function, win32_wchar_to_db_encoding(), to
share the conversion from wide characters to the database encoding.
New patch attached.

> BTW both PGLC_localeconv() and cache_locale_time() save the current
> LC_CTYPE first and restore it just before returning.
> I'm not sure that is safe if an error occurs in the middle of these
> functions.

Yea, I added a comment questioning if that is a problem.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> OK, I have created a new function, win32_wchar_to_db_encoding(), to
> share the conversion from wide characters to the database encoding.
> New patch attached.
Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
db_encoding_strdup() with the function. Like this:
static char *
db_encoding_strdup(const char *str)
{
    char *pstr;
    char *mstr;

    /* convert the string to the database encoding */
    pstr = (char *) pg_do_encoding_conversion(
                        (unsigned char *) str, strlen(str),
                        GetPlatformEncoding(), GetDatabaseEncoding());
    mstr = strdup(pstr);
    if (pstr != str)
        pfree(pstr);

    return mstr;
}
I believe the code is harmless on all platforms, so we can use it
instead of strdup() without any #ifdef WIN32 guards.
BTW, I found that we had better add "ANSI_X3.4-1968" as an alias for
PG_SQL_ASCII. My Fedora 12 returns that name when --no-locale is used.
Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Takahiro Itagaki wrote:
>
> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> > OK, I have created a new function, win32_wchar_to_db_encoding(), to
> > share the conversion from wide characters to the database encoding.
> > New patch attached.
>
> Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
> db_encoding_strdup() with the function. Like this:
>
> static char *
> db_encoding_strdup(const char *str)
> {
>     char *pstr;
>     char *mstr;
>
>     /* convert the string to the database encoding */
>     pstr = (char *) pg_do_encoding_conversion(
>                         (unsigned char *) str, strlen(str),
>                         GetPlatformEncoding(), GetDatabaseEncoding());
>     mstr = strdup(pstr);
>     if (pstr != str)
>         pfree(pstr);
>
>     return mstr;
> }
>
> I believe the code is harmless on all platforms, so we can use it
> instead of strdup() without any #ifdef WIN32 guards.
OK, I don't have any Win32 people testing this patch so if we want this
fixed for 9.0 someone is going to have to test my patch to see that it
works. Can you make the adjustments suggested above to my patch and
test it to see that it works so we can apply it for 9.0?
> BTW, I found that we had better add "ANSI_X3.4-1968" as an alias for
> PG_SQL_ASCII. My Fedora 12 returns the name when --no-locale is used.
OK.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Takahiro Itagaki wrote:
> > Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
> > db_encoding_strdup() with the function. Like this:
>
> OK, I don't have any Win32 people testing this patch so if we want this
> fixed for 9.0 someone is going to have to test my patch to see that it
> works. Can you make the adjustments suggested above to my patch and
> test it to see that it works so we can apply it for 9.0?

Here is a full patch that can be applied cleanly to HEAD.
Can anyone test it on Windows?

I'm not sure why the temporary change of lc_ctype was required in the
original patch. That code is not included in my patch, so please
let me know if it is still needed.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Takahiro Itagaki wrote:
>
> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> > Takahiro Itagaki wrote:
> > > Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
> > > db_encoding_strdup() with the function. Like this:
> >
> > OK, I don't have any Win32 people testing this patch so if we want this
> > fixed for 9.0 someone is going to have to test my patch to see that it
> > works. Can you make the adjustments suggested above to my patch and
> > test it to see that it works so we can apply it for 9.0?
>
> Here is a full patch that can be applied cleanly to HEAD.
> Can anyone test it on Windows?
>
> I'm not sure why the temporary change of lc_ctype was required in the
> original patch. That code is not included in my patch, but please
> let me know if it is still needed.
Sorry for the delay in replying to you.
I considered your idea of using the existing Postgres encoding
conversion routines to do the conversion of localeconv() strings, but
found two problems.
First, GetPlatformEncoding() caches its result, so it assumes the
LC_CTYPE never changes for the server, while fixing this issue actually
requires us to change LC_CTYPE. We could avoid the caching but that
then involves complex table lookups, etc, which seems overly complex:
+ /* convert the string to the database encoding */
+ pstr = (char *) pg_do_encoding_conversion(
+ (unsigned char *) str, strlen(str),
+ GetPlatformEncoding(), GetDatabaseEncoding());
Second, having our backend routines do the conversion seems wrong
because it is possible for someone to set LC_MONETARY to an encoding
that our database does not understand, e.g. UTF16, but one that WIN32
can convert to a valid encoding.
The reason we are doing all this is because of this updated comment in
my patch:
ftp://momjian.us/pub/postgresql/mypatches/pg_locale
+ * Ideally, monetary and numeric local symbols could be returned in
+ * any server encoding. Unfortunately, the WIN32 API does not allow
+ * setlocale() to return values in a codepage/CTYPE that uses more
+ * than two bytes per character, like UTF-8:
+ *
+ * http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
+ *
+ * Evidently, LC_CTYPE allows us to control the encoding used
+ * for strings returned by localeconv(). The Open Group
+ * standard, mentioned at the top of this C file, doesn't
+ * explicitly state this.
+ *
+ * Therefore, we set LC_CTYPE to match LC_NUMERIC and
+ * LC_MONETARY, call localeconv(), and use mbstowcs() to
+ * convert the locale-aware string, e.g. Euro symbol (which
+ * is not in UTF-8), to the server encoding.
One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
on Win32 and then just convert that always to the server encoding with
win32_wchar_to_db_encoding(), instead of using the encoding from
LC_MONETARY to set LC_CTYPE and having to do double-conversion.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
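The mbstowcs() step named in the patch comment above can be sketched in portable C. This is only an illustration under stated assumptions: locale_string_to_wide() is a hypothetical helper, not a function from the patch, and it converts from whatever encoding the current LC_CTYPE implies.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from the patch): convert a multibyte string,
 * encoded according to the current LC_CTYPE, into wide characters.
 * Returns the number of wide characters written (excluding the
 * terminator), or -1 if the input is invalid or dst is too small.
 */
long
locale_string_to_wide(const char *src, wchar_t *dst, size_t dst_len)
{
    size_t      n;

    assert(src != NULL && dst != NULL);

    if (dst_len == 0)
        return -1;

    n = mbstowcs(dst, src, dst_len);
    if (n == (size_t) -1)
        return -1;              /* invalid byte sequence for LC_CTYPE */
    if (n == dst_len)
        return -1;              /* no room left for the terminator */

    return (long) n;
}
```

Because the result depends on LC_CTYPE at the time of the call, the caller has to set LC_CTYPE before calling localeconv() and keep it set until this conversion is done.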
On Mon, Mar 22, 2010 at 9:14 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Takahiro Itagaki wrote:
>>
>> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>
>> > Takahiro Itagaki wrote:
>> > > Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
>> > > db_encoding_strdup() with the function. Like this:
>> >
>> > OK, I don't have any Win32 people testing this patch so if we want this
>> > fixed for 9.0 someone is going to have to test my patch to see that it
>> > works. Can you make the adjustments suggested above to my patch and
>> > test it to see that it works so we can apply it for 9.0?
>>
>> Here is a full patch that can be applied cleanly to HEAD.
>> Can anyone test it on Windows?
>>
>> I'm not sure why the temporary change of lc_ctype was required in the
>> original patch. That code is not included in my patch, but please
>> let me know if it is still needed.
>
> Sorry for the delay in replying to you.
>
> I considered your idea of using the existing Postgres encoding
> conversion routines to do the conversion of localeconv() strings, but
> found two problems.
>
> First, GetPlatformEncoding() caches its result, so it assumes the
> LC_CTYPE never changes for the server, while fixing this issue actually
> requires us to change LC_CTYPE. We could avoid the caching but that
> then involves complex table lookups, etc, which seems overly complex:
>
> + /* convert the string to the database encoding */
> + pstr = (char *) pg_do_encoding_conversion(
> + (unsigned char *) str, strlen(str),
> + GetPlatformEncoding(), GetDatabaseEncoding());
>
> Second, having our backend routines do the conversion seems wrong
> because it is possible for someone to set LC_MONETARY to an encoding
> that our database does not understand, e.g. UTF16, but one that WIN32
> can convert to a valid encoding.
>
> The reason we are doing all this is because of this updated comment in
> my patch:
>
> ftp://momjian.us/pub/postgresql/mypatches/pg_locale
>
> + * Ideally, monetary and numeric local symbols could be returned in
> + * any server encoding. Unfortunately, the WIN32 API does not allow
> + * setlocale() to return values in a codepage/CTYPE that uses more
> + * than two bytes per character, like UTF-8:
> + *
> + * http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
> + *
> + * Evidently, LC_CTYPE allows us to control the encoding used
> + * for strings returned by localeconv(). The Open Group
> + * standard, mentioned at the top of this C file, doesn't
> + * explicitly state this.
> + *
> + * Therefore, we set LC_CTYPE to match LC_NUMERIC and
> + * LC_MONETARY, call localeconv(), and use mbstowcs() to
> + * convert the locale-aware string, e.g. Euro symbol (which
> + * is not in UTF-8), to the server encoding.
>
> One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
> on Win32 and then just convert that always to the server encoding with
> win32_wchar_to_db_encoding(), instead of using the encoding from
> LC_MONETARY to set LC_CTYPE and having to do double-conversion.
So, hugely late, reviving this thread.
Ideally, we should definitely consider doing that. Internally, Windows
will do it in UTF16 anyway. So we're basically doing
UTF16->db->UTF16->UTF8->db or something like that with this patch.
But I'm unsure how that would work. We're talking about the output of
localeconv(), right? I don't see a version of localeconv() that does
wide chars anywhere. (You can't just set LC_CTYPE and use the regular
function - Windows has a separate set of functions for dealing with
UTF16).
Looking at the patch, you're passing "item" to db_encoding_strdup()
but it doesn't seem to be used anywhere. Leftover from previous
experiments, or forgot to use it? Perhaps you intended for it to be in
the error messages?
Also, won't this need special-casing for UTF8? Per comment in
mbutils.c, wcstombs() doesn't work for UTF8 encodings - you need to
use MultiByteToWideChar().
I also note that we have char2wchar() already - we should perhaps just
call that? Or will that use the wrong locale?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> But I'm unsure how that would work. We're talking about the output of
> localeconv(), right? I don't see a version of localeconv() that does
> wide chars anywhere. (You can't just set LC_CTYPE and use the regular
> function - Windows has a separate set of functions for dealing with
> UTF16).
Yeah, msvcrt doesn't have wlocaleconv :-( . Since localeconv() returns
characters in the encoding specified in LC_CTYPE, we need to handle the
issue with code something like:
1. setlocale(LC_CTYPE, lc_monetary)
2. setlocale(LC_MONETARY, lc_monetary)
3. lc = localeconv()
4. pg_do_encoding_conversion(lc->xxx,
FROM pg_get_encoding_from_locale(lc_monetary),
TO GetDatabaseEncoding())
5. Revert LC_CTYPE and LC_MONETARY.
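The steps above can be sketched in standalone C roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: fetch_currency_symbol() and copy_string() are hypothetical names, and step 4 is only marked by a comment because pg_do_encoding_conversion() exists only inside the PostgreSQL backend.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc'ed copy of a string, or NULL on failure. */
char *
copy_string(const char *s)
{
    char       *p = malloc(strlen(s) + 1);

    if (p != NULL)
        strcpy(p, s);
    return p;
}

/*
 * Hypothetical helper (not from the patch): copy the currency symbol of
 * locale_name into buf, switching LC_CTYPE and LC_MONETARY only around
 * the localeconv() call.  Returns 0 on success, -1 on failure.
 */
int
fetch_currency_symbol(const char *locale_name, char *buf, size_t buflen)
{
    char       *save_ctype;
    char       *save_monetary;
    int         rc = -1;

    assert(buf != NULL && buflen > 0);

    /* setlocale() may reuse its return buffer, so copy the old names */
    save_ctype = copy_string(setlocale(LC_CTYPE, NULL));
    save_monetary = copy_string(setlocale(LC_MONETARY, NULL));

    if (save_ctype != NULL && save_monetary != NULL)
    {
        /* steps 1 and 2: make LC_CTYPE agree with LC_MONETARY */
        if (setlocale(LC_CTYPE, locale_name) != NULL &&
            setlocale(LC_MONETARY, locale_name) != NULL)
        {
            /* step 3: strings are in the encoding implied by LC_CTYPE */
            struct lconv *lc = localeconv();

            /* step 4 (conversion to the database encoding) would go here */
            if (strlen(lc->currency_symbol) < buflen)
            {
                strcpy(buf, lc->currency_symbol);
                rc = 0;
            }
        }

        /* step 5: revert both categories, whether or not we succeeded */
        setlocale(LC_CTYPE, save_ctype);
        setlocale(LC_MONETARY, save_monetary);
    }

    free(save_ctype);
    free(save_monetary);
    return rc;
}
```

The string is copied out before the locale is reverted because the struct returned by localeconv() may be overwritten by later setlocale() calls.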
Another idea is to use GetLocaleInfoW() [1], which is a native Win32 locale
function, instead of the libc one. It returns locale characters as wide
chars, so we can safely convert them as UTF16->UTF8->db. But it requires
an additional branch in our locale code only for Windows.
[1] http://msdn.microsoft.com/en-us/library/dd318101
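The first hop of that UTF16->UTF8->db chain can be illustrated with a small portable encoder. This is a hand-written sketch, not code from any patch: utf16_to_utf8() is a hypothetical name, and on Windows one would more likely call the platform's own conversion routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode in_len UTF-16 code units as a
 * NUL-terminated UTF-8 string.  Returns the number of bytes written
 * (excluding the terminator), or -1 on malformed input or if out is
 * too small.
 */
long
utf16_to_utf8(const uint16_t *in, size_t in_len,
              unsigned char *out, size_t out_size)
{
    size_t      i = 0;
    size_t      o = 0;

    assert(in != NULL && out != NULL);

    if (out_size == 0)
        return -1;

    while (i < in_len)
    {
        uint32_t    cp = in[i++];
        size_t      need;

        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            /* a high surrogate must be followed by a low surrogate */
            if (i >= in_len || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i++] - 0xDC00);
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            return -1;          /* stray low surrogate */

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= out_size)
            return -1;          /* keep room for the terminator */

        if (need == 1)
            out[o++] = (unsigned char) cp;
        else if (need == 2)
        {
            out[o++] = (unsigned char) (0xC0 | (cp >> 6));
            out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
        else if (need == 3)
        {
            out[o++] = (unsigned char) (0xE0 | (cp >> 12));
            out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
        else
        {
            out[o++] = (unsigned char) (0xF0 | (cp >> 18));
            out[o++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    }

    out[o] = '\0';
    return (long) o;
}
```

For the Euro sign (U+20AC), the character behind the original report, this yields the three bytes E2 82 AC rather than the single WIN1252 byte 0x80.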
Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
On Mon, Apr 19, 2010 at 03:59, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>
> Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
>> But I'm unsure how that would work. We're talking about the output of
>> localeconv(), right? I don't see a version of localeconv() that does
>> wide chars anywhere. (You can't just set LC_CTYPE and use the regular
>> function - Windows has a separate set of functions for dealing with
>> UTF16).
>
> Yeah, msvcrt doesn't have wlocaleconv :-( . Since localeconv() returns
> characters in the encoding specified in LC_CTYPE, we need to handle the
> issue with code something like:
>
> 1. setlocale(LC_CTYPE, lc_monetary)
> 2. setlocale(LC_MONETARY, lc_monetary)
> 3. lc = localeconv()
> 4. pg_do_encoding_conversion(lc->xxx,
> FROM pg_get_encoding_from_locale(lc_monetary),
> TO GetDatabaseEncoding())
> 5. Revert LC_CTYPE and LC_MONETARY.
>
>
> Another idea is to use GetLocaleInfoW() [1], which is a native Win32 locale
> function, instead of the libc one. It returns locale characters as wide
> chars, so we can safely convert them as UTF16->UTF8->db. But it requires
> an additional branch in our locale code only for Windows.
If we can go UTF16->db directly, it might be a good idea. If we're
going via UTF8 anyway, I doubt it's going to be worth it.
Let's work off what we have now to start with at least. Bruce, can you
comment on that thing about the extra parameter? And UTF8?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> > 1. setlocale(LC_CTYPE, lc_monetary)
> > 2. setlocale(LC_MONETARY, lc_monetary)
> > 3. lc = localeconv()
> > 4. pg_do_encoding_conversion(lc->xxx,
> > FROM pg_get_encoding_from_locale(lc_monetary),
> > TO GetDatabaseEncoding())
> > 5. Revert LC_CTYPE and LC_MONETARY.
Attached is a patch that implements the above straightforwardly. Does this work?
Note that the #ifdef WIN32 parts in the patch are harmless on other platforms
even if they are enabled.
> Let's work off what we have now to start with at least. Bruce, can you
> comment on that thing about the extra parameter? And UTF8?
Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Magnus Hagander wrote:
> > One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
> > on Win32 and then just convert that always to the server encoding with
> > win32_wchar_to_db_encoding(), instead of using the encoding from
> > LC_MONETARY to set LC_CTYPE and having to do double-conversion.
>
> So, hugely late, reviving this thread.
>
> Ideally, we should definitely consider doing that. Internally, Windows
> will do it in UTF16 anyway. So we're basically doing
> UTF16->db->UTF16->UTF8->db or something like that with this patch.
>
> But I'm unsure how that would work. We're talking about the output of
> localeconv(), right? I don't see a version of localeconv() that does
> wide chars anywhere. (You can't just set LC_CTYPE and use the regular
> function - Windows has a separate set of functions for dealing with
> UTF16).
I thought there was an LC_CTYPE for UTF16 that we could use without a
wide version of that function. If not, forget that idea.
> Looking at the patch, you're passing "item" to db_encoding_strdup()
> but it doesn't seem to be used anywhere. Leftover from previous
> experiments, or forgot to use it? Perhaps you intended for it to be in
> the error messages?
It originally was in the error message but can be removed. I have now
removed 'item' from my version of the patch.
> Also, won't this need special-casing for UTF8? Per comment in
> mbutils.c, wcstombs() doesn't work for UTF8 encodings - you need to
> use MultiByteToWideChar().
Well, we don't support UTF8 for any of the non-encoding locales, e.g.
monetary, numeric, so I never considered that we would support it. If
we did support it, we would have to _pick_ a locale that is <= 2 bytes
per character and use that, and then convert to UTF8, but what locale
would we pick? They could use an LC_CTYPE that is <= 2 bytes and a
numeric that is UTF8, but I never suspected we would want to support
that, and we would need some logic to detect that case.
> I also note that we have char2wchar() already - we should perhaps just
> call that? Or will that use the wrong locale?
I see char2wchar() calling GetDatabaseEncoding() right away, which does
use the cached value for the server encoding, so I don't think it will
work. We can't use our existing routines to convert _from_ the current
encoding to wide characters (because our numeric encoding might not
match the server encoding). However, we can use existing code that
converts from wide to the server encoding, perhaps replacing
win32_wchar_to_db_encoding().
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
Takahiro Itagaki wrote:
>
> Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
> > > 1. setlocale(LC_CTYPE, lc_monetary)
> > > 2. setlocale(LC_MONETARY, lc_monetary)
> > > 3. lc = localeconv()
> > > 4. pg_do_encoding_conversion(lc->xxx,
> > > FROM pg_get_encoding_from_locale(lc_monetary),
> > > TO GetDatabaseEncoding())
> > > 5. Revert LC_CTYPE and LC_MONETARY.
>
> A patch attached for the above straightforwardly. Does this work?
> Note that #ifdef WIN32 parts in the patch are harmless on other platforms
> even if they are enabled.
I like this patch. Instead of having special code to convert from the
_current_ locale, you pass the encoding name to our routines. This does
mean we are bound by supporting only the encodings PG supports, not the
full range of Win32 encodings, but that seems fine.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
Magnus Hagander wrote:
> > Another idea is to use GetLocaleInfoW() [1], that is win32 native locale
> > functions, instead of the libc one. It returns locale characters in wide
> > chars, so we can safely convert them as UTF16->UTF8->db. But it requires
> > an additional branch in our locale codes only for Windows.
>
> If we can go UTF16->db directly, it might be a good idea. If we're
> going via UTF8 anyway, I doubt it's going to be worth it.
>
> Let's work off what we have now to start with at least. Bruce, can you
> comment on that thing about the extra parameter? And UTF8?
I do like the idea of using UTF16 directly because that would eliminate
our need to even set LC_CTYPE for Win32 in this routine. That would
also eliminate any need to refer to the encoding for numeric/monetary,
so we could get rid of the odd case where their encoding is UTF8 but
their numeric/monetary locale settings have to use a non-UTF8 encoding.
For example, the original bug report has these locale settings:
	http://archives.postgresql.org/pgsql-general/2009-04/msg00829.php
	psql (PostgreSQL) 8.3.7
	server_version 8.3.7
	server_encoding UTF8
	client_encoding win1252
	lc_numeric Finnish, Finland
	lc_monetary Finnish, Finland
but really needed to use "Finnish_Finland.1252":
	http://archives.postgresql.org/pgsql-general/2009-04/msg00859.php
	However, I noticed that both lc_collate and lc_ctype are set to
	Finnish_Finland.1252 by the installer. Should I have just run initdb
	with --locale fi_FI.UTF8 at the very start? The to_char('L') works
	fine with a database with win1252 encoding.
Of course, that still does not work with our current CVS code if the
database encoding is UTF8, which is what we are trying to fix now.
I am not even sure how users set these things properly but I assume the
installer does all that magic. And, of course, if someone manually runs
initdb on Windows, they can easily set things wrong.
Magnus, if I remember correctly, all our non-UTF8 to UTF8 conversion
already has to pass through UTF16 as an intermediary case, so going to
UTF16 directly seems fine.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
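For readers wondering why the original report failed on byte 0x80 specifically: in WIN1252 that byte is the Euro sign, but in UTF-8 every byte from 0x80 to 0xBF is a continuation byte and cannot begin a character. A minimal sketch of that lead-byte rule follows; utf8_sequence_length() is a hypothetical helper and not PostgreSQL's actual validator.

```c
/*
 * Hypothetical helper: return how many bytes a UTF-8 sequence starting
 * with the given lead byte occupies, or 0 if the byte cannot start a
 * sequence at all.
 */
int
utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               /* plain ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;
    return 0;                   /* 0x80..0xC1 and 0xF5..0xFF are invalid here */
}
```

So a currency symbol returned by localeconv() in a 1252 code page has to be converted before it is placed into a UTF8 database's output.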
Takahiro Itagaki wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
>>> 1. setlocale(LC_CTYPE, lc_monetary)
>>> 2. setlocale(LC_MONETARY, lc_monetary)
>>> 3. lc = localeconv()
>>> 4. pg_do_encoding_conversion(lc->xxx,
>>> FROM pg_get_encoding_from_locale(lc_monetary),
>>> TO GetDatabaseEncoding())
>>> 5. Revert LC_CTYPE and LC_MONETARY.
>
> A patch attached for the above straightforwardly. Does this work?
I have 2 questions about this patch.
1. How does it work when LC_MONETARY and LC_NUMERIC are different?
2. Is calling db_encoding_strdup() for lconv->grouping appropriate?
regards,
Hiroshi Inoue
> Note that #ifdef WIN32 parts in the patch are harmless on other platforms
> even if they are enabled.
>
>> Let's work off what we have now to start with at least. Bruce, can you
>> comment on that thing about the extra parameter? And UTF8?
>
> Regards,
> ---
> Takahiro Itagaki
> NTT Open Source Software Center
Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> wrote:
> 1. How does it work when LC_MONETARY and LC_NUMERIC are different?
I think that case is rare, but it is possible. Fixed.
> 2. Is calling db_encoding_strdup() for lconv->grouping appropriate?
Ah, we didn't need it. Removed.
Revised patch attached. Please test it.
Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> Revised patch attached. Please test it.
I applied this version of the patch.
Please check whether the bug is fixed and watch for any buildfarm failures.
Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Takahiro Itagaki wrote:
>
> Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>
> > Revised patch attached. Please test it.
>
> I applied this version of the patch.
> Please check whether the bug is fixed and watch for any buildfarm failures.
Great. I have merged my C comments into the code with the attached
patch so we remember why the code is set up as it is.
One thing I am confused about is that, for Win32, our numeric/monetary
handling sets lc_ctype to match numeric/monetary, while our time code in
the same file uses that method _and_ uses wcsftime() to return the value
in wide characters. So, why do we do both for time? Is there any value
to that? Seems we should do the same for both numeric/monetary and time.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
Bruce Momjian wrote:
> Takahiro Itagaki wrote:
>> Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>>
>>> Revised patch attached. Please test it.
>> I applied this version of the patch.
>> Please check whether the bug is fixed and watch for any buildfarm failures.
>
> Great. I have merged my C comments into the code with the attached
> patch so we remember why the code is set up as it is.
>
> One thing I am confused about is that, for Win32, our numeric/monetary
> handling sets lc_ctype to match numeric/monetary, while our time code in
> the same file uses that method _and_ uses wcsftime() to return the value
> in wide characters. So, why do we do both for time? Is there any value
> to that?
Unfortunately wcsftime() is a halfway convenience function which uses
the ANSI versions of the underlying functionality internally.
AFAICS the only way to remove the dependency on LC_CTYPE is to call
GetLocaleInfoW() directly.
regards,
Hiroshi Inoue
Hiroshi Inoue wrote:
> Bruce Momjian wrote:
> > Takahiro Itagaki wrote:
> >> Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> >>
> >>> Revised patch attached. Please test it.
> >> I applied this version of the patch.
> >> Please check whether the bug is fixed and watch for any buildfarm failures.
> >
> > Great. I have merged my C comments into the code with the attached
> > patch so we remember why the code is set up as it is.
> >
> > One thing I am confused about is that, for Win32, our numeric/monetary
> > handling sets lc_ctype to match numeric/monetary, while our time code in
> > the same file uses that method _and_ uses wcsftime() to return the value
> > in wide characters. So, why do we do both for time? Is there any value
> > to that?
>
> Unfortunately wcsftime() is a halfway convenience function which uses
> the ANSI versions of the underlying functionality internally.
> AFAICS the only way to remove the dependency on LC_CTYPE is to call
> GetLocaleInfoW() directly.
Thanks. I have documented this fact in a C comment; patch attached.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com