Re: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS
Date: 2011-06-09 17:55:02
Message-ID: BANLkTikuTMGn8=Jw6vsYEebAuHZREscNvQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 9, 2011 at 1:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Jun 9, 2011 at 11:17 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Hmm ... while the above is easy enough to do in the backend, where we
>>> can look at pg_database_encoding_max_length, we have also got instances
>>> of this coding pattern in src/port/pgstrcasecmp.c.  It's a lot less
>>> obvious how to make the test in frontend environments.  Thoughts anyone?
>
>> I'm not sure if this helps at all, but an awful lot of those tests are
>> against hard-coded strings that are known to contain only ASCII
>> characters.  Is there some way we can optimize this for that case?
>
> For the places where we're just looking for a match to a fixed all-ASCII
> string, an ASCII-only downcasing would be sufficient, and would
> eliminate the whole problem.  But I doubt all the callers fall into that
> class.
>
> What I'm particularly worried about at the moment is whether we are
> assuming anywhere that the frontend side can duplicate the backend's
> identifier downcasing behavior.  That seems like a complete morass,
> because (1) they might not have the same locale, (2) they might not
> have the same encoding, (3) even if they do, the "same" locale is known
> to behave differently on different platforms.

Right. Understood. So let's look at the cases (from git grep
pg_strcasecmp and pg_strncasecmp):

contrib/dict_int: Fixed strings only, and it's all backend code anyway.
contrib/dict_xsyn: Fixed strings only, and it's all backend code anyway.
contrib/hstore: Fixed strings only, and it's all backend code anyway.
contrib/pg_upgrade: Used to compare LC_COLLATE, LC_CTYPE, and encoding names.
contrib/pgbench: Definitely front-end code, but it's all fixed strings.
contrib/pgcrypto: All fixed strings except for one instance in
px_find_digest. But it's all backend code.
contrib/spi: One instance, not a fixed string, but it's backend code.
contrib/unaccent: One instance, not a fixed string, but it's backend code.
src/backend/*: Backend code, obviously.
src/bin/initdb: Strings from a constant lookup table
(tsearch_config_languages) only.
src/bin/pg_basebackup: Fixed strings only.
src/bin/pg_ctl: Fixed strings only.
src/bin/pg_dump: Fixed strings only.
src/bin/psql: Fixed strings only. In a couple of cases they are not
literal constants - help.c uses strings from the generated file
sql_help.h, and tab-complete.c uses strings from a constant array
called words_after_create[]. But these are constant lookup tables.
src/include: access/reloptions.h uses pg_strncasecmp() as part of a
macro. That should be OK as long as no one tries to include this in
frontend code, which seems rather impractical.
src/interfaces/ecpg/ecpglib: Fixed strings.
src/interfaces/ecpg/pgtypeslib: Fixed strings, and strings from a
constant lookup table, only.
src/interfaces/ecpg/preproc: This looks a bit worrisome. It seems we
might be using it on identifiers here.
src/interfaces/libpq: This is attempting to match a wildcard
certificate name against a hostname, in two different places.
src/port/chklocale.c: Fixed strings or ones from a lookup table.
src/timezone/pgtz.c: Matches input strings against filenames read from the OS.

So mostly I think these are OK. The instance in
src/interfaces/ecpg/preproc looks like the most likely candidate for a
problem spot.
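
For the fixed-string cases above, the ASCII-only downcasing Tom mentions
could look roughly like the sketch below. This is only an illustration,
not the code in src/port/pgstrcasecmp.c: the names ascii_tolower and
ascii_strcasecmp are hypothetical. The point is that only the bytes
'A'..'Z' are folded, and any byte with the high bit set is passed
through unchanged, so a multibyte UTF-8 sequence can never be damaged
and no locale or encoding information is needed on the frontend side.

```c
/*
 * Hypothetical helper: fold only ASCII 'A'..'Z' to lower case.
 * Bytes >= 0x80 (parts of multibyte characters) are returned as-is.
 */
unsigned char
ascii_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    return ch;
}

/*
 * Hypothetical helper: case-insensitive comparison that is only
 * case-insensitive for ASCII letters.  Returns <0, 0, or >0 in the
 * manner of strcmp().
 */
int
ascii_strcasecmp(const char *s1, const char *s2)
{
    for (;;)
    {
        unsigned char c1 = ascii_tolower((unsigned char) *s1++);
        unsigned char c2 = ascii_tolower((unsigned char) *s2++);

        if (c1 != c2)
            return (int) c1 - (int) c2;
        if (c1 == '\0')
            return 0;       /* both strings ended together */
    }
}
```

With this, a comparison such as ascii_strcasecmp(name, "utf8") behaves
the same regardless of locale or platform. It would not be a substitute
in places that compare user-supplied identifiers, such as the ecpg
preprocessor case, if non-ASCII letters are expected to fold there.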

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
