Re: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS
Date: 2011-06-09 04:39:36
Message-ID: BANLkTimjbSEFTsqOVgRgvgr+KRnzg2BZCw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 8, 2011 at 6:22 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> 2011/6/7 Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>:
> > Since we smash the identifier to lower case using
> > downcase_truncate_identifier(), the solution is to make this function
> > wide-char aware, like the LOWER() function.
> >
> > I see some earlier discussion about making downcase_truncate_identifier()
> > wide-char aware, but it seems to have been dropped.
> > (http://archives.postgresql.org/pgsql-hackers/2010-11/msg01385.php)
> > This invalid byte sequence issue seems more serious, because it might
> > lead, e.g., to pg_dump failures.
>
> It's a problem, but without an efficient algorithm for Unicode case
> folding, any fix we attempt to implement seems like it'll just be
> moving the problem around.
>

Agreed.

I read in another mail thread that str_tolower() is a wide-character-aware
lowercasing function, but it is also collation-aware, so its behaviour can
change with the locale. Tom, however, suggested that we need a
non-locale-dependent case-folding algorithm.

Still, within a single locale on a single machine, we are able to create a
table and insert some data, but we cannot retrieve it. Don't you think this is
serious enough to need a quick solution? As said earlier, it may even lead
to pg_dump failures. Even though str_tolower() is locale dependent, using it
would resolve this particular issue. I am not sure whether it would cause a
performance issue, but at least we would not be raising an error.
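
To make the failure mode concrete, here is a minimal Python simulation of the
difference between byte-wise and character-wise lowercasing. It is only a
sketch, not PostgreSQL code: the helper names (bytewise_lower, charwise_lower,
is_valid_utf8) are hypothetical, and it assumes the C library's tolower()
behaves like a Windows-1252 single-byte mapping while the identifier is stored
as UTF-8. Note also that Python's str.lower() is not locale dependent, so it
stands in for a wide-char-aware function only loosely.

```python
# Illustrative simulation only; not PostgreSQL source code.
# Assumption: tolower() follows a single-byte code page (Windows-1252 here)
# while the identifier bytes are actually UTF-8.

def bytewise_lower(data: bytes, codepage: str = "cp1252") -> bytes:
    """Lowercase each byte on its own, as a non-wide-char-aware loop would."""
    out = bytearray()
    for b in data:
        try:
            ch = bytes([b]).decode(codepage)
            lowered = ch.lower().encode(codepage)
            # Keep the original byte if lowering does not yield one byte.
            out.append(lowered[0] if len(lowered) == 1 else b)
        except (UnicodeDecodeError, UnicodeEncodeError):
            out.append(b)  # byte has no mapping in the code page; keep it
    return bytes(out)


def charwise_lower(data: bytes) -> bytes:
    """Decode first, then lowercase whole characters."""
    return data.decode("utf-8").lower().encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


identifier = "À".encode("utf-8")      # two bytes: C3 80
broken = bytewise_lower(identifier)   # C3 is lowered to E3, 80 is unchanged
fixed = charwise_lower(identifier)    # "à", encoded as C3 A0

print(broken.hex(), is_valid_utf8(broken))
print(fixed.hex(), is_valid_utf8(fixed))
```

In this simulation the byte-wise result starts with E3, which announces a
three-byte UTF-8 sequence that never completes, so the stored name can no
longer be decoded. The character-wise result stays valid.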

Please excuse me if the community has already discussed this at length and
kept this behaviour intentionally, aware of these errors and their
consequences.

Thanks


--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

