Re: Case Preservation disregarding case sensitivity?

From: beau hargis <beauh(at)bluefrogmobile(dot)com>
To: pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Case Preservation disregarding case sensitivity?
Date: 2006-10-31 03:28:50
Message-ID: 200610301928.51255.beauh@bluefrogmobile.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

On Friday 27 October 2006 19:38, Joe wrote:
> Hi Beau,
>
> On Fri, 2006-10-27 at 16:23 -0700, beau hargis wrote:
> > I am hoping that there is an easy way to obtain case-preservation with
> > case-insensitivity, or at the very least, case-preservation and complete
> > case-sensitivity, or case-preservation and a consistant case-conversion
> > strategy.
> >
> > The case of the column names need to be preserved because that is the way
> > the schema is designed and most importantly (VERY, VERY IMPORTANT),
> > column names are used in apps as hash values, or as named references
> > which are case sensitive and as such need to be delivered to the client
> > in exactly in the manner specified at the time of table creation.
> >
>
> I went through the same issue in my conversion from MySQL to Postgres
> and (since I had a small application) I ended up changing up all my
> tables and columns "UserProfile" to user_profile.
>
> I'm afraid however, that it's MySQL that is the odd man out. I haven't
> researched this completely but I believe PG follows either the FIPS-127
> or SQL-92 standard with respect to what are called "delimited
> identifiers". Basically, this says if you want case sensitivity in
> identifier names, you have to use double quotes wherever you refer to
> the identifier. Without the double quotes, the SQL implementor can
> either use UPPERCASE (as I believe Oracle and DB2 do) or lowercase (as
> PG does) when it displays those identifiers.
>
> Joe

Again, I am at the same point I was at when searching and searching for
information on the problem, and I am not sure what the SQL standard has to
say about it: I do not, and I dont think that anyone else who is struggling
to deal with the problem does, care about case-sensitivity. I am interested
in case preservation of column names. I do, indeed, want identifiers treated
in a case insensitive way, but I want the case PRESERVED in the table
definitions and I want that case, as preserved, to be reflected in the field
names as returned by the server to any client library that connects and
initiates a query.

Case-preservation is not the same as case-sensitivity; nor is
case-normalization the same as case-insensitivity. What PostgreSQL is doing
is converting any, and all, identifiers to a lower case and then matching
those against the identifiers (as stored in the table definition) in a
case-sensitive manner. It 'normalizes' the case of the identifiers so that it
has a common internal representation; the desires of the programmer and
database architect be damned.

Referenced specification details:
From FIPS-127:
=======================================
3. Delimited identifiers. In the previous ANSI SQL specification, it was not
possible for an application to specify identifiers with spaces or other
special symbols. Also, it was not possible to protect against future assaults
on the name space for (identifier) by additions to the (reserved word) list.
The new facility for (delimited identifier) allows a user to enclose all
identifiers in double-quotation marks, thereby ensuring that the name defined
or referenced may contain spaces or other special symbols and will not be
impacted by future additions to the (reserved word) list.
=======================================

From SQL-92/Sec. 5.2:
=======================================
10)The <identifier body> of a <regular identifier> is equivalent
to an <identifier body> in which every letter that is a lower-
case letter is replaced by the equivalent upper-case letter
or letters. This treatment includes determination of equiva-
lence, representation in the Information and Definition Schemas,
representation in the diagnostics area, and similar uses.

11)The <identifier body> of a <regular identifier> (with every
letter that is a lower-case letter replaced by the equivalent
upper-case letter or letters), treated as the repetition of
a <character string literal> that specifies a <character set
specification> of SQL_TEXT, shall not be equal, according to
the comparison rules in Subclause 8.2, "<comparison predicate>",
to any <reserved word> (with every letter that is a lower-case
letter replaced by the equivalent upper-case letter or letters),
treated as the repetition of a <character string literal> that
specifies a <character set specification> of SQL_TEXT.

12)Two <regular identifier>s are equivalent if their <identifier
body>s, considered as the repetition of a <character string
literal> that specifies a <character set specification> of
SQL_TEXT, compare equally according to the comparison rules
in Subclause 8.2, "<comparison predicate>".

13)A <regular identifier> and a <delimited identifier> are equiva-
lent if the <identifier body> of the <regular identifier> (with
every letter that is a lower-case letter replaced by the equiva-
lent upper-case letter or letters) and the <delimited identifier
body> of the <delimited identifier> (with all occurrences of
<quote> replaced by <quote symbol> and all occurrences of <dou-
blequote symbol> replaced by <double quote>), considered as
the repetition of a <character string literal> that specifies a
<character set specification> of SQL_TEXT and an implementation-
defined collation that is sensitive to case, compare equally
according to the comparison rules in Subclause 8.2, "<comparison
predicate>".

14)Two <delimited identifier>s are equivalent if their <delimited
identifier body>s (with all occurrences of <quote> replaced
by <quote symbol> and all occurrences of <doublequote symbol>
replaced by <doublequote>), considered as the repetition of a
<character string literal> that specifies a <character set spec-
ification> of SQL_TEXT and an implementation-defined collation
that is sensitive to case, compare equally according to the
comparison rules in Subclause 8.2, "<comparison predicate>".
========================================

Having installed DB2 Enterprise today and taking it for a spin, it does indeed
behave in a similar manner. However, after reading through both
specifications, it seems that DB2 follows more of the spec than PostgreSQL.
The specifications state that for purpose of comparing identifiers, both
shall be converted to upper-case. DB2 displays all identifiers in upper-case
whereas PostgreSQL displays all identifiers in lower-case. This alone would
be a deviation from the specification.

It seems, however, that the regular-identifier/delimited-identifier
distinction is for the embedding of spaces or normally invalid characters
from other character sets, not for actually creating case-sensitive
identifiers. It is only a side-effect. It does not enforce any method for the
representation of the identifiers which the client expects, either. Since a
query can be constructed which specifies aliases (... AS ...), the
presentation of the data is implementation/runtime specific.

Considering the differences that already exist between database systems and
their varying compliance with SQL and the various extensions that have been
created, I do not consider that the preservation of case for identifiers
would violate any SQL standard. The end result being eaiser migration, the
continued operation of cross-platform tools and applications, and a database
designer or data architect having more flexibility in the construction and
presentation of a database.

Again, the distinction between case-sensitivity and case-preservation, and,
case-insensitivity and case-normalization needs to be recognized. Having
case-preservation defined in the CREATE TABLE syntax and/or a server
configuration seems, to me, to be a valuable addition. Of course, making it
an option would not interfere with the current state of affairs.

I have looked through the source code and have identified places which need to
be patched to make this possible. Tomorrow I will work on that and see what
it will impact in the code. Let me know what you think.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chuck McDevitt 2006-10-31 06:08:31 Re: [HACKERS] Case Preservation disregarding case
Previous Message Tom Lane 2006-10-31 03:23:30 Re: Case Preservation disregarding case sensitivity?

Browse pgsql-sql by date

  From Date Subject
Next Message Richard Broersma Jr 2006-10-31 03:32:45 Re: Add calculated fields from one table to other table
Previous Message Tom Lane 2006-10-31 03:23:30 Re: Case Preservation disregarding case sensitivity?