Identifier case folding notes

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Identifier case folding notes
Date: 2008-07-08 17:25:38
Message-ID: 200807081925.40467.peter_e@gmx.net
Lists: pgsql-hackers

I have had some idle thoughts on the issue of identifier case folding.  Maybe 
we can collect our ideas and inch closer to a solution sometime.  Or we 
determine that it's useless and impossible, but then I can at least collect 
that result in the wiki and point future users to it.

Background: PostgreSQL folds unquoted identifiers to lower case.  SQL 
specifies folding to upper case, and most other SQL DBMS do it that way.  If 
an application mixes referring to an object using quoted and unquoted 
versions of an identifier, then incompatibilities arise.  We have always 
stressed to users that one should refer to an object always unquoted or 
always quoted.  While that remains a good suggestion for a number of reasons, 
we have seen occasional complaints that some closed source applications 
violate this rule and therefore cannot be run on PostgreSQL.
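
To make the mismatch concrete, here is a minimal sketch of the two folding
rules.  It is a toy model and not the actual lexer code: the function name
fold, the mode strings, and the table name accounts are all made up for
illustration, and Python's lower()/upper() stand in for whatever folding the
real lexer does.

```python
def fold(token: str, unquoted_to: str) -> str:
    """Return the identifier a lexer would produce for one token.

    Quoted tokens keep their spelling; unquoted ones are folded to
    `unquoted_to`: "lower" as PostgreSQL does, "upper" as SQL specifies.
    (Hypothetical helper, not a PostgreSQL function.)
    """
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        # Strip the quotes and undo doubled quotes inside the name.
        return token[1:-1].replace('""', '"')
    return token.lower() if unquoted_to == "lower" else token.upper()


# An application creates a table unquoted and later refers to it quoted,
# spelled in upper case as an upper-folding DBMS would report it.
created, referenced = 'accounts', '"ACCOUNTS"'

for mode in ("lower", "upper"):
    a, b = fold(created, mode), fold(referenced, mode)
    verdict = "match" if a == b else "mismatch"
    print(f"{mode}-case folding: {a!r} vs {b!r} -> {verdict}")
```

Under upper-case folding both spellings name the same object; under
lower-case folding they do not, which is the incompatibility described above.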

A number of solutions have been proposed over time, which I summarize here:

1. Change the lexer to fold to upper case, as it is supposed to do according 
to the SQL standard.  This will break almost everything, because almost all 
built-in objects have lower-case names and thus couldn't be referred to 
anymore except by quoting.  Changing the names of all the internal objects to 
upper-case names would involve vast code changes, probably break just as much, 
and make everything uglier.  So this approach is unworkable.

2. Fold to upper case, but not when referring to built-in objects.  The lexer 
generally doesn't know what a name will refer to, so this is not possible to 
implement, at least without resorting to lots of hard-coding or horrid 
kludges.  Also, a behavior like this will probably create all kinds of weird 
inconsistencies, resulting from putting catalog knowledge in the lexer.

3 and 4 -- Two variants of "ignore case altogether":

3. Fold all identifiers to lower case, even quoted ones.  This would probably 
in fact fix the breakage of many of the above-mentioned problem applications, 
and it would in general be very easy to understand for a user.  And it could 
be implemented in about three lines.  One disadvantage is that one could no 
longer have objects that have names different only by case, but that is 
probably rare and incredibly stupid and can be neglected.  The main 
disadvantage is that the case of identifiers and in particular column labels 
is lost.  So applications and programming interfaces that look up result 
columns in a case-sensitive manner would fail.  And things like SELECT expr AS 
"Nice Heading" won't work properly anymore.

4. Compare the "name" data type in a case-insensitive manner.  This would 
probably address most problem cases.  Again, you can't have objects with names 
different in case only.  One condition for implementing this is that the 
behavior would have to be fixed globally at initdb time, because it affects 
system indexes and shared catalogs.  That might be impractical for some, because 
you'd need different instances for different behaviors, especially when you 
want to host multiple applications or want to port an affected application to 
the native PostgreSQL behavior over time.

5. One additional approach I thought of is that you swap the case of 
identifiers as you lex them (upper to lower, lower to upper), and then swap 
them back when you send them to the client.  This needs a small change in the 
lexer, one for sending the RowDescription, and support in pg_dump and a few 
other places if desired.  There will, however, be a number of weird, albeit 
self-imposed, side-effects.  I have implemented a little test patch for this.  
It's weird, but it works in basic ways.
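
Options 3, 4, and 5 can be compared side by side with a small toy model.
This is a sketch under stated assumptions, not the test patch mentioned
above: all function names are hypothetical, names are assumed to be ASCII,
and for option 5 it assumes one possible reading of the idea, namely that
unquoted identifiers are still folded to lower case as today while quoted
identifiers get their case swapped.

```python
def is_quoted(token: str) -> bool:
    return len(token) >= 2 and token[0] == '"' and token[-1] == '"'


def unquote(token: str) -> str:
    # Strip the surrounding quotes and undo doubled quotes inside the name.
    return token[1:-1].replace('""', '"')


def lex_option3(token: str) -> str:
    """Option 3: fold every identifier to lower case, quoted or not."""
    name = unquote(token) if is_quoted(token) else token
    return name.lower()


def names_equal_option4(a: str, b: str) -> bool:
    """Option 4: stored names keep their case but compare case-insensitively."""
    return a.lower() == b.lower()


def lex_option5(token: str) -> str:
    """Option 5 (ASSUMED reading): unquoted names fold to lower case as
    today, quoted names have their case swapped."""
    return unquote(token).swapcase() if is_quoted(token) else token.lower()


def send_option5(stored: str) -> str:
    """Option 5: swap the case back before the name goes to the client."""
    return stored.swapcase()


# Option 3: quoted and unquoted spellings meet, but the label's case is lost.
print(lex_option3('"ACCOUNTS"') == lex_option3('accounts'))
print(lex_option3('"Nice Heading"'))

# Option 4: the comparison ignores case; the stored spelling is untouched.
print(names_equal_option4('ACCOUNTS', 'accounts'))

# Option 5: spellings meet internally, and labels survive the round trip.
print(lex_option5('"ACCOUNTS"') == lex_option5('accounts'))
print(send_option5(lex_option5('"Nice Heading"')))
print(send_option5(lex_option5('accounts')))
```

In this model the last call shows one of the side effects: a name that was
written unquoted in lower case comes back to the client in upper case.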

Obviously, no solution will ever work completely.  And we probably don't want 
such a solution, because it would create two different and incompatible 
PostgreSQL universes.  If we are aiming for a solution that would allow most 
affected applications to hobble along, we would probably serve most users.  
Implementing some or all of 3, 4, and 5 would probably achieve that.

Comments?