Bringing PostgreSQL torwards the standard regarding case folding

From: Shachar Shemesh <psql(at)shemesh(dot)biz>
To: PostgreSQL development <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Subject: Bringing PostgreSQL torwards the standard regarding case folding
Date: 2004-04-25 14:44:23
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm opening a new thread, as the previous one was too nested, and
contained too much emotions.

I'll start by my understanding of a summary of the thread so far. The
solution we are seeking would have to satisfy the following conditions:
1. Setting should be on a per-database level. A per-server option is not
good enough, and a per-session option is too difficult to implement,
with no apparent justifiable return.
2. Old applications already working with PG's lowercase folding should
have an option to continue working unmodified for the foreseeable future.

Solutions offered so far, and their status:
1. Tri-state. Folder upper, if failes, fold lower, if succeeds, warn.
2. Dual state. Fold lower or upper. Break if client is broken.
3. Create a database conversion tool to change existing case.

Solution 1:
As currently the case folding is performed disjointed from the actual
use of the identifier, this solution requires quite a big amount of
work. On the other hand, and on second thought, it's main benefit -
gradual transition of applications from one to the other, is not really
necessary once you declare the current behaviour as there to stay.
Existing applications can simply choose to continue using whatever
method they currently use. No need for migration.

As such, I think we can simply state that tri-state migration path
solution can be discarded for the time being.

Solution 2:
Obviously, this is the way to go. We will have a dabase attribute that
states whether things are lower or upper case there.

Solution 3:
(unrelated to the above)
There seems to be some ambiguity about how to handle the translation.
Such a tool seems to require guessing which identifiers are accessed
quoted, unquoted, or both. The last option, of course, will never work.

We may need such a tool, for some projects may wish to transform from
one way to the other. It seems to me, however, that such a tool can wait
a little.

Open issues:
1. What do we do with identifiers in "template1" upon database creation?
2. How do we handle queries to tables belonging the catalog that are
shared between databases?

Observation: on a lowercase folding DB, any identifier that is not
composed only of lowercase characters MUST can be automatically assumed
to be accessed only through quoted mode.

I therefor suggest the following path to a solution:
1. CreateDB will be able to create databases from either type.
2. template1 will be defined to be one or the other. For the sake of
this discussion, let's assume it's lowercase (current situation)
3. CreateDB, upon being asked to create a new DB that has uppercase
folding, will copy over template1, as it currently does.
4. While copying, it will check each identifier. If the identifier is
not lowercase only, it is safe to copy it verbatim.
5. If the identifier is lowercase only, convert it to uppercase only. I
am assuming here that the authors of the client code chose an
uppercase-folding database, so they should know what they are doing when
accessing stuff from the standard offering.
6. I'm not sure what are the shared tables from the catalog. I don't
think it so unreasonable to ask anyone doing catalog work to assume that
catalog entries are case-sensitive. As such, maybe it's best to just
leave the data as is.
7. Column headers, however, will have to have a solution. A point still
open in current design.

I'm hoping this summary helps in furthering the discussion.

Shachar Shemesh
Lingnu Open Source Consulting


Browse pgsql-hackers by date

  From Date Subject
Next Message Rod Taylor 2004-04-25 15:11:33 Re: Bringing PostgreSQL torwards the standard regarding
Previous Message Peter Eisentraut 2004-04-25 14:22:28 Re: [HACKERS] What can we learn from MySQL?