Re: WIP patch: Collation support

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Gregory Stark <stark(at)enterprisedb(dot)com>, Radek Strnad <radek(dot)strnad(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: Collation support
Date: 2008-09-19 21:31:52
Message-ID: 26236.1221859912@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Here's an updated version of the stripped-down patch, now with
> documentation changes, plus a couple of minor bug fixes.

> I think this is looking pretty good now, and I would appreciate review
> from others before I go ahead committing this.

I guess the $64 question is whether this moves us towards where we want to
be? If the ultimate goal is to get to per-column collations, I think this
patch is likely to be counterproductive, because it will establish syntax
and semantics that we'll have to try to remain compatible with. If we're
going to be satisfied with per-database collations, then great, let's do
it.

As far as reviewing the patch itself goes:

* It's missing pg_dump (or I guess really pg_dumpall) support.

* Please call the new columns datcollate and datctype. Just because
someone ignored the naming convention years ago for pg_database.encoding
doesn't mean we should ignore it here. Also, if you're going to name one
column based on an LC_foo variable's name, do the other one the same way.
I note also that this would match the keywords used in CREATE DATABASE.

* Don't even think of adding system catalog columns without updating
catalogs.sgml.

* You should try to get rid of LOCALE_NAME_BUFLEN altogether. Definitely
the comment about it in pg_control.h is now obsolete.

* You *must* bump PG_CONTROL_VERSION when you change its layout.
(Don't forget catversion too, since you're also changing pg_database)

* This doc sentence reads a bit awkwardly:

An important restriction, however, is that each database character set
must be compatible with the database's <envar>LC_CTYPE</> setting.

Maybe

An important restriction, however, is that each database's character set
must be compatible with the database's <envar>LC_CTYPE</> setting.

Also I wonder whether we shouldn't say that it must be compatible with
LC_CTYPE *and* LC_COLLATE.

* This bit in the CREATE DATABASE ref page is also awkward:

Any character set encoding specified for the new database must be
compatible with the chosen COLLATE and CTYPE settings.

Maybe

The character set encoding used for the new database must be
compatible with the chosen COLLATE and CTYPE settings.

* This bit doesn't look "ready to commit":

***************
*** 754,759 ****
--- 749,756 ----
<option>-l</option> option or the <command>\l</command> command
of <command>psql</command>.

+ XXX It would be nice to show the "korean" database here. And we should
+ modify psql to show locale as well
<screen>
$ <userinput>psql -l</userinput>
List of databases

* This makes sense, but then shouldn't we make the identical restriction
for encoding?

+ The <literal>COLLATE</> and <literal>CTYPE</> settings must match
+ those of the template database, except when template0 is used as
+ template. This is because <literal>COLLATE</> and <literal>CTYPE</>

* I can't tell whether the writer of this bit thought that a blank line
would come out as a paragraph break or not. Either add <para>'s or
remove the blank lines to make it look like one para in the source:

--- 76,105 ----

<para>
<command>initdb</command> initializes the database cluster's default
! locale and character set encoding.
!
! The character set encoding, collation order (<literal>LC_COLLATE</>)
! and character set classes (<literal>LC_CTYPE</>, e.g. upper, lower,
! digit) can be set separately for a database when it is created.
! <command>initdb</command> determines those settings for the
! <literal>template1</literal> database, which will serve as the
! default for all other databases.
!
! To alter the default collation order or character set classes, use the
! <option>--lc-collate</option> and <option>--lc-ctype</option> options.
! Collation orders other than <literal>C</> or <literal>POSIX</> also have
! a performance penalty. For these reasons it is important to choose the
! right locale when running <command>initdb</command>.
!
! The remaining locale categories can be changed later when the server
! is started. You can also use <option>--locale</option> to set the
! default for all locale categories, including collation order and
! character set classes. All server locale values (<literal>lc_*</>) can
! be displayed via <command>SHOW ALL</>.
More details can be found in <xref linkend="locale">.

! To alter the default encoding, use the <option>--encoding</option>.
! More details can be found in <xref linkend="multibyte">.
</para>

* This bit for resetxlog is just wrong now:

--- 62,70 ----
by specifying the <literal>-f</> (force) switch. In this case plausible
values will be substituted for the missing data. Most of the fields can be
expected to match, but manual assistance might be needed for the next OID,
! next transaction ID and epoch, next multitransaction ID and offset, and
! WAL starting address fields.
! The first five of these can be set using the switches discussed below.
If you are not able to determine correct values for all these fields,
<literal>-f</> can still be used, but
the recovered database must be treated with even more suspicion than

I guess it should just say "These fields can be set using the switches
discussed below.", since there aren't any that can't be.

* This is no good:

+ SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_DATABASE);
+ SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_DATABASE);

The source *must* be PGC_S_OVERRIDE to ensure it can't be overridden.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2008-09-19 21:33:34 Re: Assert Levels
Previous Message Joshua Drake 2008-09-19 21:09:42 Re: PostgreSQL future ideas