Re: [SQL] Comparison semantics of CHAR data type

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Thomas Fanghaenel <tfanghaenel(at)salesforce(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [SQL] Comparison semantics of CHAR data type
Date: 2013-10-11 20:58:50
Message-ID: 1381525130.59803.YahooMailNeo@web162902.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Thomas Fanghaenel wrote:

>> I was wondering about the proper semantics of CHAR comparisons in some corner
>> cases that involve control characters with values that are less than 0x20
>> (space).

What matters in general isn't where the characters fall when
comparing individual bytes, but how the strings containing them
sort according to the applicable collation.  That said, my
recollection of the spec is that when two CHAR(n) values are
compared, the shorter should be blank-padded before making the
comparison.  *That* said, I think the general advice is to stay
away from CHAR(n) in favor or VARCHAR(n) or TEXT, and I think that
is good advice.

> I am sorry for this long email, but I would be interested to see what
> other hackers think about this issue.

Since we only have the CHAR(n) type to improve compliance with the
SQL specification, and we don't generally encourage its use, I
think we should fix any non-compliant behavior.  That seems to mean
that if you take two CHAR values and compare them, it should give
the same result as comparing the same two values as VARCHAR using
the same collation with the shorter value padded with spaces.

So this is correct:

test=# select 'ab'::char(3) collate "en_US" < E'ab\n'::char(3) collate "en_US";
 ?column?
----------
 t
(1 row)

... because it matches:

test=# select 'ab '::varchar(3) collate "en_US" < E'ab\n'::varchar(3) collate "en_US";
 ?column?
----------
 t
(1 row)

But this is incorrect:

test=# select 'ab'::char(3) collate "C" < E'ab\n'::char(3) collate "C";
 ?column?
----------
 t
(1 row)

... because it doesn't match:

test=# select 'ab '::varchar(3) collate "C" < E'ab\n'::varchar(3) collate "C";
 ?column?
----------
 f
(1 row)

Of course, I have no skin in the game, because it took me about two
weeks after my first time converting a database with CHAR columns
to PostgreSQL to change them all to VARCHAR, and do that as part of
all future conversions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-10-11 21:01:05 Re: [COMMITTERS] pgsql: Replace duplicate_oids with Perl implementation
Previous Message Josh Berkus 2013-10-11 20:23:07 Re: Auto-tuning work_mem and maintenance_work_mem

Browse pgsql-sql by date

  From Date Subject
Next Message Thomas Fanghaenel 2013-10-14 17:57:37 Re: [SQL] Comparison semantics of CHAR data type
Previous Message Bruce Momjian 2013-10-11 19:44:37 Re: [SQL] Comparison semantics of CHAR data type