Re: String comparison and the SQL standard

From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Tom Lane *EXTERN*" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: String comparison and the SQL standard
Date: 2013-01-18 10:03:29
Message-ID: A737B7A37273E048B164557ADEF4A58B0579B0AD@ntex2010a.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tom Lane wrote:
>> b) If the length in characters of X is not equal to the length
>> in characters of Y, then the shorter string is effectively
>> replaced, for the purposes of comparison, with a copy of itself
>> that has been extended to the length of the longer string by
>> concatenation on the right of one or more pad characters,
>> where the pad character is chosen based on CS.
>> If CS has the NO PAD characteristic, then the pad character is
>> an implementation-dependent character different from
>> any character in the character set of X and Y that collates
>> less than any string under CS.
>> Otherwise, the pad character is a <space>.
>
> The PAD case is specifying the way that CHAR(n) comparison should work.
> (We don't expose the PAD/NO PAD distinction in any other way than
> CHAR vs VARCHAR/TEXT types.)
>
> AFAICS, the NO PAD case is ignorable BS: [...]

> In any case, the most significant word in that whole paragraph is
> "effectively", which means you can do it however you want as long
> as you get an equivalent comparison result.
>
>> That would effectively mean that 'a'='a ' is TRUE for
>> all character string types.
>
> In the PAD case, yes. Else no.

Thanks for the clarification.

>> Of the DBMS I tested, Microsoft SQL Server and MySQL gave me
>> that very result, while PostgreSQL and Oracle gave me FALSE.
>
> This probably has more to do with what these systems think the
> data type of an undecorated literal is, than with whether they do
> trailing-space-insensitive comparison all the time.

I tested not only with string literals, but also comparing
table columns of the respective types.

I came up with the following table of semantics used for
comparisons:

| CHAR(n)=CHAR(n) | VARCHAR(n)=VARCHAR(n) | CHAR(n)=VARCHAR(n) |
-----------+-----------------+-----------------------+--------------------+
Oracle | PAD SPACE | NO PAD | NO PAD |
-----------+-----------------+-----------------------+--------------------+
PostgreSQL | PAD SPACE | NO PAD | PAD SPACE |
-----------+-----------------+-----------------------+--------------------+
MySQL | PAD SPACE | PAD SPACE | PAD SPACE |
-----------+-----------------+-----------------------+--------------------+
SQL Server | PAD SPACE | PAD SPACE | PAD SPACE |
-----------+-----------------+-----------------------+--------------------+

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message salah jubeh 2013-01-18 11:05:15 Re: SELECT DISTINCT
Previous Message Jasen Betts 2013-01-18 10:00:18 Re: Case insensitive collation