Re: [HACKERS] 'a' == 'a '

From: Chris Travers <chris(at)travelamericas(dot)com>
To: josh(at)agliodbs(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, Dann Corbit <DCorbit(at)connx(dot)com>, Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Terry Fielder <terry(at)ashtonwoodshomes(dot)com>, Tino Wildenhain <tino(at)wildenhain(dot)de>, "Marc G(dot) Fournier" <scrappy(at)postgresql(dot)org>, Richard_D_Levine(at)raytheon(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: [HACKERS] 'a' == 'a '
Date: 2005-10-20 02:07:21
Message-ID: 4356FBD9.7060602@travelamericas.com
Lists: pgsql-general pgsql-hackers

Josh Berkus wrote:

>Dann,
>
>>I think that whatever is done ought to be whatever the standard says.
>>If I misinterpret the standard and PostgreSQL is doing it right, then
>>that is fine. It is just that PostgreSQL is very counter-intuitive
>>compared to other database systems that I have used in this one
>>particular area. When I read the standard, it looked to me like
>>PostgreSQL was not performing correctly. It is not unlikely that I read
>>it wrong.
>
>AFAICT, the standard says "implementation-specific". So we're standard.
>
>The main cost of comparing trimmed values is performance; factoring an
>rtrim into every comparison would add significant overhead to the already
>CPU-bound process of, for example, creating indexes. We're looking for
>ways to make the comparison operators lighter-weight, not heavier.
>
If I understand the spec correctly, it seems to indicate that this is
specific to the locale/character set. Assuming the standard doesn't tie
this behavior to any particular character set, it should be possible to
make it available as an initdb option for those who want it. Whether
this is important enough to offer is another matter.

Personally my questions are:

1) How many people have been badly bitten by this?
2) How many people have been bitten by joins that depend on padding?

Personally, unlike case folding, this seems to be an area where a bit of
documentation (i.e. stating that all collations are assumed to have the
NO PAD attribute, in SQL-standard terms) would be sufficient to answer
questions of standards compliance.
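To make the PAD SPACE / NO PAD distinction concrete, here is a minimal Python sketch. It is an illustration only, not PostgreSQL's implementation: the function names are hypothetical, and ordering is by plain code point rather than by any real locale collation. Under PAD SPACE the shorter string is treated as if padded with spaces to the longer one's length before comparing; under NO PAD the strings are compared exactly as stored.

```python
def compare_pad_space(a: str, b: str) -> int:
    """Hypothetical PAD SPACE comparison: returns -1, 0 or 1.

    The shorter operand is right-padded with spaces to the length of the
    longer one, so trailing blanks do not affect equality.
    Ordering is by code point, not by a locale-aware collation.
    """
    width = max(len(a), len(b))
    pa, pb = a.ljust(width), b.ljust(width)
    return (pa > pb) - (pa < pb)


def compare_no_pad(a: str, b: str) -> int:
    """Hypothetical NO PAD comparison: strings are compared exactly as stored."""
    return (a > b) - (a < b)


print(compare_pad_space("a", "a "))  # 0: equal once 'a' is padded to 'a '
print(compare_no_pad("a", "a "))     # -1: 'a' sorts before 'a '
```

The padding step in compare_pad_space is extra work done on every single comparison, which is the kind of per-comparison overhead Josh is concerned about.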

>My general perspective on this is that if trailing blanks are a significant
>hazard for your application, then trim them on data input. That requires
>a *lot* less performance overhead than doing it every time you compare
>something.
>
In general I agree. But I am not willing to jump to the conclusion that
it will never be warranted to add this as an initdb option. I am more
interested in the cases where people actually find this is required. But I
agree that the bar is much higher than it is in many other cases.
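Josh's alternative of trimming on input can be sketched the same way. This is a hypothetical in-memory Python analogy, not PostgreSQL code, and the class name is made up for illustration: trailing spaces are stripped once per stored value and once per lookup key, so every subsequent comparison is an ordinary exact match with no padding work.

```python
class TrimmedColumn:
    """Hypothetical column that strips trailing blanks once, at input time."""

    def __init__(self) -> None:
        self.values = []

    def insert(self, value: str) -> None:
        # rstrip(" ") removes trailing spaces only (not tabs or newlines),
        # and runs once per stored row rather than once per comparison.
        self.values.append(value.rstrip(" "))

    def find(self, key: str) -> list:
        # Normalize the probe once; each comparison is then a plain ==.
        probe = key.rstrip(" ")
        return [i for i, v in enumerate(self.values) if v == probe]


col = TrimmedColumn()
col.insert("a ")
col.insert("b")
print(col.find("a"))    # [0]
print(col.find("a  "))  # [0]
```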

Best Wishes,
Chris Travers
Metatron Technology Consulting
