Re: is this a bug or I am blind?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Mage <mage(at)mage(dot)hu>, pgsql-general(at)postgreSQL(dot)org
Subject: Re: is this a bug or I am blind?
Date: 2005-12-16 18:28:59
Message-ID: 27664.1134757739@sss.pgh.pa.us
Lists: pgsql-general

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> FWIW, here's some links to Microsoft and MySQL dealing with the same
> issue, so we're not alone here. Hungarian seems to be a complex
> language to sort, but it seems that glibc is right in this case.

The MySQL bug link has a fairly detailed description, but it dodges the
question that we need to answer here: do we want to make a finer-grained
distinction than glibc does? In the test data that I got from Mage,
the first clue I got was from looking at the results of an ORDER BY
versus an index scan:

potyos
potyty
potty
potyty
potyty
potty
potty6

potyos
potty
potyty
potyty
potty
potyty
potty6

Actually, the relative order of the "potyty"s and "potty"s is completely
random at the moment. You've got to admit that this looks weird: you'd
expect a database's ORDER BY output to impose at least a cosmetic
ordering on these strings. Per what we've heard, it wouldn't matter
much to a Hungarian speaker whether the "potyty"s come before or after
the "potty"s, but it seems like it should be consistently one or the
other.
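
To make the consistency point concrete, here is a minimal Python sketch of
a comparison that falls back to a plain code-point comparison whenever the
locale comparison reports a tie. The fold() helper is a hypothetical
stand-in for the locale's behaviour (it simply treats "tyty" as "tty"); it
is not glibc's actual rule and does not reproduce the Hungarian ordering of
the other rows. Only the tie-break step is the point.

```python
from functools import cmp_to_key

def fold(s):
    # Hypothetical stand-in for the locale's collation rule:
    # treat "tyty" as equivalent to "tty". Not glibc's real algorithm.
    return s.replace("tyty", "tty")

def locale_cmp(a, b):
    # Plays the role of strcoll(): returns 0 for strings that
    # differ only in the folded spelling.
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)

def tiebreak_cmp(a, b):
    # Use the locale comparison first; when it reports a tie,
    # fall back to a raw code-point comparison so that the
    # resulting order never depends on the input order.
    r = locale_cmp(a, b)
    if r != 0:
        return r
    return (a > b) - (a < b)

rows = ["potyos", "potyty", "potty", "potyty", "potyty", "potty", "potty6"]

# With locale_cmp alone, tied strings keep whatever order they arrived in.
unstable = sorted(rows, key=cmp_to_key(locale_cmp))

# With the tie-break, every input permutation yields the same output.
ordered = sorted(rows, key=cmp_to_key(tiebreak_cmp))
ordered_from_reversed = sorted(reversed(rows), key=cmp_to_key(tiebreak_cmp))
```

With the tie-break in place, all the "potty" rows come out before all the
"potyty" rows regardless of how the input was arranged, which is the
cosmetic consistency argued for above.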

This argument doesn't really answer the question about whether
WHERE username = 'potyty' should match a stored 'potty', however.
My inclination is to say "no it shouldn't directly match --- apply a
normalization function to your data if you think that tyty should be
canonically spelled tty". If we had per-column locales there would
be a stronger argument for allowing them to be equal, but right now
this folding would occur for all text in a database ... and surely
this would be considered a bug for any text that happened not to be
Hungarian words. But perhaps my view is overly influenced by
performance considerations.
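
The "apply a normalization function" suggestion can be sketched as follows.
This is an illustrative Python sketch only: normalize_username() and its
single rewrite rule are hypothetical, chosen to match the tyty/tty example,
and the set stands in for a table column.

```python
def normalize_username(name):
    # Hypothetical canonicalisation: spell "tyty" as "tty".
    # A real rule set would have to be decided by the application.
    return name.replace("tyty", "tty")

# Names as the application received them.
raw_names = ["potty", "potyos"]

# Store the canonical spelling alongside (or instead of) the raw one.
canonical_names = {normalize_username(n) for n in raw_names}

def exact_match(name):
    # Plain equality: 'potyty' does not match a stored 'potty'.
    return name in raw_names

def normalized_match(name):
    # Equality after normalization: 'potyty' matches a stored 'potty'.
    return normalize_username(name) in canonical_names
```

Here exact_match("potyty") is false while normalized_match("potyty") is
true, so the folding applies only where the application asks for it rather
than to all text in the database.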

regards, tom lane
