Re: sortsupport for text

From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <peter(at)2ndquadrant(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: sortsupport for text
Date: 2012-06-20 14:10:44
Message-ID: CAM-w4HPF29Q=z8uE18Avtkckr5xnoLeAPjE1Tfn7p+Q0_EgbdQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The trick for hashing such datatypes is to be able to guarantee that
> "equal" values hash to the same hash code, which is typically possible
> as long as you know the equality rules well enough.  We could possibly
> do that for text with pure-strcoll equality if we knew all the details
> of what strcoll would consider "equal", but we do not.

It occurs to me that strxfrm would answer this question. If we made
the hash function hash the result of strxfrm then we could make
equality use strcoll and not fall back to strcmp.
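
A minimal sketch of that idea in C, not PostgreSQL code. The helper names
(fnv1a, collation_hash, collation_equal) and the choice of FNV-1a are
assumptions made for illustration; any hash function would do. It relies on
the C library's guarantee that strcmp on two strxfrm results orders the same
way as strcoll on the originals, so strings that strcoll calls equal should
transform to identical bytes and therefore hash alike.

```c
#include <locale.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: 32-bit FNV-1a over a byte buffer. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Hash the strxfrm() image of s instead of its raw bytes.
 * Returns 0 on success, -1 if memory could not be allocated.
 */
static int collation_hash(const char *s, uint32_t *out)
{
    /* With n == 0 the destination may be NULL; the result is the
     * length of the transformed string, excluding the terminator. */
    size_t need = strxfrm(NULL, s, 0);
    char *buf = malloc(need + 1);

    if (buf == NULL)
        return -1;
    strxfrm(buf, s, need + 1);
    *out = fnv1a((const unsigned char *) buf, need);
    free(buf);
    return 0;
}

/* Equality by collation only, with no strcmp tie-break. */
static int collation_equal(const char *a, const char *b)
{
    return strcoll(a, b) == 0;
}
```

A caller would first select a collation with setlocale(LC_COLLATE, ...);
without that the program runs in the "C" locale, where both functions behave
like plain byte comparison. The extra allocation and transform on every hash
call is the CPU cost in question.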

I suspect that in a green field that's what we would do, though the CPU
cost might be enough to make us think hard about it. I'm not sure it's
worth considering switching now, though.

Incidentally, the cases where it matters to users are when you have a
multi-column sort order and have values that are supposed to sort
equal in the first column but print differently. Given that there
seems to be some controversy in the locale definitions -- most locales
seem to use "insignificant" factors like accents or ligatures as
tie-breakers and avoid claiming different sequences are equal even
when the language usually treats them as equivalent -- it doesn't seem
super important to maintain the property for the few locales that fall
the other way. Unless my impression is wrong and there's a good
principled reason why some locales treat nearly equivalent strings one
way and some treat them the other.

--
greg
