Re: sortsupport for text

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sortsupport for text
Date: 2012-06-20 14:55:29
Message-ID: 26017.1340204129@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Geoghegan <peter(at)2ndquadrant(dot)com> writes:
> I think that this change may have made the difference between the
> Hungarians getting away with it and not getting away with it. Might it
> have been that for text, they were using some operator that wasn't '='
> (perhaps one which has no fastpath, and thus correctly made a
> representation about equivalency) rather than texteq prior to this
> commit?

Uh, no. There aren't any magic variants of equality now, and there were
not back then either. I'm inclined to think that the "Hungarian
problem" did exist long before we fixed it.

> So, you're going to have an extra strcoll()/strxfrm() + strcmp() here,
> as part of a "not-equal-but-maybe-equivalent" test, which is bad.
> However, if that means that we can cache a text constant as a
> strxfrm() blob, and compare in a strxfrm()-wise fashion, that will
> more than pay for itself, even for btree traversal alone.

Um ... are you proposing to replace text btree index entries with
strxfrm values? Because if you aren't, I don't see that this is likely
to win anything. And if you are, it loses on the grounds of (a) index
bloat and (b) loss of ability to do index-only scans.

> It would be nice to hear what others thought of these ideas before I
> actually start writing a patch that both fixes these problems (our
> behaviour is incorrect for some locales according to the Unicode
> standard), facilitates a strxfrm() optimisation, and actually adds a
> strxfrm() optimisation.

Personally I think this is not a direction we want to go in. I think
that it's going to end up a significant performance loss in many cases,
break backwards compatibility in numerous ways, and provide a useful
behavioral improvement to only a small minority of users. If you check
the archives, we get far more complaints from people who think their
locale definition is wrong than from people who are unhappy because
we don't hew exactly to the corner cases of their locale.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2012-06-20 15:02:54 Nasty, propagating POLA violation in COPY CSV HEADER
Previous Message Matheus Ricardo Espanhol 2012-06-20 14:53:00 Re: [HACKERS] PANIC while doing failover (streaming replication)