Re: [v9.2] make_greater_string() does not return a string in some cases

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marcin Mańk <marcin(dot)mank(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [v9.2] make_greater_string() does not return a string in some cases
Date: 2011-09-26 14:30:32
Message-ID: 1317047432.1759.27.camel@fsopti579.F-Secure.com
Lists: pgsql-bugs pgsql-hackers

On Mon, 2011-09-26 at 10:08 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > On Fri, 2011-09-23 at 20:35 +0300, Marcin Mańk wrote:
> >> One idea:
> >> col like 'foo%' could be translated to col >= 'foo' and col <= 'foo' || 'zzz', where 'z' is the largest possible character. This should be good enough for calculating stats.
> >> How to find such a character, I do not know.
>
> > That's what makes this so difficult.
>
> > If we knew the largest character, we could probably also find the
> > largest-1, largest-2, etc. characters and determine the total order of
> > everything.
>
> No, it's a hundred times worse than that, because in collations other
> than C there typically *is* no total order. The collation behavior of
> many characters is context-sensitive, thanks to the multi-pass behavior
> of typical "dictionary" algorithms.

Well, there is a total order of all strings, but it's not consistent
under string concatenation.
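
To illustrate with a toy example (a sketch, not how any real collation
library works): assume a hypothetical two-level collation where the
first level compares letters ignoring case and the second level breaks
ties by case. The function name toy_sort_key and the rules themselves
are made up for this example.

```python
def toy_sort_key(s):
    """Two-level sort key for a made-up collation (illustration only).

    Level 1: the letters with case folded away.
    Level 2: per-character case flags, lowercase sorting before uppercase.
    """
    primary = tuple(ch.lower() for ch in s)
    secondary = tuple(ch.isupper() for ch in s)
    return (primary, secondary)


# Every pair of strings is comparable, so the order is total.
print(toy_sort_key("a") < toy_sort_key("A"))    # "a" sorts before "A"

# Appending characters can reverse the order of the prefixes:
# "a" < "A", yet "ab" > "Aa", because level 1 now differs.
print(toy_sort_key("ab") > toy_sort_key("Aa"))
```

With plain byte-wise comparison this cannot happen: once two strings
differ at some position, nothing appended to either side changes the
result.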

But there is a "largest character". If the collation implementation
uses four weights (the typical case), the largest character is the one
that maps to <FFFF> <FFFF> <FFFF> <FFFF>. If you appended that
character to a string, you would get a larger string. (Unless there are
French backwards levels or other funny things in place, perhaps.) But
we don't know which character that is, and likely there isn't one, so
we'd need the largest character that maps to an actually assigned
weight, and that's not possible without an exhaustive search of all
collating elements.
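
A rough sketch of what that exhaustive search amounts to, using an
invented four-weight table (TOY_WEIGHTS, the weight values, and both
function names are hypothetical; a real collation has far more
collating elements, and the table is usually not directly accessible):

```python
# Hypothetical table: character -> (primary, secondary, tertiary, quaternary).
TOY_WEIGHTS = {
    "a": (0x0010, 0x0020, 0x0002, 0xFFFF),
    "A": (0x0010, 0x0020, 0x0008, 0xFFFF),
    "b": (0x0011, 0x0020, 0x0002, 0xFFFF),
    "z": (0x0029, 0x0020, 0x0002, 0xFFFF),
}


def largest_assigned_character(weights):
    """Scan every collating element and return the one with the largest weights."""
    return max(weights, key=lambda ch: weights[ch])


def sort_key(s, weights):
    """Multi-level key: all primary weights first, then all secondary, and so on."""
    return tuple(tuple(weights[ch][level] for ch in s) for level in range(4))


top = largest_assigned_character(TOY_WEIGHTS)
print(top)
# Appending the largest character yields a larger string in this toy model.
print(sort_key("ab" + top, TOY_WEIGHTS) > sort_key("ab", TOY_WEIGHTS))
```

The scan visits every entry in the table, which is the part that does
not scale to a real collation.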

We could possibly try to make this whole thing work differently by
storing the strxfrm results in the histograms.
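
Very roughly, and only as a sketch of the idea rather than a proposal
for the actual statistics code: the histogram bounds would be the
transformed keys instead of the original strings, and a probe value
would be transformed the same way before being located among them.
The function names below are made up, and the "C" locale is chosen
only so the example runs anywhere; the interesting case is of course a
non-C collation.

```python
import bisect
import locale

# Assumption: "C" is used here purely for portability of the example.
locale.setlocale(locale.LC_COLLATE, "C")


def build_key_histogram(values, n_buckets):
    """Return n_buckets + 1 bounds made of strxfrm keys, not raw strings."""
    keys = sorted(locale.strxfrm(v) for v in values)
    step = (len(keys) - 1) / n_buckets
    return [keys[round(i * step)] for i in range(n_buckets + 1)]


def fraction_below(value, bounds):
    """Estimate the fraction of values sorting below `value` from the bounds."""
    key = locale.strxfrm(value)
    pos = bisect.bisect_left(bounds, key)
    return min(pos, len(bounds) - 1) / (len(bounds) - 1)


words = ["apple", "banana", "cherry", "date", "elder", "fig", "grape"]
bounds = build_key_histogram(words, 3)
print(fraction_below("cherry", bounds))
```

Comparisons against the bounds are then plain key comparisons, with no
further calls into the collation routines per bound.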
