Re: [v9.2] make_greater_string() does not return a string in some cases

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [v9.2] make_greater_string() does not return a string in some cases
Date: 2011-09-22 12:49:57
Message-ID: CA+Tgmoax-SHNgHe77cJZGsqgsB+Z=n_jzQZ5h0RG1+NcWGHkBg@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Sep 22, 2011 at 12:24 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I'm a bit perplexed as to why we can't find a non-stochastic way of doing this.
>
> [ collations suck ]

Ugh.

> Now, having said that, I'm starting to wonder again why it's worth our
> trouble to fool with encoding-specific incrementers.  The exactness of
> the estimates seems unlikely to be improved very much by doing this.

Well, so the problem is that the frequency with which the algorithm
fails altogether seems to be disturbingly high for certain kinds of
characters. I agree it might not be that important to get the
absolutely best next string, but it does seem important not to fail
outright. Kyotaro Horiguchi gives the example of UTF-8 characters
ending with 0xbf.
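
To illustrate the 0xbf case, here is a minimal Python sketch of a
naive "increment the last byte" strategy. It is not the actual
make_greater_string() code; the helper name bump_last_byte is
hypothetical, and it only models the single step of bumping the final
byte and checking whether the result is still valid UTF-8.

```python
def bump_last_byte(s):
    """Try to build a greater string by incrementing only the final byte.

    Hypothetical helper for illustration. Returns the new string, or None
    when no larger value of the final byte yields valid UTF-8.
    """
    raw = bytearray(s.encode("utf-8"))
    if not raw:
        return None  # nothing to increment in an empty string
    while raw[-1] < 0xFF:
        raw[-1] += 1
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not a valid sequence; keep trying larger byte values.
            continue
    return None


# U+00E9 encodes as 0xC3 0xA9, so bumping the last byte gives 0xC3 0xAA.
print(bump_last_byte("\u00e9"))
# U+00FF encodes as 0xC3 0xBF; 0xBF is the largest continuation byte,
# so every larger final byte is rejected and the helper gives up.
print(bump_last_byte("\u00ff"))
```

Since continuation bytes must lie in 0x80..0xbf, any character whose
encoding ends in 0xbf has no valid successor reachable by changing the
last byte alone, which is why this kind of incrementer has to fall
back to something else for such characters.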

>>>> One random idea I have is - instead of generating > and < clauses,
>>>> could we define a "prefix match" operator - i.e. a ### b iff substr(a,
>>>> 1, length(b)) = b?  We'd need to do something about the selectivity,
>>>> but I don't see why that would be a problem.
>
>>> The problem is that you'd need to make that a btree-indexable operator.
>
>> Well, right.  Without that, there's not much point.  But do you think
>> that's prohibitively difficult?
>
> The problem is that you'd just be shifting all these same issues into
> the btree index machinery, which is not any better equipped to cope with
> them, and would not be a good place to be adding overhead.

My thought was that it would avoid the need to do any character
incrementing at all. You could just start scanning forward as if the
operator were >= and then stop when you hit the first string that
doesn't have the same initial substring.
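
A rough Python sketch of that scan, using a sorted list as a stand-in
for the btree. The function name prefix_scan is hypothetical, and the
sketch assumes plain code-point ordering (roughly what the C locale
gives you), where all strings sharing a prefix sit next to each other.
It says nothing about whether that holds under other collations.

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix):
    """Return every key starting with prefix, without computing an upper bound.

    Hypothetical helper: sorted_keys stands in for an index ordered by
    code point. Position the scan as if the operator were >= prefix, then
    walk forward until the first key that lacks the prefix.
    """
    i = bisect_left(sorted_keys, prefix)
    matches = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        matches.append(sorted_keys[i])
        i += 1
    return matches


keys = sorted(["a", "ab", "abc", "ab\u00ff", "ac", "b"])
print(prefix_scan(keys, "ab"))
```

Note that the key ending in U+00FF (bytes 0xC3 0xBF) is handled like
any other, since nothing here ever increments a character.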

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
