Re: [v9.2] make_greater_string() does not return a string in some cases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [v9.2] make_greater_string() does not return a string in some cases
Date: 2011-10-30 14:58:53
Message-ID: 21986.1319986733@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Oct 29, 2011 at 4:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Oh! You are right, I was expecting it to try multiple characters at the
>> same position before truncating the string. This change seems to have
>> lobotomized things rather thoroughly. What is the rationale for that?
>> As an example, when dealing with a single-character string, it will fail
>> altogether if the next code value sorts out-of-order, so this seems to
>> me to be a rather large step backwards.

> On this point I believe you are still confused. The old code tried
> one character per position, and the new code tries one character per
> position. Nothing has been lobotomized in any way.

No, on this point you are flat out wrong. Try something like

select ... where f1 like 'p%';

in tt_RU locale, wherein 'q' sorts between 'k' and 'l'. The old code
correctly found that 'r' works as a string greater than 'p'. The new
code fails to find a greater string, because it only tries 'q' and then
gives up. This results in a selectivity estimate much poorer than
necessary.

Since the stated purpose of this patch is to fix some corner cases
where the code fails to find a greater string, I fail to see why it's
acceptable to introduce some other corner cases that weren't there
before.

> The difference is
> that the old code used a "guess and check" approach to generate the
> character, so there was an inner loop that was trying to generate a
> character (possibly generating various garbage strings that did not
> represent a character along the way) and then, upon success, checked
> the sort order of that single string before truncating and retrying.

You are misreading the old code. The inner loop increments the last
byte, checks for success, and if it hasn't produced a greater string
then it loops around to increment again.

> The fact that we haven't gotten any complaints before suggests that this
> actually works decently well as it stands.

Well, that's true of the old algorithm ;-)

I had likewise thought of the idea of trying some fixed number of
character values at each position, but it's unclear to me why that's
better than allowing an encoding-specific behavior. I don't believe
that we could get away with trying less than a few dozen values, though.
For example, in a situation where case sensitivity is relevant, you
might need to increment past all the upper-case letters to get to a
suitable lower-case letter. I also think that it's probably useful to
try incrementing higher-order bytes of a multibyte character before
giving up --- we just can't afford to do an exhaustive search.
Thus my proposal to let the low-order bytes max out but not cycle.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Robert Haas 2011-10-30 18:49:40 Re: [v9.2] make_greater_string() does not return a string in some cases
Previous Message Alexander Lakhin 2011-10-30 03:46:55 Re: BUG #6277: Money datatype conversion wrong with Russian locale

Browse pgsql-hackers by date

  From Date Subject
Next Message Eric B. Ridge 2011-10-30 17:51:26 Re: Thoughts on "SELECT * EXCLUDING (...) FROM ..."?
Previous Message Martijn van Oosterhout 2011-10-30 14:27:08 Re: Add socket dir to pg_config..?