Re: [v9.2] make_greater_string() does not return a string in some cases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [v9.2] make_greater_string() does not return a string in some cases
Date: 2011-10-29 20:36:06
Message-ID: 29562.1319920566@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Oct 29, 2011 at 3:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Ummm ... why do the incrementer functions think they need to restore the
>> previous value on failure? AFAICS that's a waste of code and cycles,
>> since there is only one caller and it doesn't care in the least.

> Well, it might not be strictly necessary for pg_utf8_increment() and
> pg_eucjp_increment(), but it's clearly necessary for the generic
> incrementer function for exactly the same reason it was needed in the
> old coding. I suppose we could weaken the rule to "you must leave a
> valid character behind rather than a bunch of bytes that doesn't
> encode to a character", but the cycle savings are negligible and the
> current rule seems both simpler and more bullet-proof.

No, it's *not* necessary any more, AFAICS. make_greater_string discards
the data and overwrites it with a null instantly upon getting a failure
return from the incrementer. The reason we used to need it was that we
did pg_mbcliplen after failing to increment, but now we do that before
we ever increment anything, so we already know the length of the last
character. It doesn't matter whether those bytes are still valid or
contain garbage.

>> I'm also quite distressed that you ignored my advice to limit the number
>> of combinations tried. This patch could be horribly slow when dealing
>> with wide characters, eg think what will happen when starting from
>> U+10000.

> Uh, I think it will try at most one character in that position and
> then truncate away that character entirely, per my last email on this
> topic (to which you never responded):
> http://archives.postgresql.org/pgsql-hackers/2011-09/msg01195.php

Oh! You are right, I was expecting it to try multiple characters at the
same position before truncating the string. This change seems to have
lobotomized things rather thoroughly. What is the rationale for that?
As an example, when dealing with a single-character string, it will fail
altogether if the next code value sorts out-of-order, so this seems to
me to be a rather large step backwards.

I think we ought to go back to the previous design of incrementing till
failure and then truncating, which puts the onus on the incrementer to
make a reasonable tradeoff of how many combinations to try per character
position. There's a simple tweak we could make to the patch to limit
that: once we've maxed out a lower-order byte of a multibyte char,
*don't reset it to minimum*, just move on to incrementing the next
higher byte. This preserves the old property that the maximum number of
combinations tried is bounded by 256 * string's length in bytes.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Robert Haas 2011-10-30 00:04:09 Re: [v9.2] make_greater_string() does not return a string in some cases
Previous Message Robert Haas 2011-10-29 20:15:16 Re: [v9.2] make_greater_string() does not return a string in some cases

Browse pgsql-hackers by date

  From Date Subject
Next Message Hitoshi Harada 2011-10-29 21:34:49 Re: pgsql_fdw, FDW for PostgreSQL server
Previous Message Mr. Aaron W. Swenson 2011-10-29 20:28:57 Re: Add socket dir to pg_config..?