Re: [v9.2] make_greater_string() does not return a string in some cases

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [v9.2] make_greater_string() does not return a string in some cases
Date: 2011-09-22 03:04:02
Message-ID: CA+TgmoY9+ZUHNju7ERi7QM36FjiXzfT38S63564L_14qf3hgYg@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Wed, Sep 21, 2011 at 9:49 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The main risk that I foresee with the proposed approach is that if you
> have, say, a 4-byte final character, you could iterate through a *whole
> lot* (millions) of larger encoded characters, with absolutely no hope of
> making a greater string that way when the determining difference occurs
> at some earlier character.  And then when you do run out, you could
> waste just as much time at the immediately preceding character, etc etc.
> The existing algorithm is a compromise between thoroughness of
> investigation of different values of the last character versus speed of
> falling back to change the preceding character instead.  I'd be the
> first to say that incrementing only the last byte is a very
> quick-and-dirty heuristic for making that happen.  But I think it would
> be unwise to allow the thing to consider more than a few hundred values
> for a character position before giving up.  Possibly the
> encoding-specific incrementer could be designed to run through all legal
> values of the last byte, then start skipping larger and larger ranges
> --- maybe just move directly to incrementing the first byte.

I'm a bit perplexed as to why we can't find a non-stochastic way of doing this.
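
For readers following along, here is a minimal sketch of the trade-off described in the quoted text: try a bounded number of larger values for the last byte, and if none yields a validly encoded string, drop the last character and retry on the preceding one. This is an illustration, not the PostgreSQL source. It assumes UTF-8 and plain bytewise comparison, and the names make_greater_string_sketch, last_char_start and max_tries are hypothetical.

```python
from typing import Optional


def is_valid_utf8(buf: bytes) -> bool:
    """True if buf decodes as strict UTF-8."""
    try:
        bytes(buf).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def last_char_start(buf: bytes) -> int:
    """Index of the lead byte of the last UTF-8 character in buf."""
    i = len(buf) - 1
    # Continuation bytes have the bit pattern 10xxxxxx.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


def make_greater_string_sketch(prefix: bytes,
                               max_tries: int = 255) -> Optional[bytes]:
    """Find some valid string that sorts (bytewise) after every string
    starting with prefix, or return None if the heuristic gives up."""
    work = bytearray(prefix)
    while work:
        start = last_char_start(work)
        original_last = work[-1]
        # Bounded search: only the last byte is varied, at most max_tries times.
        upper = min(original_last + max_tries, 0xFF)
        for candidate in range(original_last + 1, upper + 1):
            work[-1] = candidate
            if is_valid_utf8(work):
                return bytes(work)
        # No luck at this position: drop the whole last character and
        # fall back to changing the preceding one.
        del work[start:]
    return None


print(make_greater_string_sketch(b"abc"))       # b'abd'
print(make_greater_string_sketch(b"ab\x7f"))    # b'ac'
print(make_greater_string_sketch(b"\x7f"))      # None
```

The last call shows how a last-byte-only heuristic can come back empty-handed even though greater strings exist, because every single-byte value above 0x7F is invalid on its own in UTF-8. Raising max_tries does not help here; only an encoding-aware incrementer would.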

> Aside from that issue, the submitted patch seems to need quite a lot of
> detail work; for instance, I noted an unconstrained memcpy into a 4-byte
> local buffer, as well as lots and lots of violations of PG house style.
> That's certainly all fixable but somebody will have to go through it.

I noticed that, too, but figured we should agree on the basic approach
first. In the first instance, "someone" will hopefully be the patch
author. (Help from anyone else is also welcome, of course... we have
almost 40 patches left to crawl through and, at the moment, very few
reviewers.)

>> One random idea I have is - instead of generating > and < clauses,
>> could we define a "prefix match" operator - i.e. a ### b iff substr(a,
>> 1, length(b)) = b?  We'd need to do something about the selectivity,
>> but I don't see why that would be a problem.
>
> The problem is that you'd need to make that a btree-indexable operator.

Well, right. Without that, there's not much point. But do you think
that's prohibitively difficult?
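
To make the question concrete, here is a small sketch of why a prefix-match operator is a natural fit for an ordered index, at least under bytewise ordering: all keys matching a prefix form one contiguous run in sorted order, so a range scan can find them. The sketch uses a sorted Python list as a stand-in for a btree. The names prefix_match, next_prefix and prefix_range_scan are hypothetical, and the contiguity argument is only shown for bytewise comparison, not for arbitrary collations.

```python
import bisect
from typing import List, Optional


def prefix_match(a: bytes, b: bytes) -> bool:
    """The proposed 'a ### b': substr(a, 1, length(b)) = b."""
    return a[:len(b)] == b


def next_prefix(prefix: bytes) -> Optional[bytes]:
    """Smallest byte string greater than every string starting with prefix
    (bytewise), or None if no such bound exists (empty or all 0xFF)."""
    work = bytearray(prefix)
    while work:
        if work[-1] < 0xFF:
            work[-1] += 1
            return bytes(work)
        work.pop()  # 0xFF cannot be incremented; shorten and retry
    return None


def prefix_range_scan(sorted_keys: List[bytes], prefix: bytes) -> List[bytes]:
    """Return all keys matching prefix using two binary searches,
    the way an ordered index would serve 'key >= lo AND key < hi'."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    upper = next_prefix(prefix)
    hi = len(sorted_keys) if upper is None else bisect.bisect_left(sorted_keys, upper)
    return sorted_keys[lo:hi]


keys = sorted([b"ab", b"abc", b"abcd", b"abc\xff", b"abd", b"b"])
by_scan = prefix_range_scan(keys, b"abc")
by_filter = [k for k in keys if prefix_match(k, b"abc")]
print(by_scan == by_filter)  # True
```

Here the upper bound is computed on raw bytes and is never compared against anything but other byte strings, so it does not need to be a validly encoded string. That is the part that differs from generating explicit > and < clauses on text values.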

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
