Re: psql completion for ids in multibyte string

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: psql completion for ids in multibyte string
Date: 2016-03-03 03:45:41
Message-ID: CAEepm=04nz=TEg16-pB79JLEvTKE8AqbaV37ZM7LXqANUZw43A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 3, 2016 at 2:07 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello, thank you for the comments.
>
> At Wed, 2 Mar 2016 10:10:55 +1300, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote in <CAEepm=2JjPY-v1JWWrJyBone-=t1A7TJRyKSfaseQLzh5sMQGg(at)mail(dot)gmail(dot)com>
>> On Wed, Mar 2, 2016 at 7:54 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> > On Mon, Feb 29, 2016 at 6:32 PM, Thomas Munro
>> > <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> >> On Fri, Feb 26, 2016 at 6:34 PM, Kyotaro HORIGUCHI
>> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> >>> Hello, this is the second patch plitted out. This allows
>> >>> multibyte names to be completed in psql.
>> >>>
>> >>> At Fri, 06 Nov 2015 11:47:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20151106(dot)114717(dot)170526268(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>> >>>> At Thu, 5 Nov 2015 18:32:59 +0900, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote in <563B224B(dot)3020400(at)lab(dot)ntt(dot)co(dot)jp>
>> >>>> > On 2015/11/05 18:10, Kyotaro HORIGUCHI wrote:
>> >>>> > > Hello. I don't know whether this is a bug fix or improvement,
>> >>>> >
>> >>>> > Would it be 50-50? :-)
>> >>>>
>> >>>> Yeah, honestly saying, I doubt that this is valuable but feel
>> >>>> uneasy to see some of the candidates vanish as completon proceeds
>> >>>> for no persuasive reason.
>> >>
>> >> +1 from me, it's entirely reasonable to want to name database objects
>> >> in any human language and use auto-completion. It's not working today
>> >> as you showed.
>> >>
>> >>> The current version of tab-completion failed to complete names
>> >>> with multibyte characters.
>> >>>
>> >>> =# create table いろは (あ int);
>> >>> =# create table いこい (あ int);
>> >>> =# drop table <tab>
>> >>> "いろは" hint_plan. pg_toast.
>> >>> "いこい" information_schema. pg_toast_temp_1.
>> >>> ALL IN TABLESPACE pg_catalog. public.
>> >>> dbms_stats. pg_temp_1.
>> >>> postgres=# alter table "い
>> >>> =# drop table "い<tab>
>> >>> =# drop table "い /* No candidate offered */
>> >>>
>> >>> This is because _complet_from_query counts the string length in
>> >>> bytes, instead of characters. With this patch the candidates will
>> >>> appear.
>> >>>
>> >>> =# drop table "い<tab>
>> >>> "いこい" "いろは"
>> >>> postgres=# drpo table "い
>> >>
>> >> The patch looks correct to me: it counts characters rather than bytes,
>> >> which is the right thing to do because the value is passed to substr()
>> >> which also works in characters rather than bytes. I tested with
>> >> "éclair", and without the patch, tab completion doesn't work if you
>> >> press tab after 'é'. With the patch it does.
>> >
>> > OK, but I am a little concerned about this code:
>> >
>> > /* Find something that matches */
>> > if (result && PQresultStatus(result) == PGRES_TUPLES_OK)
>> > {
>> > const char *item;
>> >
>> > while (list_index < PQntuples(result) &&
>> > (item = PQgetvalue(result, list_index++, 0)))
>> > if (pg_strncasecmp(text, item, string_length) == 0)
>> > return pg_strdup(item);
>> > }
>> >
>> > Every other use of string_length in this function is using it as the
>> > argument to the SQL substring function, which cares about characters,
>> > not bytes. But this use seems to care about bytes, not characters.
>> >
>> > Am I wrong?
>>
>> Ugh, and the other problem is that string_length is always 0 there if
>> state isn't 0 (in master it is static so that the value is reused for
>> subsequent calls, but this patch made it automatic).
>
> Thanks for pointing it out.
>
>> I think we should leave string_length as it is and use a new variable
>> for character-based length, as in the attached.
>
> Basically agreed but I like byte_length for the previous
> string_length and string_length for string_length_cars. Also
> text_length is renamed in the attached patch.
>
> What do you think about this?

It seems fine this way too. Maybe you should also rename
string_length to byte_length in create_or_drop_command_generator, just
for consistency.

One peculiarity I noticed when testing this on MacOSX 10.10.2 with
Apple-supplied libedit is that the following sequence does something
strange:

CREATE TABLE いこい ();
CREATE TABLE いこいこ ();
CREATE TABLE いろは ();
SELECT * FROM "い<tab>
-> the line magically changes to:
SELECT * FROM <cursor>

If you press tab at any other point in any of those Japanese strings,
it works correctly. I couldn't make anything similar happen with
Latin multibyte characters, and it doesn't happen on Debian with
libreadline, or on MacOSX with libreadline (installed from MacPorts).
I suspect this is a problem with Apple libedit and not with psql or
your patch, but I didn't have time to investigate further.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2016-03-03 03:53:44 Re: WIP: Upper planner pathification
Previous Message Oleg Bartunov 2016-03-03 03:32:26 Re: The plan for FDW-based sharding