Re: Why format() adds double quote?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why format() adds double quote?
Date: 2016-01-20 09:20:03
Message-ID: CAFj8pRD7-Ko2E0cuj-PMYJN7Pmcqud1X-0RCBJh1h0oY30F2Fg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2016-01-20 10:17 GMT+01:00 Tatsuo Ishii <ishii(at)postgresql(dot)org>:

> > Hi
> >
> > 2016-01-20 7:20 GMT+01:00 Tatsuo Ishii <ishii(at)postgresql(dot)org>:
> >
> >> > 2016-01-20 3:47 GMT+01:00 Tatsuo Ishii <ishii(at)postgresql(dot)org>:
> >> >
> >> >> test=# select format('%I', t) from t1;
> >> >> format
> >> >> ----------
> >> >> aaa
> >> >> "AAA"
> >> >> "あいう"
> >> >> (3 rows)
> >> >>
> >> >> Why is the text value of the third line needed to be double quoted?
> >> >> (note that it is a multi byte character). Same thing can be said to
> >> >> quote_ident().
> >> >>
> >> >> We treat identifiers made of the multi byte characters without double
> >> >> quotation (non delimited identifier) in other places.
> >> >>
> >> >> test=# create table t2(あいう text);
> >> >> CREATE TABLE
> >> >> test=# insert into t2 values('aaa');
> >> >> INSERT 0 1
> >> >> test=# select あいう from t2;
> >> >> あいう
> >> >> --------
> >> >> aaa
> >> >> (1 row)
> >> >
> >> > format uses same routine as quote_ident. So quote_ident should be
> fixed
> >> > first.
> >>
> >> Yes, I had that in my mind too.
> >>
> >> Attached is the proposed patch to fix the bug.
> >> Regression tests passed.
> >>
> >> Here is an example after the patch. Note that the third row is not
> >> quoted any more.
> >>
> >> test=# select format('%I', あいう) from t2;
> >> format
> >> --------
> >> aaa
> >> "AAA"
> >> あああ
> >> (3 rows)
> >>
> >> Best regards,
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese:http://www.sraoss.co.jp
> >>
> >> diff --git a/src/backend/utils/adt/ruleutils.c
> >> b/src/backend/utils/adt/ruleutils.c
> >> index 3783e97..b93fc27 100644
> >> --- a/src/backend/utils/adt/ruleutils.c
> >> +++ b/src/backend/utils/adt/ruleutils.c
> >> @@ -9405,7 +9405,7 @@ quote_identifier(const char *ident)
> >> * would like to use <ctype.h> macros here, but they might yield
> >> unwanted
> >> * locale-specific results...
> >> */
> >> - safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] ==
> '_');
> >> + safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_'
> ||
> >> IS_HIGHBIT_SET(ident[0]));
> >>
> >> for (ptr = ident; *ptr; ptr++)
> >> {
> >> @@ -9413,7 +9413,8 @@ quote_identifier(const char *ident)
> >>
> >> if ((ch >= 'a' && ch <= 'z') ||
> >> (ch >= '0' && ch <= '9') ||
> >> - (ch == '_'))
> >> + (ch == '_') ||
> >> + (IS_HIGHBIT_SET(ch)))
> >> {
> >> /* okay */
> >> }
> >>
> >>
> > This patch ls simply - I remember I was surprised, so we allow any
> > multibyte char few months ago.
> >
> > +1
>
> If we would go this way, question is if we should back patch this or
> not since the patch apparently changes the existing
> behaviors. Comments? I would think we should not.
>

I am sure, so we should not backport this change. This can breaks customer
regress tests - and the current behave isn't 100% correct, but it is safe.

Pavel

>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2016-01-20 09:28:38 Batch update of indexes
Previous Message Tatsuo Ishii 2016-01-20 09:17:15 Re: Why format() adds double quote?