Re: Why format() adds double quote?

From:	"Daniel Verite" <daniel(at)manitou-mail(dot)org>
To:	"Tatsuo Ishii" <ishii(at)postgresql(dot)org>
Cc:	pavel(dot)stehule(at)gmail(dot)com,listas(at)guedesoft(dot)net,robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Why format() adds double quote?
Date:	2016-01-27 15:37:14
Message-ID:	e3788f38-83e5-4036-9fd7-faa6ea32b774@mm
Lists:	pgsql-hackers

Tatsuo Ishii wrote:

> 2) What does the SQL standard say? Do they say that non ASCII white
> spaces should be treated as ASCII white spaces?

I've used white space in the example, but I'm concerned about
punctuation too.

unicode.org has this helpful paper:
http://www.unicode.org/L2/L2000/00260-sql.pdf
which studies Unicode in SQL-99 identifiers.

The relevant BNF they extracted from the standard looks like this:

<identifier body> ::=
<identifier start>
[ { <underscore> | <identifier part> }... ]

<delimited identifier body> ::= <delimited identifier part>...

========

The current version of quote_ident() plays it safe by implementing
the rule that, as soon as it encounters a character outside
of US-ASCII, it surrounds the identifier with double quotes, no matter
to which category or block this character belongs.
So its output is guaranteed to be compatible with the above grammar.
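
To make the conservative rule concrete, here is a minimal Python sketch of
it. It is a simplified model and not the PostgreSQL source: the function
name is hypothetical, the reserved-keyword check is left out, and the safe
set (lowercase a-z, digits, underscore) is an assumption about the details.

```python
def quote_ident_sketch(ident: str) -> str:
    """Simplified model of the conservative quoting rule (assumption:
    only lowercase ASCII letters, digits and underscore are 'safe';
    the keyword check of the real function is omitted)."""

    def is_safe_start(ch: str) -> bool:
        return "a" <= ch <= "z" or ch == "_"

    def is_safe_part(ch: str) -> bool:
        return is_safe_start(ch) or "0" <= ch <= "9"

    safe = (
        bool(ident)
        and is_safe_start(ident[0])
        and all(is_safe_part(ch) for ch in ident)
    )
    if safe:
        return ident
    # Anything else, including every non-ASCII character, forces quoting;
    # embedded double quotes are doubled.
    return '"' + ident.replace('"', '""') + '"'


print(quote_ident_sketch("foo_1"))
print(quote_ident_sketch("été"))
```

Under this model any multibyte character, letter or not, ends up inside
double quotes, which is why the output always fits the delimited form.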

The change in the patch is that multibyte characters just don't imply
quoting.

But according to points 1 and 2 of the paper, the first character
must have the Unicode alphabetic property, and it must not
have the Unicode combining property.

I'm mostly ignorant of Unicode, so I'm not sure of the precise
implications of having such Unicode properties, but still my
understanding is that the new quote_ident() ignores these rules,
so in this sense it could produce outputs that wouldn't be
compatible with SQL-99.
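
As a rough illustration of the two first-character constraints, the
following Python sketch uses the standard unicodedata module. It is only
an approximation: general category "L*" stands in for the Unicode
alphabetic property, and "combining" is taken to mean a mark category or
a non-zero canonical combining class. Both mappings and the function name
are assumptions, not something the paper or the standard spells out here.

```python
import unicodedata


def first_char_ok(ident: str) -> bool:
    """Approximate check of the two first-character constraints:
    the first character is alphabetic and is not a combining character."""
    if not ident:
        return False
    first = ident[0]
    category = unicodedata.category(first)
    # Assumption: general category L* approximates "alphabetic".
    is_letter = category.startswith("L")
    # Assumption: mark categories (M*) or a non-zero canonical combining
    # class approximate "combining".
    is_combining = category.startswith("M") or unicodedata.combining(first) != 0
    return is_letter and not is_combining


print(first_char_ok("été"))         # precomposed letter first
print(first_char_ok("\u0301abc"))   # combining acute accent first
print(first_char_ok("\u3000abc"))   # ideographic space first
```

An identifier that starts with a combining mark or with Unicode white
space fails this check, and those are the cases an unquoted multibyte
identifier could get wrong.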

Also, here's what we say in the manual about non quoted identifiers:
http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html

"SQL identifiers and key words must begin with a letter (a-z, but also
letters with diacritical marks and non-Latin letters) or an underscore
(_). Subsequent characters in an identifier or key word can be
letters, underscores, digits (0-9), or dollar signs ($)"

So it explicitly allows letters in general (and also seems less
strict than SQL-99 about underscore), but it makes no promise about
Unicode punctuation or spaces, for instance, even though in practice
the parser seems to accept them just fine.
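
For comparison, the sentence quoted from the manual can be modeled as
follows. This sketch covers only the documented wording, not what the
lexer actually accepts; the function name is hypothetical, and Python's
str.isalpha() is assumed to be a fair stand-in for "letters".

```python
def matches_documented_rule(ident: str) -> bool:
    """Model of the manual's wording for unquoted identifiers:
    start with a letter or underscore, then letters, underscores,
    digits 0-9 or dollar signs."""

    def is_start(ch: str) -> bool:
        # Assumption: str.isalpha() approximates "letters", including
        # letters with diacritical marks and non-Latin letters.
        return ch.isalpha() or ch == "_"

    def is_part(ch: str) -> bool:
        return is_start(ch) or ch in "0123456789$"

    return (
        bool(ident)
        and is_start(ident[0])
        and all(is_part(ch) for ch in ident[1:])
    )


print(matches_documented_rule("été_1$"))
print(matches_documented_rule("a\u3000b"))  # contains an ideographic space
```

Identifiers containing Unicode spaces or punctuation fall outside this
documented rule, whatever the parser does with them in practice.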

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

In response to

Re: Why format() adds double quote? at 2016-01-27 07:25:56 from Tatsuo Ishii

Responses

Re: Why format() adds double quote? at 2016-01-28 00:00:29 from Tatsuo Ishii
