Re: Syntax rules for a text value inside the literal for a user-defined type—doc section “8.16.2. Constructing Composite Values”

From: Bryn Llewellyn <bryn(at)yugabyte(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers list <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Syntax rules for a text value inside the literal for a user-defined type—doc section “8.16.2. Constructing Composite Values”
Date: 2020-04-03 03:46:37
Message-ID: 0633582A-1EA3-49C6-9E09-0198CA9BA757@yugabyte.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02-Apr-2020, at 19:25, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

Bryn Llewellyn <bryn(at)yugabyte(dot)com> writes:
> The documentation in section “8.16.2. Constructing Composite Values” here:
> https://www.postgresql.org/docs/11/rowtypes.html#ROWTYPES-IO-SYNTAX <https://www.postgresql.org/docs/11/rowtypes.html#ROWTYPES-IO-SYNTAX>

The authoritative documentation for that is at 8.16.6 "Composite Type
Input and Output Syntax", and it says quite clearly that whitespace is
not ignored (except for before and after the outer parentheses).

regards, tom lane

Thanks for the super-fast reply, Tom. Yes, I had read that part. I should have said this:

“The documentation in section “8.16.2. Constructing Composite Values” et seq shows examples…”

This is the text to which you refer me:

“Whitespace outside the parentheses is ignored, but within the parentheses it is considered part of the field value, and might or might not be significant depending on the input conversion rules for the field data type. For example, in:

'( 42)'

the whitespace will be ignored if the field type is integer, but not if it is text. As shown previously, when writing a composite value you can write double quotes around any individual field value.”

Notice the wording “double quotes around any individual field value.” The word “around” was the source of my confusion. For the docs to communicate what, it seems, they ought to, then the word should be “within”. This demonstrates my point:

create type rt as (a text, b text);
with v as (select '(a "b c" d, e "f,g" h)'::rt as r)
select
'>'||(r).a||'<' as a,
'>'||(r).b||'<' as b
from v;

It shows this:

a | b
-----------+------------
>a b c d< | > e f,g h<

So these are the resulting parsed-out text values:

a b c d

and

e f,g h

This demonstrates that, in my input, the double quotes are *within* each of the two text values—and definitely *do not surround* them.

I really would appreciate a reply to the second part of my earlier question:

“please explain the rationale for what seems to me to be a counter-intuitive syntax design choice.”

I say “counter-intuitive” because JSON had to solve the same high-level goal—to distinguish between a string value on the one hand and, for example, a number or boolean value on the other hand. They chose the rule that a string value *must* be surrounded by double quotes and that other values must *not* be so surrounded. The JSON model resonates with my intuition. It also has mechanisms to escape interior double quotes and other special characters. I am very interested to know why the PostgreSQL designers preferred their model.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-04-03 03:47:54 Re: WAL usage calculation patch
Previous Message Tom Lane 2020-04-03 03:40:51 Re: [PATCH] Keeps tracking the uniqueness with UniqueKey