Re: Syntax rules for a text value inside the literal for a user-defined type—doc section “8.16.2. Constructing Composite Values”

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Bryn Llewellyn <bryn(at)yugabyte(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers list <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Syntax rules for a text value inside the literal for a user-defined type—doc section “8.16.2. Constructing Composite Values”
Date: 2020-04-03 07:05:58
Message-ID: CAKFQuwZmtS6CMU+dPYOkOWUMS5jdKyGztm+qP4Jg58299xOj9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 2, 2020 at 8:46 PM Bryn Llewellyn <bryn(at)yugabyte(dot)com> wrote:

> On 02-Apr-2020, at 19:25, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Bryn Llewellyn <bryn(at)yugabyte(dot)com> writes:
>
> The documentation in section “8.16.2. Constructing Composite Values” here:
> https://www.postgresql.org/docs/11/rowtypes.html#ROWTYPES-IO-SYNTAX <
> https://www.postgresql.org/docs/11/rowtypes.html#ROWTYPES-IO-SYNTAX>
>
>
> The authoritative documentation for that is at 8.16.6 "Composite Type
> Input and Output Syntax", and it says quite clearly that whitespace is
> not ignored (except for before and after the outer parentheses).
>
[...]

>
> “Whitespace outside the parentheses is ignored, but within the parentheses
> it is considered part of the field value, and might or might not be
> significant depending on the input conversion rules for the field data
> type. For example, in:
>
> '( 42)'
>
> the whitespace will be ignored if the field type is integer, but not if it
> is text. As shown previously, when writing a composite value you can write
> double quotes around any individual field value.”
>
> Notice the wording “double quotes around any individual field value.” The
> word “around” was the source of my confusion. For the docs to communicate
> what, it seems, they ought to, then the word should be “within”. This
> demonstrates my point:
>

Actually, they do mean around (as in, the double-quotes must be adjacent to
the commas that delimit the fields, or the parens).

The next two sentences clear things up a bit:

"You must do so if the field value would otherwise confuse the
composite-value parser. In particular, fields containing parentheses,
commas, double quotes, or backslashes must be double-quoted."

That said the documentation doesn't match the behavior (which is
considerably more forgiving and also willing to simply discard
double-quotes instead of error-ing out when the documented rules are not
adhered to)

Specifically: '(a \"b c\" d, e \"f,g\" h)'::rt leaves the double-quote
while '(a ""b c"" d, e ""f,g"" h)'::rt does not. Neither have the
field surround with double-quotes so should be invalid per the
documentation. When you follow the documentation they then both retain the
double-quote.

So if you follow the guidelines set forth in the documentation you get the
result the documentation promises. If you fail to follow the guidelines
you may still get a result but there is no promise made as to its
contents. Not ideal but also not likely to be changed after all this time.

>
>
>
>
> *create type rt as (a text, b text);with v as (select '(a "b c" d, e
> "f,g" h)'::rt as r)select '>'||(r).a||'<' as a, '>'||(r).b||'<' as bfrom
> v;*
>
> This demonstrates that, in my input, the double quotes are *within* each
> of the two text values—and definitely *do not surround* them.
>

Yep, which is why you have an issue. The "surround them" is indeed what
the text meant to say.

> I really would appreciate a reply to the second part of my earlier
> question:
>
> “please explain the rationale for what seems to me to be a
> counter-intuitive syntax design choice.”
>
[...]

> They chose the rule that a string value *must* be surrounded by double
> quotes and that other values must *not* be so surrounded. The JSON model
> resonates with my intuition.
>

This point wasn't answered because there is no good answer to be given.
The above is how someone in the mid-90s or so decided PostgreSQL should
handle this. I'll personally agree that more verbose but explicit, and
less forgiving, syntax and parsing, would have been a better choice. But
the choice has been made and isn't subject to change.

But regardless of what drove the original design choice if you really care
about it in a "want to learn" mode then it is very easy to observe the
defined behavior and critique it independently of how it came to be. If
all you want to do is make a left-handed jab at PostgreSQL for having a
non-intuitive to you design choice do act surprised when people don't
choose to respond - especially when none of us made the original choice.

The only thing we can do today is describe the system more clearly if we
have failed to do so adequately. You are probably in the best position,
then, to learn what it does and propose new wording that someone with
inexperienced (or biased toward a different system) eyes would understand
quickly and clearly.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-04-03 07:26:28 Re: Planning counters in pg_stat_statements (using pgss_store)
Previous Message Michael Paquier 2020-04-03 06:54:38 Re: ALTER tbl rewrite loses CLUSTER ON index