Re: Ltree syntax improvement

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dmitry Belyavsky <beldmit(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Nikolay Shaplov <dhyan(at)nataraj(dot)su>
Subject: Re: Ltree syntax improvement
Date: 2020-01-21 19:13:22
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dmitry Belyavsky <beldmit(at)gmail(dot)com> writes:
> If the C part will be reviewed and considered mergeable, I'll update the
> plpython tests.

I haven't looked at any of the code involved in this, but I did examine
the "failing" plpython test, and I'm quite distressed about that change
of behavior. Simply removing the test case is certainly not okay,
and I do not think that just changing it to accept this new behavior
is okay either. As Nikita said upthread:

>> 7. ltree_plpython test does not fail now because Python list is converted to a
>> text and then to a ltree, and the textual representation of a Python list has
>> become a valid ltree text:
>> SELECT $$['foo', 'bar', 'baz']$$::ltree;
>> ltree
>> -------------------------
>> "['foo', 'bar', 'baz']"
>> (1 row)
>> So Python lists can be now successfully converted to ltrees without a transform,
>> but the result is not that is expected (''::ltree).

If this case doesn't throw an error, then we're going to have a
compatibility problem whenever somebody finally gets around to
implementing the python-to-ltree transform properly, because it
would break any code that might be relying on this (wrong) behavior.

In general, I think it's a mistake to allow unquoted punctuation to be
taken as part of an ltree label, which is what this patch is evidently
doing. By doing that, you'll make it impossible for anyone to ever
again extend the ltree syntax, because if they want to assign special
meaning to braces or whatever, they can't do so without breaking
existing applications. For example, if the existing code allowed
double-quote or backslash as a label character, we'd already have
rejected this patch as being too big a compatibility break. So it's
not very forward-thinking to close off future improvements like this.

Thus, what I think you should do is require non-alphanumeric label
characters to be quoted, either via double-quotes or backslashes
(although it's questionable whether we really need two independent
quoting mechanisms here). That would preserve extensibility, and
it'd also preserve our freedom to fix ltree_plpython later, since
the case of interest would still be an error for now. And it would
mean that you don't have subtly different rules for what's data in
ltree versus what's data in lquery or ltxtquery.

BTW, the general rule in existing backend code that's doing string
parsing is to allow non-ASCII (high-bit-set) characters to be taken as
data without inquiring too closely as to what they are. This avoids a
bunch of locale and encoding issues without much loss of flexibility.
(If we do ever extend the ltree syntax again, we'd certainly choose
ASCII punctuation characters for whatever special symbols we need,
else the feature might not be available in all encodings.) So for
instance in your examples involving "Ñ", it's fine to take that as a
label character without concern for locale/encoding.

I'm not sure what I think about the whitespace business. It looks
like what you propose is to strip unquoted leading and trailing
whitespace but allow embedded whitespace. There's precedent for that,
certainly, but I wonder whether it isn't too confusing. In any case
you didn't document that.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-01-21 19:17:53 Re: Protect syscache from bloating with negative cache entries
Previous Message Pavel Stehule 2020-01-21 18:51:17 Re: [Proposal] Global temporary tables