Re: Ltree syntax improvement

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Dmitry Belyavsky <beldmit(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Nikolay Shaplov <dhyan(at)nataraj(dot)su>
Subject: Re: Ltree syntax improvement
Date: 2020-03-07 00:59:15
Message-ID: 38c5bbeb-c3eb-a016-0b8f-907461bbd0fc@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached new version of the patch.

I did major refactoring of ltree label parsing, extracting common parsing
code for ltree, lquery, and ltxtquery. This greatly simplified state
machines.

On the advice of Tomas Vondra, I also extracted a preliminary patch with
'if' to 'switch' conversion.

On 21.01.2020 22:13, Tom Lane wrote:

> Dmitry Belyavsky <beldmit(at)gmail(dot)com> writes:
>> If the C part will be reviewed and considered mergeable, I'll update the
>> plpython tests.
> I haven't looked at any of the code involved in this, but I did examine
> the "failing" plpython test, and I'm quite distressed about that change
> of behavior. Simply removing the test case is certainly not okay,
> and I do not think that just changing it to accept this new behavior
> is okay either. As Nikita said upthread:
>
>>> 7. ltree_plpython test does not fail now because Python list is converted to a
>>> text and then to a ltree, and the textual representation of a Python list has
>>> become a valid ltree text:
>>>
>>> SELECT $$['foo', 'bar', 'baz']$$::ltree;
>>> ltree
>>> -------------------------
>>> "['foo', 'bar', 'baz']"
>>> (1 row)
>>>
>>> So Python lists can be now successfully converted to ltrees without a transform,
>>> but the result is not that is expected ('foo.bar.baz'::ltree).
> If this case doesn't throw an error, then we're going to have a
> compatibility problem whenever somebody finally gets around to
> implementing the python-to-ltree transform properly, because it
> would break any code that might be relying on this (wrong) behavior.
>
> In general, I think it's a mistake to allow unquoted punctuation to be
> taken as part of an ltree label, which is what this patch is evidently
> doing. By doing that, you'll make it impossible for anyone to ever
> again extend the ltree syntax, because if they want to assign special
> meaning to braces or whatever, they can't do so without breaking
> existing applications. For example, if the existing code allowed
> double-quote or backslash as a label character, we'd already have
> rejected this patch as being too big a compatibility break. So it's
> not very forward-thinking to close off future improvements like this.
>
> Thus, what I think you should do is require non-alphanumeric label
> characters to be quoted, either via double-quotes or backslashes
> (although it's questionable whether we really need two independent
> quoting mechanisms here). That would preserve extensibility, and
> it'd also preserve our freedom to fix ltree_plpython later, since
> the case of interest would still be an error for now. And it would
> mean that you don't have subtly different rules for what's data in
> ltree versus what's data in lquery or ltxtquery.

Now non-alphanumeric label characters should be escaped in ltree,
lquery and ltxtquery. Plpython tests does not require changes now.

> BTW, the general rule in existing backend code that's doing string
> parsing is to allow non-ASCII (high-bit-set) characters to be taken as
> data without inquiring too closely as to what they are. This avoids a
> bunch of locale and encoding issues without much loss of flexibility.
> (If we do ever extend the ltree syntax again, we'd certainly choose
> ASCII punctuation characters for whatever special symbols we need,
> else the feature might not be available in all encodings.) So for
> instance in your examples involving "Ñ", it's fine to take that as a
> label character without concern for locale/encoding.
> I'm not sure what I think about the whitespace business. It looks
> like what you propose is to strip unquoted leading and trailing
> whitespace but allow embedded whitespace. There's precedent for that,
> certainly, but I wonder whether it isn't too confusing. In any case
> you didn't document that.

Embedded whitespace should also be escaped now. I'm also not sure
about stripping unquoted leading and trailing whitespace.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Replace-if-with-switch-in-ltree-code-20200307.patch text/x-patch 10.4 KB
0002-Ltree-syntax-improvements-20200307.patch text/x-patch 77.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-03-07 01:03:40 Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions
Previous Message Justin Pryzby 2020-03-06 23:35:07 Re: pg_ls_tmpdir to show directories and shared filesets