Re: BUG #15273: Lexer bug with UESCAPE

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ladayaroslav(at)yandex(dot)ru
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15273: Lexer bug with UESCAPE
Date: 2018-07-10 19:50:51
Message-ID: 23850.1531252251@sss.pgh.pa.us
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> SELECT U&'a' /*c1*/ UESCAPE /*c2*/ 'x';
> ERROR: syntax error at or near "'x'"
> LINE 1: SELECT U&'a' /*c1*/ UESCAPE /*c2*/ 'x';

> I think the former is a bug, as, per ISO SQL, a comment is equivalent to
> whitespace (with newline), and therefore, should be ignored here.

I'd classify this as "won't fix". It'd require pretty significant bloat
in the lexer rules to make it happen, and it doesn't really seem worth it.

Also, I'm going to push back on the claim that allowing comments there
is required by the SQL spec. The relevant rules in SQL:2011 are

<Unicode character string literal> ::=
    [ <introducer> <character set specification> ]
    U <ampersand> <quote> [ <Unicode representation>... ] <quote>
    [ { <separator> <quote> [ <Unicode representation>... ] <quote> }... ]
    <Unicode escape specifier>

<Unicode escape specifier> ::=
    [ UESCAPE <quote> <Unicode escape character> <quote> ]

I do not see any principled way of arguing that these rules require
comments to be allowed adjacent to UESCAPE without also claiming
that they must be allowed between, say, the initial 'U' and the
ampersand. The only place these rules allow a <separator> is
between segments of a multiline literal. It looks to me like an
extension that we even allow whitespace around UESCAPE.
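
To make the distinction concrete, here is a rough toy model in Python of
a literal rule that tolerates plain whitespace around UESCAPE but knows
nothing about comments. It is only a sketch: the names QUOTED, WS,
ULITERAL and consumed are made up for illustration, and the patterns are
not PostgreSQL's actual flex rules.

```python
import re

# Toy model only: hypothetical patterns, NOT PostgreSQL's real lexer rules.
QUOTED = r"'(?:[^']|'')*'"   # one quoted segment; '' is an embedded quote
WS = r"[ \t\r\n\f]"          # plain whitespace only, comments not included

# U&'...' optionally followed by: whitespace, UESCAPE, whitespace, 'c'.
# Nothing is permitted between U, & and the opening quote.
ULITERAL = re.compile(
    rf"[uU]&{QUOTED}(?:{WS}*UESCAPE{WS}*'[^']')?",
    re.IGNORECASE,
)


def consumed(sql: str) -> str:
    """Return the prefix of sql that the toy literal rule would consume."""
    m = ULITERAL.match(sql)
    return m.group(0) if m else ""


if __name__ == "__main__":
    for text in (
        "U&'a' UESCAPE 'x'",
        "U&'a' /*c1*/ UESCAPE /*c2*/ 'x'",
        "U &'a'",
    ):
        print(repr(text), "->", repr(consumed(text)))
```

In this model the first input is consumed whole, the second stops after
U&'a' (so UESCAPE and 'x' would be left over as separate tokens, which
is at least consistent with the reported error position), and the third
does not match at all. Teaching such a rule about comments would mean
repeating the comment patterns everywhere WS appears.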

regards, tom lane
