Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hannu Krosing <hannu(at)tm(dot)ee>
Cc: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level?
Date: 2000-02-20 17:41:44
Message-ID: 5348.951068504@sss.pgh.pa.us
Lists: pgsql-hackers

Hannu Krosing <hannu(at)tm(dot)ee> writes:
> Could you test with some other frontend (python, perl, tcl, C) ?

Yup, psql is untrustworthy as a means of testing the backend's comment
handling ;-).

I committed lexer changes on Friday evening that I believe fix all of
the backend's problems with \r versus \n. The issue with unterminated
-- comments, which was Hannu's original complaint, was fixed a while ago;
but we still had problems with comments terminated with \r instead of
\n, as well as some non-SQL-compliant behavior for -- comments between
the segments of a multiline literal, etc.

While fixing this I realized that there are some fundamental
discrepancies between the way the backend recognizes comments and the
way that psql does. These arise from the fact that the comment
introducer sequences /* and -- are also legal as parts of operator
names, and since the backend is based on lex, which uses greedy
longest-available-match rules, you get things like this:

    select *-- 123
    ERROR: Can't find left op '*--' for type 23

(Parsing '*--' as an operator name wins over parsing just '*' as an
operator name, so that '--' would be recognized on the next call.)
More subtly,

    select /**/- 22
    ERROR: parser: parse error at or near ""

which is the backend's rather lame excuse for an "unterminated comment"
error. What happens here is that the sequence /**/- is bitten off as a
single lexer token, then tested in this order to see if it is
(a) a complete "/* ... */" comment (nope),
(b) the start of a comment, "/* anything" (yup), or
(c) an operator (which would succeed if it got the chance).
There does not seem to be any way to persuade lex to stop at the "*/"
if it has a chance to recognize a longer token by applying the operator
rule.
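
To make the longest-match effect concrete, here is a minimal Python sketch
that imitates lex's rule selection. The rule set is a simplified stand-in I
made up for illustration (the names RULES, OP_CHARS and longest_match_tokens
are hypothetical, and it only covers -- comments); it is not the backend's
real scan.l.

```python
import re

# Assumed, simplified rules; the real lexer's rules differ.
OP_CHARS = r"[-+*/<>=~!@#%^&|`?$]"
RULES = [
    ("comment", re.compile(r"--[^\n\r]*")),
    ("operator", re.compile(OP_CHARS + "+")),
    ("integer", re.compile(r"\d+")),
    ("ident", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("space", re.compile(r"\s+")),
]


def longest_match_tokens(text):
    """Tokenize the way lex does: at each position the rule with the
    longest match wins, and ties go to the rule listed first."""
    pos, out = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in RULES:
            m = rx.match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError("cannot tokenize at position %d" % pos)
        if best_name != "space":
            out.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return out


print(longest_match_tokens("select *-- 123"))
print(longest_match_tokens("select * -- 123"))
```

Under these assumed rules the first call yields an operator token '*--'
followed by the integer 123, while the second (with whitespace inserted)
yields the operator '*' followed by a comment token.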

Both of these problems are easily avoided by inserting some whitespace,
but I wonder whether we ought to try to fix them for real. One way
that this could be done would be to alter the lexer rules so that
operators are lexed a single character at a time, which'd eliminate
lex's tendency to recognize a long operator name in place of a comment.
Then we'd need a post-pass to recombine adjacent operator characters into
a single token. (This would forever prevent anyone from using operator
names that include '--' or '/*', but I'm not sure that's a bad thing.)
The post-pass would also be a mighty convenient place to fix the NOT NULL
problem that's giving us trouble in another thread: the post-pass would
need one-token lookahead anyway, so it could very easily convert NOT
followed by NULL into a single special token.
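
A rough Python sketch of that two-stage idea follows, purely as an
illustration. The function names (first_pass, post_pass) and token shapes are
hypothetical, string literals and nested comments are ignored, and the
post-pass looks back at the previously emitted token rather than ahead, which
has the same effect for this purpose.

```python
import re

OP_CHARS = set("-+*/<>=~!@#%^&|`?$")
WORD = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|\d+")


def first_pass(text):
    """Yield (kind, value, start) tokens. Comments are checked before
    operators, and operators come out one character at a time."""
    pos, n = 0, len(text)
    while pos < n:
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif text.startswith("--", pos):
            # a -- comment runs to \n, \r, or end of input
            while pos < n and text[pos] not in "\r\n":
                pos += 1
        elif text.startswith("/*", pos):
            end = text.find("*/", pos + 2)
            if end < 0:
                raise ValueError("unterminated /* comment")
            pos = end + 2
        elif ch in OP_CHARS:
            yield ("op", ch, pos)
            pos += 1
        else:
            m = WORD.match(text, pos)
            if m:
                yield ("word", m.group(), pos)
                pos = m.end()
            else:
                yield ("char", ch, pos)  # punctuation such as ( ) , ;
                pos += 1


def post_pass(tokens):
    """Recombine adjacent operator characters and fold NOT NULL into
    a single special token."""
    out = []
    for kind, value, start in tokens:
        if out:
            pkind, pvalue, pstart = out[-1]
            adjacent = pstart + len(pvalue) == start
            if kind == "op" and pkind == "op" and adjacent:
                out[-1] = ("op", pvalue + value, pstart)
                continue
            if (kind == "word" and pkind == "word"
                    and pvalue.upper() == "NOT" and value.upper() == "NULL"):
                out[-1] = ("not_null", "NOT NULL", pstart)
                continue
        out.append((kind, value, start))
    return [(k, v) for k, v, _ in out]


print(post_pass(first_pass("select *-- 123")))
print(post_pass(first_pass("select /**/- 22")))
print(post_pass(first_pass("a <= b")))
print(post_pass(first_pass("x int NOT NULL")))
```

In this sketch 'select *-- 123' comes out as the word select and the operator
'*', and 'select /**/- 22' as select, '-', 22, while '<=' is still reassembled
into one operator token because its characters are adjacent in the input.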

Meanwhile, psql is using some ad-hoc code to recognize comments,
rather than a lexer, and it thinks both of these sequences are indeed
comments. I also find that it strips out the -- flavor of comment,
but sends the /* */ flavor on through, which is just plain inconsistent.
I suggest we change psql to not strip -- comments either. The only
reason for psql to be in the comment-recognition business at all is
so that it can determine whether a semicolon is end-of-query or just
a character in a comment.
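
For illustration only, here is a small Python sketch of the one job psql
needs comment recognition for: finding the semicolons that really end a
query, while passing both flavors of comment through untouched. This is not
psql's code; split_statements is a hypothetical helper, it handles only
single-quoted literals with doubled quotes, and it ignores backslash escapes.
Like the ad-hoc approach described above, it treats -- and /* as comment
starts wherever they appear outside a literal.

```python
def split_statements(buf):
    """Split buf on semicolons that lie outside comments and
    single-quoted literals. Comments stay in the returned text.
    Returns (complete_statements, unfinished_remainder)."""
    stmts, start, pos, n = [], 0, 0, len(buf)
    while pos < n:
        if buf.startswith("--", pos):
            # skip to end of line; the comment text is kept, not stripped
            while pos < n and buf[pos] not in "\r\n":
                pos += 1
        elif buf.startswith("/*", pos):
            end = buf.find("*/", pos + 2)
            pos = n if end < 0 else end + 2
        elif buf[pos] == "'":
            pos += 1
            while pos < n:
                if buf[pos] == "'":
                    if buf.startswith("''", pos):  # doubled quote inside literal
                        pos += 2
                        continue
                    break
                pos += 1
            pos += 1  # step past the closing quote
        elif buf[pos] == ";":
            stmts.append(buf[start:pos].strip())
            pos += 1
            start = pos
        else:
            pos += 1
    return stmts, buf[start:].strip()


stmts, rest = split_statements("select 1; -- a; b\nselect ';' /* ; */ ;")
print(stmts)
print(repr(rest))
```

Here only two of the five semicolons end a query: the ones inside the --
comment, the quoted literal, and the /* */ comment are skipped, and the
second statement keeps its comments.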

Another thing I'd like to fix here is to get the backend to produce
a more useful error message than 'parse error at or near ""' when it's
presented with an unterminated comment or unterminated literal.
The flex manual recommends coding like

    <quote><<EOF>> {
        error( "unterminated quote" );
        yyterminate();
    }

but <<EOF>> is a flex-ism not supported by regular lex. We already
tell people they have to use flex (though I'm not sure that's *really*
necessary at present); do we want to set that requirement in stone?
Or does anyone know another way to get this effect?

regards, tom lane
