Re: ECPG gets embedded quotes wrong

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Michael Meskes <meskes(at)postgresql(dot)org>
Subject: Re: ECPG gets embedded quotes wrong
Date: 2020-10-21 00:35:15
Message-ID: 691295.1603240515@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> It looks to me like a sufficient fix is just to keep these quote
> sequences as-is within a converted string, so that the attached
> appears to fix it.

Poking at this further, I noticed that there's a semi-related bug
whose behavior this patch changes without actually fixing it.
That has to do with use of a string literal as "execstring" in ECPG's
PREPARE ... FROM and EXECUTE IMMEDIATE commands. Right now, it
appears that there is simply no way to write a double quote as part
of the SQL command in this context. The EXECUTE IMMEDIATE docs say
that such a literal is a "C string", so one would figure that \"
(backslash-double quote) is the way, but that just produces syntax
errors. The reason is that ECPG's lexer is in SQL mode at this point
so it thinks the double-quoted string is a SQL quoted identifier, in
which backslash isn't special so the double quote terminates the
identifier. Oops. Knowing this, you might try writing two double
quotes, but that doesn't work either, because the <xd>{xddouble}
lexer rule converts that to one double quote, and you end up with
an unterminated literal in the translated C code rather than in the
ECPG input.
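
To make the two failure modes concrete, here is a toy model of the SQL
quoted-identifier scanning rule described above. It is only a sketch and
not ECPG's actual lexer; the function name scan_sql_quoted and its
interface are made up for illustration. It treats backslash as an ordinary
character, collapses a doubled quote to one data quote, and stops at the
first lone quote.

```c
#include <stddef.h>

/*
 * Toy model (NOT ECPG's real lexer) of SQL quoted-identifier scanning.
 * "in" points just past the opening double quote.  A doubled quote ("")
 * yields one data quote, a lone quote ends the identifier, and backslash
 * is not special.  Output is silently truncated if "out" is too small.
 * Returns the number of input characters consumed, including the closing
 * quote, or 0 if the identifier is unterminated.
 */
size_t scan_sql_quoted(const char *in, char *out, size_t outsize)
{
    size_t i = 0;
    size_t o = 0;

    if (outsize == 0)
        return 0;

    while (in[i] != '\0')
    {
        if (in[i] == '"')
        {
            if (in[i + 1] == '"')
            {
                /* doubled quote collapses to a single data quote */
                if (o + 1 < outsize)
                    out[o++] = '"';
                i += 2;
                continue;
            }
            /* lone quote terminates the identifier */
            out[o] = '\0';
            return i + 1;
        }
        /* everything else, backslash included, is copied as-is */
        if (o + 1 < outsize)
            out[o++] = in[i];
        i++;
    }
    out[o] = '\0';
    return 0;               /* ran off the end: unterminated */
}
```

Under this model, the input select \"x\" from t" stops right after the
backslash, which is why the backslash spelling produces a syntax error,
while a""b" scans to the end and yields a"b with a single quote.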

My patch above modifies this to the extent that two double quotes
come out as two double quotes in the translated C code, but that
just yields no quote character at all in the resulting string, since
the C compiler sees adjacent string literals, which the C standard
requires it to concatenate.
Then you probably get a mysterious syntax error from the backend
because it thinks your intended-to-be SQL quoted identifier isn't
quoted. However, this is the behavior a C programmer would expect
for adjacent double quotes in a literal, so maybe people wouldn't
see it as mysterious.
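
The concatenation effect can be seen in plain C, independent of ECPG.
The snippet below is a minimal illustration; the statement text and the
names doubled, expected and doubled_quotes_vanish are invented for the
example.

```c
#include <string.h>

/* What one might hope means:  select 1 as "My Col"  */
const char doubled[] = "select 1 as ""My Col""";

/* What the compiler builds after joining the adjacent literals */
const char expected[] = "select 1 as My Col";

/* Returns 1 if the doubled quotes disappeared, as described above */
int doubled_quotes_vanish(void)
{
    return strcmp(doubled, expected) == 0;
}
```

The compiler sees three adjacent literals ("select 1 as ", "My Col" and
an empty one) and joins them, so no double quote reaches the backend.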

Anyway, what to do?

1. Nothing, except document that you can't put a double quote into
the C string literal in these commands.

2. Make two-double-quotes work to produce a data double quote,
which I think could be done fairly easily with some post-processing
in the execstring production. However, this doesn't have much to
recommend it other than being easily implementable. C programmers
would not think it's natural, and the fact that backslash sequences
other than \" would work as a C programmer expects doesn't help.

3. Find a way to lex the literal per C rules, as the EXECUTE IMMEDIATE
docs clearly imply we should. (The PREPARE docs are silent on the
point AFAICS.) Unfortunately, this seems darn near impossible unless
we want to make IMMEDIATE (more) reserved. Since it's currently
unreserved, the grammar can't tell which flavor of EXEC SQL EXECUTE ...
it's dealing with until it looks ahead past the name-or-IMMEDIATE token,
so that it must lex the literal (if any) too soon. I tried putting in a
mid-rule action to switch the lexer back to C mode but failed because of
that ambiguity. Maybe we could make it work with a bunch of refactoring,
but it would be ugly and subtle code, in both the grammar and lexer.

On the whole I'm inclined to go with #1. There's a reason why nobody has
complained about this in twenty years, which is that the syntaxes with
a string literal are completely useless. There's no point in writing
EXEC SQL EXECUTE IMMEDIATE "SQL-statement" when you can just write
EXEC SQL SQL-statement, and similarly for PREPARE. (The other variant
that takes the string from a C variable is useful, but that one doesn't
have any weird quoting problem.) So I can't see expending the effort
for #3, and I don't feel like adding and documenting the wart of #2.
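
For comparison, here is roughly what the host-variable variant looks
like. This is a sketch only: the table name is hypothetical, and the
ECPG-specific lines are shown as comments so that the fragment compiles
as plain C.

```c
#include <string.h>

/*
 * In an ECPG source file this declaration would sit between
 *     EXEC SQL BEGIN DECLARE SECTION;
 * and
 *     EXEC SQL END DECLARE SECTION;
 * The C compiler processes \" as usual, so the string really does
 * contain double quotes around the (hypothetical) table name.
 */
const char *stmt = "CREATE TABLE \"My Table\" (id int)";

/*
 * The statement would then be run with:
 *     EXEC SQL EXECUTE IMMEDIATE :stmt;
 */
```

Since the literal is lexed by the C compiler rather than by ECPG in SQL
mode, ordinary C escaping applies and the quoting problem never arises.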

Thoughts?

regards, tom lane
