SQL/JSON functions vs. ECPG vs. STRING as a reserved word

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Michael Meskes <meskes(at)postgresql(dot)org>
Subject: SQL/JSON functions vs. ECPG vs. STRING as a reserved word
Date: 2022-05-29 20:19:42
Message-ID: 3661437.1653855582@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Commit 1a36bc9db (SQL/JSON query functions) introduced STRING as a
type_func_name_keyword. As per the complaint at [1], this broke use
of "string" as a table name, column name, function parameter name, etc.
The restriction about column names, in particular, seems likely to
break boatloads of applications --- wouldn't you think that "string"
is a pretty likely column name? We have no cover to claim "the SQL
standard says so", either, because they list STRING as an unreserved
keyword.

This is trivial enough to fix so far as the core grammar is concerned;
it works to just change STRING to be an unreserved_keyword. However,
various ECPG tests fall over, so I surmise that somebody felt it was
okay to break potentially thousands of applications in order to avoid
touching ECPG. I do not think that's an acceptable trade-off.

I poked into it a bit and found the proximate cause: ECPG uses
ECPGColLabelCommon to represent user-chosen type names in some
places, and that production *does not include unreserved_keyword*.
(Sure enough, the failing test cases try to use "string" as a
type name in a variable declaration.) That's a pre-existing
bit of awfulness, and it's indeed pretty awful, because it means
that any time we add a new keyword --- even a fully unreserved one
--- we risk breaking somebody's ECPG application. We just hadn't
happened to add any keywords to date that conflicted with type names
used in the ECPG test suite.

I looked briefly at whether we could improve that situation.
Two of the four uses of ECPGColLabelCommon in ecpg.trailer can be
converted to the more general ECPGColLabel without difficulty,
but trying to change either of the uses in var_type results in
literally thousands of shift/reduce and reduce/reduce conflicts.
This seems to be because what follows ecpgstart can be either a general
SQL statement or an ECPGVarDeclaration (beginning with var_type),
and bison isn't smart enough to disambiguate. I have a feeling that
this situation could be improved with enough elbow grease, because
plpgsql manages to solve a closely-related problem in allowing its
assignment statements to coexist with general SQL statements. But
I don't have the interest to tackle it personally, and anyway any
fix would probably be more invasive than we want to put in post-beta.

I also wondered briefly about just changing the affected test cases,
reasoning that breaking some ECPG applications that use "string" is
less bad than breaking everybody else that uses "string". But type
"string" is apparently a standard ECPG and/or INFORMIX thing, so we
probably can't get away with that.

Hence, I propose the attached, which fixes the easily-fixable
ECPGColLabelCommon uses and adds a hard-wired special case for
STRING to handle the var_type usage.

More generally, I feel like we have a process problem: there needs to
be a higher bar to adding new fully- or even partially-reserved words.
I might've missed it, but I don't recall that there was any discussion
of the compatibility implications of this change.

regards, tom lane

[1] https://www.postgresql.org/message-id/PAXPR02MB760049C92DFC2D8B5E8B8F5AE3DA9%40PAXPR02MB7600.eurprd02.prod.outlook.com

Attachment Content-Type Size
make-STRING-unreserved-1.patch text/x-diff 3.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-05-29 20:57:07 Re: SLRUs in the main buffer pool, redux
Previous Message Ranier Vilela 2022-05-29 20:10:05 Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)