Keyword classifications

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Keyword classifications
Date: 2015-12-31 20:13:58
Message-ID: 20197.1451592838@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I did a bit of initial poking at the problem complained of in bug #13840,
namely that if you use "none" as the value of a reloption, you get a
syntax error when trying to reload a dump containing the table or index
declaration. The submitter blames this on pg_dump but it is surely not
pg_dump's fault; it's just printing what pg_get_indexdef() gave it.
And what pg_get_indexdef() prints is just verbatim what is in each array
element of pg_class.reloptions.

Now, one line of thought here is that flatten_reloptions() is out of its
mind to not be worrying about quoting the reloption values. And perhaps
it is, but I think if we go that direction, we may be fighting similar
fires for awhile to come. psql's describe.c, for example, doesn't worry
about quoting anything when printing reloptions, and there's likely
similar code in third-party clients. Also, a solution like this would
do nothing for existing dump files.

The other line of thought is that we're already making an effort to allow
any keyword to appear as the value of a def_arg, and maybe we should try
to make that work 100% instead of only 90%. The existing code in
gram.y is:

def_arg: func_type { $$ = (Node *)$1; }
| reserved_keyword { $$ = (Node *)makeString(pstrdup($1)); }
| qual_all_Op { $$ = (Node *)$1; }
| NumericOnly { $$ = (Node *)$1; }
| Sconst { $$ = (Node *)makeString($1); }
;

so we already allow any keyword that can be a type name, as well as
all fully reserved keywords. The only thing that's missing is
col_name_keyword (which, not coincidentally, includes NONE). Now,
if you just try to add a production for col_name_keyword like the one
for reserved_keyword, you get a bunch of reduce-reduce errors because
col_name_keyword includes several words that can also be type names, and
Bison can't decide whether those should go through the func_type or
col_name_keyword paths. But some experimentation suggests that we could
fix that by subdividing col_name_keyword into two categories, one being
the keywords that can also be type names (BIGINT, BIT, etc) and one
being those that can't (BETWEEN, COALESCE, etc). Everywhere
col_name_keyword is used now, just mention both sub-categories.
And in def_arg, add a production for only the second sub-category.

I think such a categorization would actually be cleaner than what we have
now; if you read the comments for col_name_keyword, they're pretty squishy
about whether these keywords can be type or function names or not, and
this subdivision would make that a little clearer. Interestingly, it
also seems that the grammar tables become slightly smaller, suggesting
that Bison also finds this more regular.

However, we have a problem, which is that in the distant past somebody
had the bright idea of exposing the keyword categories to userspace via
pg_get_keywords(). That means we have to either add a new category code
to pg_get_keywords' output, or have it print the same code for both of
these new categories, neither of which seems exactly ideal.

Another question is whether we'd risk back-patching such a change.
It'd be relatively self-contained in terms of our own code, but I'm
wondering whether there's any third-party code with dependencies
on what keyword categories exist.

Thoughts?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paragon Corporation 2015-12-31 20:18:13 IMPORT FOREIGN SCHEMA return create foreign table commands are those further filtered in LIMIT and EXCEPT cases?
Previous Message Tomas Vondra 2015-12-31 20:11:34 Re: PATCH: postpone building buckets to the end of Hash (in HashJoin)