Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, John Naylor <jcnaylor(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2018-12-27 17:12:39
Message-ID: 4029.1545930759@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2018-12-26 14:03:57 -0500, Tom Lane wrote:
>> It's impossible to write correct RD parsers by hand for any but the most
>> trivial, conflict-free languages, and what we have got to deal with
>> is certainly neither of those; moreover, it's a constantly moving target.
>> We'd be buying into an endless landscape of parser bugs if we go that way.
>> It's *not* worth it.

> It's not exactly new that people end up moving from bison to recursive
> descent parsers once they hit the performance problems and want to give
> better error messages. E.g. both gcc and clang have hand-written
> recursive-descent parsers for C and C++ these days.

Note that they are dealing with fixed language definitions. Furthermore,
there's no need to worry about whether that code has to be hacked on by
less-than-expert people. Neither condition applies to us.

The thing that most concerns me about not using a grammar tool of some
sort is that with handwritten RD, it's very easy to get into situations
where you've "defined" (well, implemented, because you never did have
a formal definition) a language that is ambiguous, admitting of more
than one valid parse interpretation. You won't find out until someone
files a bug report complaining that some apparently-valid statement
isn't doing what they expect. At that point you are in a world of hurt,
because it's too late to fix it without changing the language definition
and thus creating user-visible compatibility breakage.
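
To make that failure mode concrete, here is a minimal sketch using the classic dangling-else construct. It is a toy language, not anything from the PG grammar, and the names parse_stmt and peek are hypothetical. The handwritten code looks reasonable and never reports a problem, yet it has quietly chosen one of two possible readings of a nested if: the else always binds to the nearest if, simply because of the order in which the code happens to check for it.

```python
def peek(tokens, pos):
    """Return the token at pos, or None if we are past the end."""
    return tokens[pos] if pos < len(tokens) else None


def parse_stmt(tokens, pos=0):
    """Parse one statement of a toy language; return (tree, next_pos).

    Informal "grammar", which exists only as this code:
        stmt := 'if' COND 'then' stmt [ 'else' stmt ] | 's'
    """
    if peek(tokens, pos) == "if":
        cond = peek(tokens, pos + 1)
        if cond is None or peek(tokens, pos + 2) != "then":
            raise SyntaxError("expected: if COND then ...")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        else_branch = None
        # The ambiguity is resolved right here, implicitly: the innermost
        # active call sees the 'else' first, so it binds to the nearest 'if'.
        # Nothing warns the author that another reading was possible.
        if peek(tokens, pos) == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    if peek(tokens, pos) == "s":
        return "s", pos + 1
    raise SyntaxError(f"unexpected token {peek(tokens, pos)!r} at {pos}")


tree, end = parse_stmt("if a then if b then s else s".split())
# tree == ("if", "a", ("if", "b", "s", "s"), None)
```

A user who expected the else to belong to the outer if gets no error, just a different statement than the one they thought they wrote.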

Now bison isn't perfect in this regard, because you can shoot yourself
in the foot with ill-considered precedence specifications (and we've
done so ;-(), but it is light-years more likely to detect ambiguous
grammar up-front than any handwritten parser logic is.
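
As a rough illustration of what having a written-down grammar buys you, the sketch below counts the distinct parse trees a small BNF-style grammar admits for one sentence. This is not how bison works: bison reports shift/reduce and reduce/reduce conflicts while building its parser tables, whereas this is a brute-force check of a single input. The GRAMMAR table and count_parses are hypothetical names, and the code assumes a grammar with no empty productions and no cyclic unit rules.

```python
from functools import lru_cache

# Toy grammar, written down explicitly. Symbols that are not keys of the
# dict are treated as terminals.
GRAMMAR = {
    "S": [
        ("if", "c", "then", "S"),
        ("if", "c", "then", "S", "else", "S"),
        ("s",),
    ],
}


def count_parses(tokens, start="S"):
    """Count the distinct parse trees of `tokens` under GRAMMAR."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        if symbol not in GRAMMAR:  # terminal: must match exactly one token
            return 1 if j - i == 1 and tokens[i] == symbol else 0
        return sum(match(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def match(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if not rhs:
            return 1 if i == j else 0
        head, rest = rhs[0], rhs[1:]
        total = 0
        # Every remaining symbol needs at least one token.
        for k in range(i + 1, j - len(rest) + 1):
            ways = derive(head, i, k)
            if ways:
                total += ways * match(rest, k, j)
        return total

    return derive(start, 0, len(tokens))


# count_parses("if c then s".split())                   -> 1
# count_parses("if c then if c then s else s".split())  -> 2 (ambiguous)
```

A count above one for any sentence proves the grammar ambiguous, but a count of one proves nothing about other sentences, which is why a tool that analyzes the grammar itself is the more useful safeguard.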

If we had a tool that proved a BNF grammar non-ambiguous and then
wrote an RD parser for it, that'd be fine with me --- but we need
a tool, not somebody claiming he can write an error-free RD parser
for an arbitrary language. My position is that anyone claiming that
is just plain deluded.

I also do not buy your unsupported-by-any-evidence claim that the
error reports would be better. I've worked on RD parsers in the
past, and they're not really better, at least not without expending
enormous amounts of effort --- and run-time cycles --- specifically
on the error reporting aspect. Again, I don't see that happening
for us.

> I don't buy that we're unable to write a descent parser that way.

I do not think that we could write one for the current state of the
PG grammar without an investment of effort so large that it's not
going to happen. Even if such a parser were to spring fully armed
from somebody's forehead, we absolutely cannot expect that it would
continue to work correctly after non-wizard contributors modify it.

regards, tom lane