Re: Sync ECPG scanner with core

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Naylor <jcnaylor(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sync ECPG scanner with core
Date: 2018-11-13 18:21:29
Message-ID: 17698.1542133289@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

John Naylor <jcnaylor(at)gmail(dot)com> writes:
> Commit 21c8ee7946 was one of several that sync'd the backend scanner
> with the psql scanner, with the commit message stating "Someday it'd
> be nice to make ecpg's pgc.l more easily diff'able too, but today is
> not that day." Attached is a set of patches to make progress towards
> that. The ECPG tests pass. There's probably more that could be done,
> but this seems like a good base to start from.

Looks like good stuff, pushed with a small additional amount of work.

> -base_yyerror() is used once, but every other call site uses
> mmerror(), so use that instead.

Yeah, I think this is a bug, though minor. The other places that
deal with unexpected EOF use mmfatal(PARSE_ERROR, ...) though,
so I did it like that.

> In the course of doing this, I noticed a few other possible ECPG
> scanner cleanups to consider, although the payoff might not justify
> the work involved:

> -Copy process_integer_literal() from the core scanner (maybe only if
> decimal_fail is also brought over).

I went ahead and did both parts of that, mainly because as things
stood the ecpg lexer would produce a different token sequence for
"1..10" than the core would. I don't think this really matters
right at the moment, but if for instance ecpg decided to insert
spaces where it thought the token boundaries were, it would matter.

> -Match core's handling of unicode escapes, even though pgc.l is not backup-free.

Yeah, the amount of remaining diffs due to the omission of the anti-backup
rules is a bit annoying and confusing.

More generally, I wonder if it'd be practical to make pgc.l backup-free
altogether. I'm not sure that the resulting gain in lexing speed would
really be of interest to anyone, but just in terms of using uniform lexer
design rules, it might be worthwhile. I tried a run with -b just to see,
and indeed there are a bunch of backup cases in the ECPG-specific rules,
so it'd take some work.

> -Does ECPG still need its own functions for mm_alloc() and
> mm_strdup(), if we have equivalents in src/common?

The error handling isn't the same, so I don't think we can just replace
them with the src/common functions. It is, however, potentially worth
our trouble to fix things so that within pgc.l, palloc() and pstrdup()
are macros for mm_alloc/mm_strdup, thereby further reducing the diffs
to the core lexer.

I'd be pretty tempted to also change all the hard references to
base_yylval to go through a pointer variable, so that diffs like
this go away:

- yylval->str = pstrdup(yytext);
+ base_yylval.str = mm_strdup(yytext);

I believe that that'd result in more compact code, too --- on most
machines, references to global variables are more expensive in
code bytes than indirecting through a local pointer variable.

> -Use reentrant APIs?

Hm. There's no functional value in that for ecpg, but maybe worth doing
anyway to get rid of diffs like

- addlit(yytext, yyleng, yyscanner);
+ addlit(yytext, yyleng);

Guess it depends on how nitpicky you're feeling. In principle,
reducing these remaining diffs would ease future maintenance of
the lexers, but they're not something we change often.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-11-13 18:43:30 Re: Refactoring the checkpointer's fsync request queue
Previous Message Tomas Vondra 2018-11-13 18:18:54 Re: Race condition in WaitForBackgroundWorkerStartup