Re: benchmarking Flex practices

From: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: benchmarking Flex practices
Date: 2019-11-26 11:38:00
Message-ID: CACPNZCuGfVe+Wp+4N7rVfQ4GAYjuCHzSsx8NR8=2n_OTwU9Eow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 26, 2019 at 5:51 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> [ My apologies for being so slow to get back to this ]

No worries -- it's a nice-to-have, not something our users are excited about.

> It struck me though that there's another solution we haven't discussed,
> and that's to make the token lookahead filter in parser.c do the work
> of converting UIDENT [UESCAPE SCONST] to IDENT, and similarly for the
> string case.

I recently tried again to get gram.y to handle it without precedence
hacks (or at least hacks with less mystery) and came to the conclusion
that maybe it just doesn't belong in the grammar after all. I hadn't
thought of any alternatives, so thanks for working on that!

It seems something is not quite right in v9 with the error position reporting:

SELECT U&'wrong: +0061' UESCAPE '+';
ERROR: invalid Unicode escape character at or near "'+'"
LINE 1: SELECT U&'wrong: +0061' UESCAPE '+';
- ^
+ ^

The caret is not pointing to the third token, or the second for that
matter. What worked for me was un-truncating the current token before
calling yylex again. To see if I'm on the right track, I've included
this in the attached, which applies on top of your v9.

> Generally, I'm pretty happy with this approach: it touches gram.y
> hardly at all, and it removes just about all of the complexity from
> scan.l. I'm happier about dropping the support code into parser.c
> than the other choices we've discussed.

Seems like the best of both worlds. If we ever wanted to ditch the
whole token filter and use Bison's %glr mode, we'd have extra work to
do, but there doesn't seem to be a rush to do so anyway.

> There's still undone work here, though:
>
> * I did not touch psql. Probably your patch is fine for that.
>
> * I did not do more with ecpg than get it to compile, using the
> same hacks as in your v7. It still fails its regression tests,
> but now the reason is that what we've done in parser/parser.c
> needs to be transposed into the identical functionality in
> ecpg/preproc/parser.c. Or at least some kind of functionality
> there. A problem with this approach is that it presumes we can
> reduce a UIDENT sequence to a plain IDENT, but to do so we need
> assumptions about the target encoding, and I'm not sure that
> ecpg should make any such assumptions. Maybe ecpg should just
> reject all cases that produce non-ASCII identifiers? (Probably
> it could be made to do something smarter with more work, but
> it's not clear to me that it's worth the trouble.)

Hmm, I thought we only allowed Unicode escapes in the first place if
the server encoding was UTF-8. Or did you mean something else?

> If this seems like a reasonable approach to you, please fill in
> the missing psql and ecpg bits.

Will do.

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v9-addendum-handle-uescapes-in-parser.patch application/octet-stream 581 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2019-11-26 12:13:16 [PATCH] Remove twice assignment with var pageop (nbtree.c).
Previous Message Etsuro Fujita 2019-11-26 11:35:33 Re: A problem about partitionwise join