Re: benchmarking Flex practices

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: benchmarking Flex practices
Date: 2019-11-26 15:32:29
Message-ID: 30156.1574782349@sss.pgh.pa.us
Lists: pgsql-hackers

John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> writes:
> It seems something is not quite right in v9 with the error position reporting:

> SELECT U&'wrong: +0061' UESCAPE '+';
> ERROR: invalid Unicode escape character at or near "'+'"
> LINE 1: SELECT U&'wrong: +0061' UESCAPE '+';
> - ^
> + ^

> The caret is not pointing to the third token, or the second for that
> matter.

Interesting. For me it points at the third token with or without
your fix ... some flex version discrepancy maybe? Anyway, I have
no objection to your fix; it's probably cleaner than what I had.

>> * I did not do more with ecpg than get it to compile, using the
>> same hacks as in your v7. It still fails its regression tests,
>> but now the reason is that what we've done in parser/parser.c
>> needs to be transposed into the identical functionality in
>> ecpg/preproc/parser.c. Or at least some kind of functionality
>> there. A problem with this approach is that it presumes we can
>> reduce a UIDENT sequence to a plain IDENT, but to do so we need
>> assumptions about the target encoding, and I'm not sure that
>> ecpg should make any such assumptions. Maybe ecpg should just
>> reject all cases that produce non-ASCII identifiers? (Probably
>> it could be made to do something smarter with more work, but
>> it's not clear to me that it's worth the trouble.)

> Hmm, I thought we only allowed Unicode escapes in the first place if
> the server encoding was UTF-8. Or did you mean something else?

Well, yeah, but the problem here is that ecpg would have to assume
that the client encoding that its output program will be executed
with is UTF-8. That seems pretty action-at-a-distance-y.
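The quoted "reject non-ASCII" option could look roughly like the minimal C sketch below, which handles the body of a U&"..." identifier. It assumes the surrounding quotes and any UESCAPE clause have already been stripped, and that the usual escape forms apply: the escape character plus 4 hex digits, the escape character plus `+` and 6 hex digits, or a doubled escape character for a literal one. The function `deescape_ascii_only` and its return codes are hypothetical and are not taken from ecpg. Raw bytes are copied through untouched. Only escapes that would decode to a code point above U+007F are rejected, because emitting those as bytes would require assuming the client encoding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Read exactly n hex digits from s into *cp; stops safely at a NUL. */
static bool read_hex(const char *s, int n, unsigned long *cp)
{
    *cp = 0;
    for (int i = 0; i < n; i++)
    {
        int v = hexval(s[i]);
        if (v < 0)
            return false;
        *cp = (*cp << 4) | (unsigned long) v;
    }
    return true;
}

/*
 * Hypothetical helper: de-escape a U&"..." body using escape character esc,
 * refusing any escape that yields a non-ASCII code point.
 * Returns 0 on success, -1 for a malformed escape, -2 for a non-ASCII
 * escape (would need an encoding assumption), -3 if out is too small.
 */
int deescape_ascii_only(const char *body, char esc, char *out, size_t outlen)
{
    size_t n = 0;
    const char *p = body;

    if (outlen == 0)
        return -3;

    while (*p)
    {
        char c;

        if (*p != esc)
        {
            c = *p++;               /* raw byte: already in source encoding */
        }
        else
        {
            unsigned long cp;

            if (p[1] == esc)
            {
                cp = (unsigned char) esc;   /* doubled escape = literal */
                p += 2;
            }
            else if (p[1] == '+')
            {
                if (!read_hex(p + 2, 6, &cp))
                    return -1;
                p += 8;
            }
            else
            {
                if (!read_hex(p + 1, 4, &cp))
                    return -1;
                p += 5;
            }
            if (cp == 0)
                return -1;          /* zero code point is not allowed */
            if (cp > 0x7F)
                return -2;          /* non-ASCII: reject instead of guessing */
            c = (char) cp;
        }

        if (n + 1 >= outlen)
            return -3;              /* keep room for the terminating NUL */
        out[n++] = c;
    }
    out[n] = '\0';
    return 0;
}
```

With `esc` set to `'\\'`, the body `d\0061t\+000061` decodes to `data`, while `\0441` is refused with -2. ASCII-only results come out as the same bytes in any ASCII-compatible client encoding, which is why this restriction sidesteps the encoding question at the cost of rejecting legitimate identifiers.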

I haven't looked closely at what ecpg does with the processed
identifiers. If it just spits them out as-is, a possible solution
is to not do anything about de-escaping, but pass the sequence
U&"..." (plus UESCAPE ... if any), just like that, on to the grammar
as the value of the IDENT token.
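The pass-through idea amounts to rebuilding the token's original spelling rather than decoding it. The C sketch below assumes, as the paragraph above leaves open, that ecpg copies the IDENT value into its output verbatim; the function `passthrough_uident` is hypothetical. Since no code point is ever turned into bytes, ecpg makes no assumption about the client encoding, and de-escaping would presumably happen on the server when the generated program runs.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rebuild U&"body" (plus UESCAPE 'c' when one was
 * given) so it can be handed to the grammar unchanged as the IDENT value.
 * body is the raw text between the double quotes, with any "" still doubled;
 * esc is the UESCAPE character, or '\0' if there was no UESCAPE clause.
 * Returns the length written, or -1 on error or truncation.
 */
int passthrough_uident(const char *body, char esc, char *out, size_t outlen)
{
    int len;

    if (esc == '\0')
        len = snprintf(out, outlen, "U&\"%s\"", body);
    else
        len = snprintf(out, outlen, "U&\"%s\" UESCAPE '%c'", body, esc);

    if (len < 0 || (size_t) len >= outlen)
        return -1;                  /* output did not fit */
    return len;
}
```

Wrapping the escape character in single quotes is safe here because PostgreSQL does not accept a single quote (or a double quote, whitespace, `+`, or a hex digit) as the UESCAPE character.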

BTW, in the back of my mind here is Chapman's point that it'd be
a large step forward in usability if we allowed Unicode escapes
when the backend encoding is *not* UTF-8. I think I see how to
get there once this patch is done, so I definitely would not like
to introduce some comparable restriction in ecpg.

regards, tom lane
