RE: [HACKERS] Postgres' lexer

From: "Ansley, Michael" <Michael(dot)Ansley(at)intec(dot)co(dot)za>
To: "'Leon'" <leon(at)udmnet(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>, pgsql-hackers(at)postgreSQL(dot)org
Subject: RE: [HACKERS] Postgres' lexer
Date: 1999-09-02 12:58:31
Message-ID: 1BF7C7482189D211B03F00805F8527F748C02C@S-NATH-EXCH2
Lists: pgsql-hackers

>> > To my mind, without spaces this construction *is* ambiguous, and frankly
>> > I'd have expected the second interpretation ('+-' is a single operator
>> > name). Almost every computer language in the world uses "greedy"
>> > tokenization where the next token is the longest series of characters
>> > that can validly be a token. I don't regard the above behavior as
>> > predictable, natural, nor obvious. In fact, I'd say it's a bug that
>> > "3+-2" and "3+-x" are not lexed in the same way.
>> >
>>
>> Completely agree with that. This differentiating behavior looks like a bug.
>>
>> > However, aside from arguing about whether the current behavior is good
>> > or bad, these examples seem to indicate that it doesn't take an infinite
>> > amount of lookahead to reproduce the behavior. It looks to me like we
>> > could preserve the current behavior by parsing a '-' as a separate token
>> > if it *immediately* precedes a digit, and otherwise allowing it to be
>> > folded into the preceding operator. That could presumably be done
>> > without VLTC.
>>
>> OK. If we *have* to preserve the old, weird behavior, here is the patch.
>> It is to be applied on top of all my other patches. Though if it were up
>> to me whether to restore the old behavior, I wouldn't do it, because it
>> is an inconsistency in the grammar, i.e. a bug.
>>
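The two lexing rules discussed above can be compared with a small Python sketch. It is not the PostgreSQL scanner (which is generated by flex); the operator character set and the function name `lex` are assumptions made for illustration. With the default greedy rule an operator is the longest run of operator characters; with `split_minus_before_digit=True` it follows the rule proposed in the quoted message, where a '-' immediately before a digit starts a new token.

```python
# Assumed operator character set, for illustration only
# (not taken from the PostgreSQL sources).
OP_CHARS = set("+-*/<>=~!@#%^&|`?")

def lex(text, split_minus_before_digit=False):
    """Tokenize text into numbers, identifiers and operators.

    Operators are matched greedily (longest run of operator characters).
    If split_minus_before_digit is True, a '-' that immediately precedes
    a digit ends the current operator instead of being folded into it.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1
        elif c.isdigit():
            j = i
            while j < n and text[j].isdigit():
                j += 1
            tokens.append(text[i:j])
            i = j
        elif c.isalpha() or c == "_":
            j = i
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(text[i:j])
            i = j
        elif c in OP_CHARS:
            j = i + 1
            while j < n and text[j] in OP_CHARS:
                # Proposed rule: stop before a '-' that is followed by a digit.
                if (split_minus_before_digit and text[j] == "-"
                        and j + 1 < n and text[j + 1].isdigit()):
                    break
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            # Other punctuation, e.g. parentheses, is a token on its own.
            tokens.append(c)
            i += 1
    return tokens

print(lex("3+-2"))                                 # ['3', '+-', '2']
print(lex("3+-x"))                                 # ['3', '+-', 'x']
print(lex("3+-2", split_minus_before_digit=True))  # ['3', '+', '-', '2']
print(lex("3+-x", split_minus_before_digit=True))  # ['3', '+-', 'x']
```

The greedy rule lexes both inputs the same way; the split rule needs only one character of lookahead, but it makes "3+-2" and "3+-x" differ, which is the inconsistency being objected to.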
If a construct is ambiguous, then the behaviour should be undefined (i.e.
we can do what we like, within reason). If the user wants something
predictable, then she should use brackets ;-)

If 3+-2 presents an ambiguity (which it does), then make sure that you write
3+(-2). If you have an operator +-, then you should write (3)+-(2).
However, if you write 3+-2 without brackets then, because this is ambiguous
(assuming no +- operator), the behaviour is undefined, and we can do pretty
much whatever we feel like with it. The exception is when an operator +- is
defined, because then the behaviour is no longer ambiguous: the longest
possible operator name is always matched, so +- will be identified as a
single token.
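To see why brackets remove the ambiguity, a greedy tokenizer can be sketched as one regular expression whose operator alternative matches the longest run of operator characters. This is a hypothetical sketch: the pattern, its operator character set and the name `tokens` are assumptions, not PostgreSQL's actual lexing rules.

```python
import re

# Hypothetical token pattern: a number, an identifier, a greedy run of
# operator characters (longest match), or any other single non-space
# character. The operator character set is assumed for illustration.
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|[-+*/<>=~!@#%^&|`?]+|\S")

def tokens(text):
    """Return the tokens of text; whitespace is skipped by findall."""
    return TOKEN_RE.findall(text)

print(tokens("3+-2"))      # ['3', '+-', '2']
print(tokens("3+(-2)"))    # ['3', '+', '(', '-', '2', ')']
print(tokens("(3)+-(2)"))  # ['(', '3', ')', '+-', '(', '2', ')']
```

The bracketed forms yield the intended tokens no matter how the unbracketed 3+-2 is treated, because the parentheses break up the run of operator characters.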

Especially with the unary minus, my feeling is that it should be placed in
brackets if correct behaviour is desired.

MikeA
