Re: hint infrastructure setup (v3)

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: hint infrastructure setup (v3)
Date: 2004-04-05 16:44:22
Message-ID: Pine.LNX.4.58.0404051843530.8826@sablons.cri.ensmp.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

Dear Tom,

> No, just redefine "yyerror" as a macro that passes additional parameters.

Ok! That's just the kind of the hack-hook I was looking for!!

> > The terminals are not available, they must be kept separatly if you
> > want them. This can be done in yylex().
>
> We don't need them. Any already-shifted tokens are to the left of where
> the error is, no?

Yes and no. I agree that you don't need them for computing the follow set.
However it would help a lot to generate nicer hints. I'm looking for help
the user, and the follow set is just part of the help.

Think of typical "SELECT foo, bla, FROM xxx" (it looks stupid on a one
line query, but not so on a very large query) which is quite easier to
detect and hint about because of FROM is just after a comma.

The rule part is also a real issue. There is no easy way to translate
states into rules, to know whether we're somewhere in a "ShowStmt" or
"AlterTableStmt"...

If you're after a comma in state 1232, should you just say IDENT... I'd
rather say "user name" or "table name" if I can... Otherwise the hint
stays at a low parser level. Good hints requires to know about the
current context, and it is not easy to get that deep in the automaton.

> Yeah, I had come to the same conclusion --- state moves made without
> consuming input would need to be backed out if we wanted to report the
> complete follow set. I haven't yet looked to see how to do that.

Well, following you're previous suggestion, one can redefine the YYLEX
macro to save the current state each time a token is required.

> > As you noted, for things like "SHOW 123", the follow set basically
> > includes all keywords although you can have SHOW ALL or SHOW name.
> > So, as you suggested, you can either say "ident" as a simplification, but
>
> You're ignoring the distinction between classes of keywords. I would
> not recommend treating reserved_keywords as a subclass of ident.

Ok, I agree that it can help in this very case a little bit here because
ALL is reserved. I did not make this distinction when I played with bison.

> > (5) anything that can be done would be hardwired to one version of bison.
>
> I think this argument is completely without merit.

Possible;-)

However I don't like writing hacks that depends on other people future
behavior.

> > (b) write a new "recursive descendant" parser, and drop "gram.y"
>
> Been there, done that, not impressed with the idea. There's a reason
> why people invented parser generators...

Sure. It was just a list of options;-)

> > As a side effect of my inspection is that the automaton generated by bison
> > could be simplified if a different tradeoff between the lexer, the parser
> > and the post-processing would be chosen. Namelly, all tokens that are
> > just identifiers should be dropped and processed differently.
>
> We are not going to reserve the keywords that are presently unreserved.

Reserving keywords would help, but I would not think it could be accepted,
because it is not SQL philosophy. SQL is about 30 years also, as old
as yacc ideas... but they did not care at the time;-) When you look at
the syntax, it was designed with a recursive parser in mind.

Part of my comment was to explain "why" the generated automaton is large.
SQL is a small language, but it has a big automaton in postgresql, and
this is IMHO the explanation.

> If you can think of a reasonable way to stop treating them as separate
> tokens inside the grammar without altering the user-visible behavior,
> I'm certainly interested.

I was thinking about type names especially, and maybe others.

I join a small proof-of-concept patch to drop some tokens out of the
parser. As a results, 6 tokens, 6 rules and 9 states are removed,
and the automaton table is reduced by 438 entries (1.5%). Not too bad;-)
I think others could be dealt with similarily, or with more work.

Thanks for the discussion,

--
Fabien.

Attachment Content-Type Size
less_keywords.patch text/plain 5.9 KB

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2004-04-05 16:52:10 Re: hint infrastructure setup (v3)
Previous Message Fabien COELHO 2004-04-05 16:41:57 Re: hint infrastructure setup (v3)