Re: benchmarking Flex practices

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: benchmarking Flex practices
Date: 2019-07-29 15:40:12
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> writes:
> On Sun, Jul 21, 2019 at 3:14 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> So I'm feeling like maybe we should experiment to see what that
>> solution looks like, before we commit to going in this direction.
>> What do you think?

> Given the above wrinkles, I thought it was worth trying. Attached is a
> rough patch (don't mind the #include mess yet :-) ) that works like
> this:

> The lexer returns UCONST from xus and UIDENT from xui. The grammar has
> rules that are effectively:

> SCONST { do nothing}
> | UCONST { esc char is backslash }
> | UCONST UESCAPE SCONST { esc char is from $3 }

> ...where UESCAPE is now an unreserved keyword. To prevent shift-reduce
> conflicts, I added UIDENT to the %nonassoc precedence list to match
> IDENT, and for UESCAPE I added a %left precedence declaration. Maybe
> there's a more principled way. I also added an unsigned char type to
> the %union, but it worked fine on my compiler without it.

I think it might be better to drop the separate "Uescape" production and
just inline that into the calling rules, exactly per your sketch above.
You could avoid duplicating the escape-checking logic by moving that into
the str_udeescape support function. This would avoid the need for the
"uchr" union variant, but more importantly it seems likely to be more
future-proof: IME, any time you can avoid or postpone shift/reduce
decisions, it's better to do so.

I didn't try, but I think this might allow dropping the %left for
UESCAPE. That bothers me because I don't understand why it's
needed or what precedence level it ought to have.

> litbuf_udeescape() and check_uescapechar() were moved to gram.y. The
> former had be massaged to give error messages similar to HEAD. They're
> not quite identical, but the position info is preserved. Some of the
> functions I moved around don't seem to have any test coverage, so I
> should eventually do some work in that regard.

I don't terribly like the cross-calls you have between gram.y and scan.l
in this formulation. If we have to make these functions (hexval() etc)
non-static anyway, maybe we should shove them all into scansup.c?

> -Binary size is very close to v6. That is to say the grammar tables
> grew by about the same amount the scanner table shrank, so the binary
> is still about 200kB smaller than HEAD.


> -Performance is very close to v6 with the information_schema and
> pgbench-like queries with standard strings, which is to say also very
> close to HEAD. When the latter was changed to use Unicode escapes,
> however, it was about 15% slower than HEAD. That's a big regression
> and I haven't tried to pinpoint why.

I don't quite follow what you changed to produce the slower test case?
But that seems to be something we'd better run to ground before
deciding whether to go this way.

> -The ecpg changes here are only the bare minimum from HEAD to get it
> to compile, since I'm borrowing its additional token names (although
> they mean slightly different things). After a bit of experimentation,
> it's clear there's a bit more work needed to get it functional, and
> it's not easy to debug, so I'm putting that off until we decide
> whether this is the way forward.

On the whole I like this approach, modulo the performance question.
Let's try to work that out before worrying about ecpg.

regards, tom lane

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2019-07-29 16:03:31 Re: Is ParsePrepareRecord dead function
Previous Message Stephen Frost 2019-07-29 15:37:05 Re: pg_upgrade fails with non-standard ACL