Re: benchmarking Flex practices

From: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: benchmarking Flex practices
Date: 2019-06-27 06:25:26
Message-ID: CACPNZCud6z=n_sVmhzLd9US+1C4nc2Nn-AawxKZ=S8adCBVaEw@mail.gmail.com
Lists: pgsql-hackers

I wrote:

> > I found another possible way to bring the size of the transition table
> > under 32k entries while keeping the existing no-backup rules in place:
> > Replace the "quotecontinue" rule with a new state. In the attached
> > draft patch, when Flex encounters a quote while inside any kind of
> > quoted string, it saves the current state and enters %xqs (think
> > 'quotestop'). If it then sees {whitespace_with_newline}{quote}, it
> > reenters the previous state and continues to slurp the string;
> > otherwise, it throws back everything and returns the string it just
> > exited. Doing it this way is a bit uglier, but with some extra
> > commentary it might not be too bad.
>
> I had an epiphany and managed to get rid of the backup states.
> Regression tests pass. The array is down to 30367 entries and the
> binary is smaller by 172kB on Linux x86-64. Performance is identical
> to master on both tests mentioned upthread. I'll clean this up and add
> it to the commitfest.
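
To make the quotestop lookahead concrete outside of Flex, here is a minimal hand-written C sketch of the same idea. It is not the code in the patch, and the function name scan_quoted is hypothetical. It assumes plain single-quoted literals only (no E'', B'', X'' or U& variants) and ignores comments inside the continuation whitespace. After a closing quote it "enters xqs" by remembering where the literal would end. If it then finds whitespace containing a newline followed by another quote, it resumes the literal; otherwise it throws back the lookahead and returns the string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: scan a plain single-quoted SQL literal starting at
 * src[*pos], copying its contents into out (capacity outsz, NUL-terminated).
 * On success, advances *pos just past the final closing quote and returns
 * true; returns false for unterminated input or if out is too small.
 */
static bool
scan_quoted(const char *src, size_t *pos, char *out, size_t outsz)
{
    size_t      i = *pos;
    size_t      n = 0;

    if (outsz == 0 || src[i] != '\'')
        return false;
    i++;                        /* consume the opening quote */

    for (;;)
    {
        size_t      stop;
        size_t      j;
        bool        saw_newline = false;

        if (src[i] == '\0')
            return false;       /* unterminated literal */

        if (src[i] != '\'')
        {
            if (n + 1 >= outsz)
                return false;
            out[n++] = src[i++];
            continue;
        }

        /* A quote: "enter xqs", remembering where the literal would end. */
        stop = i + 1;

        /* '' inside the literal is an escaped quote, not the end. */
        if (src[stop] == '\'')
        {
            if (n + 1 >= outsz)
                return false;
            out[n++] = '\'';
            i = stop + 1;
            continue;
        }

        /* Look ahead for whitespace containing a newline, then a quote. */
        j = stop;
        while (src[j] != '\0' && strchr(" \t\n\r\f", src[j]) != NULL)
        {
            if (src[j] == '\n' || src[j] == '\r')
                saw_newline = true;
            j++;
        }
        if (saw_newline && src[j] == '\'')
        {
            i = j + 1;          /* continuation: resume the previous state */
            continue;
        }

        /* No continuation: throw back the lookahead, return the string. */
        out[n] = '\0';
        *pos = stop;
        return true;
    }
}
```

For example, 'foo' followed by a newline and 'bar' yields foobar, while 'foo' 'bar' on a single line stops after foo, since SQL only treats adjacent literals as a continuation when the whitespace between them contains a newline.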

For the commitfest:

0001 is a small patch to remove some unneeded generality from the
current rules. This lowers the number of elements in the yy_transition
array from 37045 to 36201.

0002 is a cleaned-up version of the above, bringing the size down to 29521.

I haven't changed psqlscan.l or pgc.l, in case this approach is
changed or rejected.

With the two together, the binary is about 175kB smaller than on HEAD.
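
A back-of-the-envelope check of why the 32k boundary matters, as a small C sketch. It rests on an assumption rather than on a description of Flex's generator: each yy_transition entry is taken to be a verify/next pair, stored as two 16-bit integers while the entry count stays below INT16_MAX and as two 32-bit integers otherwise. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed (hypothetical) layouts of one yy_transition entry: a "verify"
 * value plus a "next state" offset, in 16-bit or 32-bit flavors.
 */
struct trans_entry16
{
    int16_t     verify;
    int16_t     next;
};

struct trans_entry32
{
    int32_t     verify;
    int32_t     next;
};

/*
 * Estimated size in bytes of a transition array with the given number of
 * entries, assuming 16-bit fields are only usable below INT16_MAX entries.
 */
static size_t
transition_bytes(long entries)
{
    if (entries < INT16_MAX)
        return (size_t) entries * sizeof(struct trans_entry16);
    return (size_t) entries * sizeof(struct trans_entry32);
}
```

Under that assumption, HEAD's 37045 entries take 296,360 bytes and the patched 29521 entries take 118,084 bytes. That is a difference of 178,276 bytes (about 174 KiB), in line with the roughly 175kB reduction observed in the binary.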

I also couldn't resist playing around with the idea upthread to handle
Unicode escapes in parser.c, which further reduces the number of
states to 21068. That leaves some headroom for future additions
without going back to 32-bit types in the transition array. It mostly
works, but it's quite ugly and breaks the token position handling for
Unicode escape syntax errors, so it's not in a state to share.
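
Decoding U&'...' escapes outside the lexer amounts to a post-processing pass over the literal's body. The following C function, decode_uescapes, is a hypothetical sketch of such a pass and not the unshared parser.c experiment. It handles the \XXXX and \+XXXXXX forms, a doubled escape character, and a configurable escape character (as with UESCAPE), and encodes the result as UTF-8. Unlike PostgreSQL, it rejects surrogate code points outright instead of combining surrogate pairs. It also doesn't check the server encoding or track token positions for error reports.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Parse exactly n hex digits from s; fails on a non-hex char or early NUL. */
static bool
read_hex(const char *s, int n, unsigned long *cp)
{
    unsigned long v = 0;

    for (int k = 0; k < n; k++)
    {
        char        c = s[k];
        int         d;

        if (c >= '0' && c <= '9')
            d = c - '0';
        else if (c >= 'a' && c <= 'f')
            d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F')
            d = c - 'A' + 10;
        else
            return false;
        v = v * 16 + (unsigned long) d;
    }
    *cp = v;
    return true;
}

/* Write cp as UTF-8 into out (needs up to 4 bytes); returns the length. */
static size_t
utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical post-lexer pass: decode the body of a U&'...' literal. */
static bool
decode_uescapes(const char *in, char esc, char *out, size_t outsz)
{
    size_t      n = 0;

    while (*in)
    {
        unsigned long cp;
        unsigned char buf[4];
        size_t      len;

        if (*in != esc)
        {
            buf[0] = (unsigned char) *in++;
            len = 1;
        }
        else if (in[1] == esc)  /* doubled escape char is a literal one */
        {
            buf[0] = (unsigned char) esc;
            len = 1;
            in += 2;
        }
        else
        {
            if (in[1] == '+')
            {
                if (!read_hex(in + 2, 6, &cp))
                    return false;
                in += 8;
            }
            else
            {
                if (!read_hex(in + 1, 4, &cp))
                    return false;
                in += 5;
            }
            /* Sketch only: reject surrogates instead of pairing them. */
            if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
                return false;
            len = utf8_encode(cp, buf);
        }
        if (n + len >= outsz)
            return false;
        memcpy(out + n, buf, len);
        n += len;
    }
    out[n] = '\0';
    return true;
}
```

With the default backslash as the escape character, d\0061t\+000061 decodes to data.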

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
v3-0001-Remove-some-unneeded-generality-from-the-core-Fle.patch application/octet-stream 1.6 KB
v3-0002-Replace-the-Flex-quotestop-rules-with-a-new-exclu.patch application/octet-stream 5.8 KB
