Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: John Naylor <jcnaylor(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2018-12-29 21:59:52
Message-ID: CAJVSVGWv8-dOe0t2UpNGZ4Cn3rWU02EC5grHHOSkCbxqNjrsFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I think 0001 with complete keyword lookup replacement is in decent
enough shape to post. Make check-world passes. A few notes and
caveats:

-I added an --extern option to the script for the core keyword
headers. This also capitalizes variables.
-ECPG keyword lookup is a bit different in that the ecpg and sql
lookup functions are wrapped in a single function rather than called
separately within pgc.l. It might be worth untangling that, but I have
not done so.
-Some variable names haven't changed even though now they're only
referring to token values, which might be confusing.
-I haven't checked if I need to install the generated headers.
-I haven't measured performance or binary size. If anyone is excited
enough to do that, great, otherwise I'll do that as time permits.
-There are probably makefile bugs.

Now, on to previous review points:

On 12/27/18, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> +/* Payload data for keywords */
> +typedef struct ScanKeywordAux
> +{
> + int16 value; /* grammar's token code */
> + char category; /* see codes above */
> +} ScanKeywordAux;
>
> There isn't really any point in changing category to "char", because
> alignment considerations will mandate that sizeof(ScanKeywordAux) be
> a multiple of 2 anyway. With some compilers we could get around that
> with a pragma to force non-aligned storage, but doing so would be a
> net loss on most non-Intel architectures.

Reverted, especially since we can skip the struct entirely for some
callers as you pointed out below.

> diff --git a/src/pl/plpgsql/src/.gitignore b/src/pl/plpgsql/src/.gitignore
> @@ -1,3 +1,4 @@
> +/*kwlist_d.h
>
> Not a fan of using wildcards in .gitignore files, at least not when
> there's just one or two files you intend to match.

Removed.

> # Force these dependencies to be known even without dependency info built:
> -pl_gram.o pl_handler.o pl_comp.o pl_exec.o pl_funcs.o pl_scanner.o:
> plpgsql.h pl_gram.h plerrcodes.h
> +pl_gram.o pl_handler.o pl_comp.o pl_exec.o pl_funcs.o pl_scanner.o:
> plpgsql.h pl_gram.h plerrcodes.h pl_unreserved_kwlist_d.h
>
> Hm, do we really need any more than pl_scanner.o to depend on that header?

I think you're right, so separated into a new rule.

> +# Parse keyword header for names.
> +my @keywords;
> +while (<$kif>)
> +{
> + if (/^PG_KEYWORD\("(\w+)",\s*\w+,\s*\w+\)/)
>
> This is assuming more than it should about the number of arguments for
> PG_KEYWORD, as well as what's in them. I think it'd be sufficient to
> match like this:
>
> if (/^PG_KEYWORD\("(\w+)",/)

...and...

> diff --git a/src/pl/plpgsql/src/pl_unreserved_kwlist.h
> b/src/pl/plpgsql/src/pl_unreserved_kwlist.h
> +/* name, value, category */
> +PG_KEYWORD("absolute", K_ABSOLUTE, UNRESERVED_KEYWORD)
>
> Likewise, I'd just have these be two-argument macros. There's no reason
> for the various kwlist.h headers to agree on the number of payload
> arguments for PG_KEYWORD.

Both done, however...

> +/* FIXME: Have to redefine this symbol for the WIP. */
> +#undef PG_KEYWORD
> +#define PG_KEYWORD(kwname, value, category) {value, category},
> +
> +static const ScanKeywordAux unreserved_keywords[] = {
> +#include "pl_unreserved_kwlist.h"
> };
>
> The category isn't useful for this keyword list, so couldn't you
> just make this an array of uint16 values?

Yes, this works for the unreserved keywords. The reserved ones still
need the aux struct to work with the core scanner, even though scan.l
doesn't reference category either. This has the consequence that we
can't dispense with category, e.g.:

PG_KEYWORD("all", K_ALL, RESERVED_KEYWORD)

...unless we do without the struct entirely, but that's not without
disadvantages as you mentioned.

I decided to export the struct (rather than just int16 for category)
to the frontend, even though we have to set the token values to zero,
since there might someday be another field of use to the frontend.
Also to avoid confusion.

> I don't mind allowing the prefix to default to empty. What I was
> concerned about was that base_filename could end up undefined.
> Probably the thing to do is to generate base_filename separately,
> say by stripping any initial ".*/" sequence and then substitute
> '_' for '.'.

I removed assumptions about the filename.

> +Options:
> + -o output path
> + -p optional prefix for generated data structures
>
> This usage message is pretty vague about how you write the options
> (cf gripe above).

I tried it like this:

Usage: gen_keywords.pl [--output/-o <path>] [--prefix/-p <prefix>] input_file
--output Output directory
--prefix String prepended to var names in the output file

On 12/27/18, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> wrote:
> I would rather we used the standard perl module Getopt::Long, as
> numerous programs we have already do.

Done. I'll also send a patch later to bring some other scripts in line.

-John Naylor

Attachment Content-Type Size
v5-0001-Use-offset-based-keyword-lookup.patch text/x-patch 59.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2018-12-29 22:19:50 Re: [PATCH] kNN for btree
Previous Message Vik Fearing 2018-12-29 21:40:14 Optimize constant MinMax expressions