Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Naylor <jcnaylor(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2018-12-20 23:00:24
Message-ID: 9936.1545346824@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

John Naylor <jcnaylor(at)gmail(dot)com> writes:
> On 12/18/18, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'd be kind of inclined to convert all uses of ScanKeyword to the new way,
>> if only for consistency's sake. On the other hand, I'm not the one
>> volunteering to do the work.

> That's reasonable, as long as the design is nailed down first. Along
> those lines, attached is a heavily WIP patch that only touches plpgsql
> unreserved keywords, to test out the new methodology in a limited
> area. After settling APIs and name/directory bikeshedding, I'll move
> on to the other four keyword types.

Let the bikeshedding begin ...

> There's a new Perl script, src/common/gen_keywords.pl,

I'd be inclined to put the script in src/tools, I think. IMO src/common
is for code that actually gets built into our executables.

> which takes
> pl_unreserved_kwlist.h as input and outputs
> pl_unreserved_kwlist_offset.h and pl_unreserved_kwlist_string.h.

I wonder whether we'd not be better off producing just one output
file, in which we have the offsets emitted as PG_KEYWORD macros
and then the giant string emitted as a macro definition, ie
something like

#define PG_KEYWORD_STRING \
"absolute\0" \
"alias\0" \
...

That simplifies the Makefile-hacking, at least, and it possibly gives
callers more flexibility about what they actually want to do with the
string.

> The
> output headers are not installed or symlinked anywhere. Since the
> input keyword lists will never be #included directly, they might be
> better as .txt files, like errcodes.txt. If we went that far, we might
> also remove the PG_KEYWORD macros (they'd still be in the output
> files) and rename parser/kwlist.h to common/core_kwlist.txt. There's
> also a case for not changing things unnecessarily, especially if
> there's ever a new reason to include the base keyword list directly.

I'm for "not change things unnecessarily". People might well be
scraping the keyword list out of parser/kwlist.h for other purposes
right now --- indeed, it's defined the way it is exactly to let
people do that. I don't see a good reason to force them to redo
whatever tooling they have that depends on that. So let's build
kwlist_offsets.h alongside that, but not change kwlist.h itself.

> To keep the other keyword types functional, I had to add a separate
> new struct ScanKeywordOffset and new function
> ScanKeywordLookupOffset(), so the patch is a bit messier than the
> final will be.

Check.

> I used the global .gitignore, but maybe that's an abuse of it.

Yeah, I'd say it is.

> +# TODO: Error out if the keyword names are not in ASCII order.

+many for including such a check.

Also note that we don't require people to have Perl installed when
building from a tarball. Therefore, these derived headers must get
built during "make distprep" and removed by maintainer-clean but
not distclean. I think this also has some implications for VPATH
builds, but as long as you follow the pattern used for other
derived header files (e.g. fmgroids.h), you should be fine.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-12-20 23:04:11 Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?
Previous Message Alexander Korotkov 2018-12-20 22:50:41 Re: GIN predicate locking slows down valgrind isolationtests tremendously