Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: John Naylor <jcnaylor(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2018-12-23 17:51:57
Message-ID: CAJVSVGVM5tt9n9is=zg4dgNAPUWg-MZ3eta-BOhA57_yxZc1kQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/22/18, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> John Naylor <jcnaylor(at)gmail(dot)com> writes:
>> Using a single file also gave me another idea: Take value and category
>> out of ScanKeyword, and replace them with an index into another array
>> containing those, which will only be accessed in the event of a hit.
>> That would shrink ScanKeyword to 4 bytes (offset, index), further
>> increasing locality of reference. Might not be worth it, but I can try
>> it after moving on to the core scanner.
>
> I like that idea a *lot*, actually, because it offers the opportunity
> to decouple this mechanism from all assumptions about what the
> auxiliary data for a keyword is.

Okay, in that case I went ahead and did it for WIP v3.

> (it'd be a good idea to have a switch that allows specifying the
> prefix of these constant names).

Done as an optional switch, and tested, but not yet used in favor of
the previous method as a fallback. I'll probably do it in the final
version to keep lines below 80, and to add 'core_' to the core keyword
vars.

> /* Payload data for keywords */
> typedef struct MyKeyword
> {
> int16 value;
> int16 category;
> } MyKeyword;

I tweaked this a bit to

typedef struct ScanKeywordAux
{
int16 value; /* grammar's token code */
char category; /* see codes above */
} ScanKeywordAux;

It seems that category was only 2 bytes to make ScanKeyword a power of
2 (of course that was on 32 bit machines and doesn't hold true
anymore). Using char will save another few hundred bytes in the core
scanner. Since we're only accessing this once per identifier, we may
not need to worry so much about memory alignment.

> Aside from being arguably better from the locality-of-reference
> standpoint, this gets us out of the weird ifdef'ing you've got in
> the v2 patch. The kwlist_d.h headers can be very ordinary headers.

Yeah, that's a nice (and for me unexpected) bonus.

-John Naylor

Attachment Content-Type Size
v3-wip-use-offset-based-scankeywords.patch text/x-patch 22.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2018-12-23 20:14:50 Re: Function to track shmem reinit time
Previous Message Darafei Komяpa Praliaskouski 2018-12-23 16:23:33 Re: Joins on TID