Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, John Naylor <jcnaylor(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2018-12-26 18:02:17
Message-ID: 20181226180217.qad6ahkkzk4g5poy@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-12-26 11:50:18 -0500, Robert Haas wrote:
> On Wed, Dec 26, 2018 at 11:22 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I think there's a lot of goalpost-moving going on here. The original
> > idea was to trim the physical size of the data structure, as stated
> > in the thread subject, and just reap whatever cache benefits we got
> > along the way from that. I am dubious that we actually have any
> > performance problem in this code that needs a big dollop of added
> > complexity to fix.
>
> I have seen ScanKeywordLookup show up in profiles quite often and
> fairly prominently -- like several percent of total runtime. I'm not
> trying to impose requirements on John's patch, and I agree that
> reducing the physical size of the structure is a good step whether
> anything else is done or not. However, I don't see that as a reason to
> shut down further discussion of other possible improvements. If his
> patch makes this disappear from profiles, cool, but if it doesn't,
> then sooner or later somebody's going to want to do more.

I agree. And most of the patch would be a pre-requisite for anything
more elaborate anyway.

> FWIW, my bet is this helps but isn't enough to get rid of the problem
> completely. A 9-step binary search has got to be slower than a really
> well-optimized hash table lookup.

Yea, at least with a non-optimized layout. If we'd used a
binary-search-optimized lookup order it might be different, but probably at best
equivalent to a good hashtable.
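For concreteness, here is a minimal, self-contained sketch of the kind of
lookup being discussed. This is not PostgreSQL's actual ScanKeywordLookup;
the keyword list, ids, and function names are made up for illustration:

/*
 * Toy stand-in for a keyword lookup: a plain binary search over a small
 * sorted keyword array.  The real table is generated and much larger.
 */
#include <stdio.h>
#include <string.h>

static const char *const keywords[] = {
    "and", "between", "group", "insert", "join",
    "order", "select", "update", "where",
};
static const int num_keywords = sizeof(keywords) / sizeof(keywords[0]);

/* Classic binary search: roughly log2(N) string comparisons per lookup. */
static int
keyword_lookup(const char *word)
{
    int lo = 0, hi = num_keywords - 1;

    while (lo <= hi)
    {
        int mid = (lo + hi) / 2;
        int cmp = strcmp(word, keywords[mid]);

        if (cmp == 0)
            return mid;     /* found: index doubles as keyword id */
        if (cmp < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return -1;              /* not a keyword */
}

int
main(void)
{
    printf("select -> %d\n", keyword_lookup("select"));
    printf("foo    -> %d\n", keyword_lookup("foo"));
    return 0;
}

A hash-based scheme would replace the comparison loop above with one or two
probes into a precomputed table, which is where the expected win over a
9-step binary search would come from.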

> > In my hands, the only part of the low-level parsing code that commonly
> > shows up as interesting in profiles is the Bison engine. That's probably
> > because the grammar tables are circa half a megabyte and blow out cache
> > pretty badly :-(. I don't know of any way to make that better,
> > unfortunately. I suspect that it's just going to get worse, because
> > people keep submitting additions to the grammar.
>
> I'm kinda surprised that you haven't seen ScanKeywordLookup() in
> there, but I agree with you that the size of the main parser tables is
> a real issue, and that there's no easy solution. At various times
> there has been discussion of using some other parser generator, and
> I've also toyed with the idea of writing one specifically for
> PostgreSQL. Unfortunately, it seems like bison is all but
> unmaintained; the alternatives are immature and have limited adoption
> and limited community; and writing something from scratch is a ton of
> work. :-(

My bet is, and has been for quite a while, that we'll have to go for a
hand-written recursive descent type parser. They can be *substantially*
faster, and performance isn't as affected by the grammar size. And,
about as important, they also allow for a lot more heuristics around
grammar errors - I do think we'll soon have to do better than to throw a
generic syntax error for the cases where the grammar doesn't match at
all.
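
To illustrate the error-reporting point, here is a toy hand-written
recursive-descent parser for a made-up two-rule grammar (expr := term { '+'
term }, term := NUMBER). It bears no relation to PostgreSQL's real grammar,
but it shows how each production can emit a targeted message instead of a
bare "syntax error":

/*
 * Illustrative sketch only: a hand-written recursive-descent parser for a
 * toy grammar, showing context-specific error messages.
 */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

static const char *input;       /* cursor into the string being parsed */

static void
parse_error(const char *expected)
{
    if (*input)
        fprintf(stderr, "expected %s at or near \"%.10s\"\n", expected, input);
    else
        fprintf(stderr, "expected %s at end of input\n", expected);
    exit(1);
}

static void
skip_ws(void)
{
    while (isspace((unsigned char) *input))
        input++;
}

/* term := NUMBER */
static long
parse_term(void)
{
    char   *end;
    long    val;

    skip_ws();
    val = strtol(input, &end, 10);
    if (end == input)
        parse_error("a number");    /* targeted message, not "syntax error" */
    input = end;
    return val;
}

/* expr := term { '+' term } */
static long
parse_expr(void)
{
    long    val = parse_term();

    for (;;)
    {
        skip_ws();
        if (*input != '+')
            break;
        input++;
        val += parse_term();
    }
    return val;
}

int
main(void)
{
    input = "1 + 2 + 39";
    printf("%ld\n", parse_expr());
    return 0;
}

Because each parsing function knows exactly what it was looking for when it
failed, it can say so; a generated table-driven parser typically only knows
the set of tokens acceptable in the current state.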

Greetings,

Andres Freund
