Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, John Naylor <jcnaylor(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2018-12-26 16:50:18
Message-ID: CA+TgmoaAo01LhyG0apjh6EAvOnvOwKPQ7_FMd0gO1s33p8Q0ew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 26, 2018 at 11:22 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I think there's a lot of goalpost-moving going on here. The original
> idea was to trim the physical size of the data structure, as stated
> in the thread subject, and just reap whatever cache benefits we got
> along the way from that. I am dubious that we actually have any
> performance problem in this code that needs a big dollop of added
> complexity to fix.

I have seen ScanKeywordLookup show up in profiles quite often and
fairly prominently -- like several percent of total runtime. I'm not
trying to impose requirements on John's patch, and I agree that
reducing the physical size of the structure is a good step whether
anything else is done or not. However, I don't see that as a reason to
shut down further discussion of other possible improvements. If his
patch makes this disappear from profiles, cool, but if it doesn't,
then sooner or later somebody's going to want to do more.

FWIW, my bet is this helps but isn't enough to get rid of the problem
completely. A 9-step binary search has got to be slower than a really
well-optimized hash table lookup. In a perfect world the latter
touches the cache line containing the keyword -- which presumably is
already in cache since we just scanned it -- then computes a hash
value without touching any other cache lines -- and then goes straight
to the right entry. So it touches ONE new cache line. That might a
level of optimization that's hard to achieve in practice, but I don't
think it's crazy to want to get there.

> In my hands, the only part of the low-level parsing code that commonly
> shows up as interesting in profiles is the Bison engine. That's probably
> because the grammar tables are circa half a megabyte and blow out cache
> pretty badly :-(. I don't know of any way to make that better,
> unfortunately. I suspect that it's just going to get worse, because
> people keep submitting additions to the grammar.

I'm kinda surprised that you haven't seen ScanKeywordLookup() in
there, but I agree with you that the size of the main parser tables is
a real issue, and that there's no easy solution. At various times
there has been discussion of using some other parser generator, and
I've also toyed with the idea of writing one specifically for
PostgreSQL. Unfortunately, it seems like bison is all but
unmaintained; the alternatives are immature and have limited adoption
and limited community; and writing something from scratch is a ton of
work. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-12-26 17:00:22 Re: Feature: temporary materialized views
Previous Message Tom Lane 2018-12-26 16:43:36 Re: Shared Memory: How to use SYSV rather than MMAP ?