Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, John Naylor <jcnaylor(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2018-12-26 19:21:05
Message-ID: 5300.1545852065@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2018-12-26 10:45:11 -0500, Robert Haas wrote:
>> I'm not sure that I understand quite what you have in mind for a
>> serialized non-perfect hashtable. Are you thinking that we'd just
>> construct a simplehash and serialize it?

> I was basically thinking that we'd have the perl script implement a
> simple hash and put the keyword (pointers) into an array, handling
> conflicts with the simplest linear probing thinkable. As there's never a
> need for modifications, that ought to be fairly simple.

I think it was Knuth who said that when you use hashing, you are putting
a great deal of faith in the average case, because the worst case is
terrible. The applicability of that to this problem is that if you hit
a bad case (say, a long collision chain affecting some common keywords)
you could end up with poor performance that affects a lot of people for
a long time. And our keyword list is not so static that you could prove
once that the behavior is OK and then forget about it.

So I'm suspicious of proposals to use simplistic hashing here.

There might well be some value in Robert's idea of keying off the first
letter to get rid of the first few binary-search steps, not least because
those steps are particularly terrible from a cache-footprint perspective.
I'm not sold on doing anything significantly more invasive than that.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-12-26 19:31:06 Re: random() (was Re: New GUC to sample log queries)
Previous Message Peter Geoghegan 2018-12-26 19:08:54 Re: random() (was Re: New GUC to sample log queries)