Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joerg Sonnenberger <joerg(at)bec(dot)de>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, John Naylor <jcnaylor(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2019-01-08 00:36:58
Message-ID: 10021.1546907818@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Probably there's a lot to be criticized about the Perl style below;
> anybody feel a need to rewrite it?

Here's a somewhat better version. I realized that I was being too
slavishly tied to the data structures used in the C version; in Perl
it's easier to manage the lists of edges as hashes. I can't see any
need to distinguish left and right edge sets, either, so this just
has one such hash per vertex.

Also, it seems to me that we *can* make intelligent use of unused
hashtable entries to exit early on many non-keyword inputs. The reason
the existing code fails to do so is that it computes the sums and
differences of hashtable entries in unsigned modulo arithmetic; but if
we make the hashtable entries signed, we can set them up as exact
differences and drop the final modulo operation in the hash function.
Then, any out-of-range sum must indicate an input that is not a keyword
(because it is touching a pair of hashtable entries that didn't go
together) and we can exit early from the caller. This in turn lets us
mark unused hashtable entries with large values to ensure that sums
involving them will be out of range.

A weak spot in that argument is that it's not entirely clear how
large the differences can get --- with an unlucky series of
collisions, maybe they could get large enough to overflow int16?
I don't think that's likely for the size of problem this script
is going to encounter, so I just put in an error check for it.
But it could do with closer analysis before deciding that this
is a general-purpose solution.

regards, tom lane

Attachment Content-Type Size
perfect-hash-keyword-lookup-2.patch text/x-diff 19.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-01-08 00:37:51 Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Previous Message Andres Freund 2019-01-08 00:31:59 Re: Displaying and dumping of table access methods