Re: NAMEDATALEN Changes

From: Greg Copeland <greg(at)CopelandConsulting(dot)Net>
To: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: NAMEDATALEN Changes
Date: 2002-02-14 14:00:58
Message-ID: CCC.200202141401.g1EE1Dk24399@CopelandConsulting.Net
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 13 February 2002 23:59, Neil Conway wrote:
> On Wed, 2002-02-13 at 20:00, Tom Lane wrote:

[perf hit comment removed]

>
> I've attached a pretty trivial patch that implements this. Instead of
> automatically hashing NAMEDATALEN bytes, hashname() uses only strlen()
> bytes: this should improve both the common case (small identifiers, 5-10
> characters long), as well as reduce the penalty when NAMEDATALEN is
> increased. The patch passes the regression tests, FWIW. I didn't remove
> cc_hashname() -- I'll tackle that tomorrow unless anyone objects...
>
> I also did some pretty simple benchmarks; however, I'd appreciate it if
> anyone could confirm these results.
>
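
A minimal sketch of the change Neil describes, under stated assumptions: NAMEDATALEN is fixed at 32 purely for illustration, toy_hash_any() is a hypothetical stand-in (32-bit FNV-1a) for PostgreSQL's hash_any(), whose real signature and algorithm are not reproduced here, and hashname_fixed() / hashname_strlen() are hypothetical names for the before and after behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value, for illustration only. */
#define NAMEDATALEN 32

/* Hypothetical stand-in for hash_any(): 32-bit FNV-1a over len bytes. */
static uint32_t toy_hash_any(const unsigned char *k, size_t len)
{
    uint32_t h = 2166136261u;            /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= k[i];                       /* mix in one byte */
        h *= 16777619u;                  /* FNV-1a prime */
    }
    return h;
}

/* Before the patch (as described): hash all NAMEDATALEN bytes of the buffer. */
static uint32_t hashname_fixed(const char *name)
{
    assert(name != NULL);
    return toy_hash_any((const unsigned char *) name, NAMEDATALEN);
}

/* After the patch (as described): hash only the bytes before the NUL. */
static uint32_t hashname_strlen(const char *name)
{
    assert(name != NULL);
    return toy_hash_any((const unsigned char *) name, strlen(name));
}
```

With a 5-10 character identifier, hashname_strlen() touches 5-10 bytes per call instead of NAMEDATALEN, and that cost does not grow when NAMEDATALEN is raised.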

Please bear with me on this, as this is my first posting with any real
content. Please don't hang me out to dry if I've overlooked anything; I'll
point out up front that I'm making a rather large assumption. Please correct
me as needed.

The primary assumption is that the actual key length can be less than
NAMEDATALEN. That is, the string "shortkey" would be a valid input key (??),
supplying an 8-byte key to the hash_any() function even though NAMEDATALEN
may be something like 128 or larger. If this assumption is correct, then
wouldn't increasing the default input key size (NAMEDATALEN) beyond the
maximum actual key length be a bug? That is to say, if we have a key with
only 8 bytes of data and we iterate over 128 bytes, wouldn't the resulting
hash be arbitrary and invalid, since it would be hashing memory that does not
reflect the key being hashed?
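
The concern can be checked with a small sketch. It assumes, hypothetically, a fixed NAMEDATALEN-byte buffer holding a NUL-terminated key, and uses a toy FNV-1a hash (toy_hash_any(), not PostgreSQL's hash_any()). hash_whole_buffer(), hash_key_only() and tail_is_zero() are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAMEDATALEN 32   /* assumed value, for illustration only */

/* Toy 32-bit FNV-1a hash; a hypothetical stand-in for hash_any(). */
static uint32_t toy_hash_any(const unsigned char *k, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= k[i];
        h *= 16777619u;
    }
    return h;
}

/* Hashes every byte of the buffer, including anything after the NUL. */
static uint32_t hash_whole_buffer(const char *buf)
{
    return toy_hash_any((const unsigned char *) buf, NAMEDATALEN);
}

/* Hashes only the key itself; assumes buf is NUL-terminated. */
static uint32_t hash_key_only(const char *buf)
{
    return toy_hash_any((const unsigned char *) buf, strlen(buf));
}

/* True if every byte from the terminator to the end of the buffer is zero.
 * When this holds for every buffer, hash_whole_buffer() depends on the key
 * alone; otherwise it also depends on leftover bytes. */
static bool tail_is_zero(const char *buf)
{
    const char *nul = memchr(buf, '\0', NAMEDATALEN);
    assert(nul != NULL);                 /* sketch assumes a terminated key */
    for (size_t i = (size_t) (nul - buf); i < NAMEDATALEN; i++)
        if (buf[i] != '\0')
            return false;
    return true;
}
```

Two buffers that both hold "shortkey" but differ only in the last byte of the buffer give different hash_whole_buffer() values and identical hash_key_only() values. Whether this is a real bug therefore hinges on whether the unused tail is always zero-filled, which this sketch does not settle.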

If my assumptions are correct, then the strlen() implementation (assuming
input keys are valid C strings) really is the proper one, short of using an
adjusted min(NAMEDATALEN, strlen()) type of approach.
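
The min(NAMEDATALEN, strlen()) variant can be sketched so that it never reads past the buffer: memchr() searches for the terminator within the first NAMEDATALEN bytes only, so an unterminated buffer is capped at NAMEDATALEN instead of running off the end as strlen() would. name_key_len() is a hypothetical helper and NAMEDATALEN is an assumed value; POSIX strnlen() would serve the same purpose where it is available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 32   /* assumed value, for illustration only */

/* Length of the key in a NAMEDATALEN-byte buffer: the offset of the first
 * NUL, or NAMEDATALEN if none appears within the buffer. Equivalent to
 * min(NAMEDATALEN, strlen(buf)) but never reads beyond the buffer. */
static size_t name_key_len(const char *buf)
{
    assert(buf != NULL);
    const char *nul = memchr(buf, '\0', NAMEDATALEN);
    return nul != NULL ? (size_t) (nul - buf) : NAMEDATALEN;
}
```

The result would then be passed as the length argument to the hash function in place of either a fixed NAMEDATALEN or a bare strlen().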

[snip - var NAMEDATALEN benchmark results]

Greg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE8a8Mg4lr1bpbcL6kRAlaxAJ47CO+ExL/ZMo/i6LDoetXrul9qqQCfQli3
AvqN6RJjSuAH/p/mpZ8J4JY=
=wnVM
-----END PGP SIGNATURE-----
