Re: [PATCHES] updated hash functions for postgresql v1

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] updated hash functions for postgresql v1
Date: 2009-01-27 16:20:49
Message-ID: 20090127162048.GC1961@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 25, 2009 at 10:27:03PM -0600, Kenneth Marshall wrote:
> On Sat, Jan 10, 2009 at 01:36:25PM -0500, Tom Lane wrote:
> > Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> > > I ran 5 times on both old and new code, eliminating the top and bottom
> > > and taking the average of the remaining 3, and I got a 6.9% performance
> > > improvement with the new code.
> >
> > The question that has been carefully evaded throughout the discussion
> > of this patch is whether the randomness of the hash result is decreased,
> > and if so what is that likely to cost us in performance of the various
> > hash-dependent algorithms. I would still like to have an answer to that
> > before we make a change to gain marginal performance improvement in
> > the hash function itself (which is already shown to be barely measurable
> > in the total context of a hash-dependent operation...)
> >
> > regards, tom lane
> >
>
> Dear Hackers and Reviewers,
>
> In response to Tom's questioning the randomness of the hash_any
> when the mixing functions are split into two, the first used when
> sequentially processing the input and the second for the final
> mix, I have generated a more detailed analysis of the two hash_any
> functions.
>
> First, one change to the 11/2008 patch, the keylen is added to a, b and
> c initially so we do not need to add it later on. The is the
> micro-diff:
> -----------------------------
>
> --- hashfunc.c_TWOMIX 2009-01-22 14:07:34.000000000 -0600
> +++ hashfunc.c_TWOMIX2 2009-01-22 14:17:32.000000000 -0600
> @@ -336,7 +336,6 @@
>
> /* handle the last 11 bytes */
> k = (const unsigned char *) ka;
> - c += keylen;
> #ifdef WORDS_BIGENDIAN
> switch (len)
> {
> @@ -439,7 +438,6 @@
> }
>
> /* handle the last 11 bytes */
> - c += keylen;
> #ifdef WORDS_BIGENDIAN
> switch (len) /* all the case statements fall through */
>
> -----------------------------
>
> The remainder of this document will use the runs from my initial results
> broken out using various power-of-two bucket sizes to simulate our actual
> use in PostgreSQL as the number of items hashed increases and we use more
> and more bits of our hash to identify the appropriate bucket. I have run
> each test twice, once with our current hash_any() with the single mix()
> function and then a second time using my patch from the November commitfest
> plus the patch above to produce a new hash_any() with two separate mixing
> functions mix() and final(). For each run I have generated a sequence of
> unique inputs, up to the number of buckets, hashed them with the hash
> functions (both old and new), then I calculate the expected number of
> collision p(n) using the poisson formula for each number of buckets,
> where the number of buckets are 2**16, 2**18, 2**20, 2**22, 2**24, and
> 2**26. For my initial run, I used a string consisting of the letter 'a'
> followed by the integer representation of the numbers from 0 to the
> (number of buckets - 1):
>
> 1) "a"uint32 ((i.e. a00001,a0002...)
> Number of buckets: 65536
> Total number of items: 65536
> Expected number with n items: 24109 24109 12054 4018 1004 200 33 4
> Actual number mix(): 24044 24172 12078 4036 980 186 30 10
> Actual number mix()/final(): 24027 24232 12060 3972 1001 207 31 5 1
>
> Number of buckets: 262144
> Total number of items: 262144
> Expected number with n items: 96437 96437 48218 16072 4018 803 133 19 2
> Actual number mix(): 96224 96730 48240 15951 4094 744 143 17 1
> Actual number mix()/final(): 96335 96646 48071 16128 4018 801 122 21 2
>
> Number of buckets: 1048576
> Total number of items: 1048576
> Expected number with n items: 385749 385749 192874 64291 16072 3214 535 76 9
> Actual number mix(): 385716 385596 193243 64115 16053 3285 478 77 12 1
> Actual number mix()/final(): 385955 385016 193789 63768 16259 3190 511 79 8 1
>
> Number of buckets: 4194304
> Total number of items: 4194304
> Expected number with n items: 1542998 1542998 771499 257166 64291 12858 2143 306 38
> Actual number mix(): 1542536 1543489 771351 257777 63830 12847 2123 326 19 5 1
> Actual number mix()/final(): 1541828 1544429 772072 256178 64579 12774 2129 288 22 5
>
> Number of buckets: 16777216
> Total number of items: 16777216
> Expected number with n items: 6171992 6171992 3085996 1028665 257166 51433 8572 1224 153
> Actual number mix(): 6170866 6174079 3085912 1027140 257808 51385 8638 1219 146 23
> Actual number mix()/final(): 6172058 6171991 3086279 1027916 257535 51465 8554 1243 149 23 3
>
> Number of buckets: 67108864
> Total number of items: 67108864
> Expected number with n items: 24687971 24687971 12343985 4114661 1028665 205733 34288 4898 612
> Actual number mix(): 24686110 24690897 12344232 4113515 1028232 205682 34546 4942 629 72 7
> Actual number mix()/final(): 24708515 24669248 12333034 4114796 1034256 208424 34888 5023 598 77 5
>
> Here is a second run with number of items = (number of buckets)/2:
> Number of buckets: 65536
> Total number of items: 32768
> Expected number with n items: 39749 19874 4968 828 103 10
> Actual number mix(): 39704 19978 4898 842 103 10 1
> Actual number mix()/final(): 39715 19922 4991 779 119 9 1
>
> Number of buckets: 262144
> Total number of items: 131072
> Expected number with n items: 158998 79499 19874 3312 414 41 3
> Actual number mix(): 158753 79972 19703 3216 458 38 4
> Actual number mix()/final(): 158869 79688 19853 3303 393 33 3 2
>
> Number of buckets: 1048576
> Total number of items: 524288
> Expected number with n items: 635993 317996 79499 13249 1656 165 13
> Actual number mix(): 636118 317839 79388 13435 1628 152 16
> Actual number mix()/final(): 636102 317824 79527 13267 1683 161 12
>
> Number of buckets: 4194304
> Total number of items: 2097152
> Expected number with n items: 2543973 1271986 317996 52999 6624 662 55 3
> Actual number mix(): 2544529 1270639 318904 53005 6516 645 61 5
> Actual number mix()/final(): 2544014 1271483 318720 52849 6559 630 47 2
>
> Number of buckets: 16777216
> Total number of items: 8388608
> Expected number with n items: 10175895 5087947 1271986 211997 26499 2649 220 15
> Actual number mix(): 10174997 5089736 1271355 211528 26679 2683 220 17 1
> Actual number mix()/final(): 10176567 5087207 1271789 212014 26684 2703 235 16 1
>
> Number of buckets: 67108864
> Total number of items: 33554432
> Expected number with n items: 40703583 20351791 5087947 847991 105998 10599 883 63 3
> Actual number mix(): 40703033 20353345 5086252 848860 105885 10536 891 59 3
> Actual number mix()/final(): 40770874 20250828 5096890 865448 111835 11890 1013 76 10
>
> 2) uint32uint32 (i.e. uint64)
> Number of buckets: 65536
> Total number of items: 32768
> Expected number with n items: 39749 19874 4968 828 103 10
> Actual number mix(): 39770 19863 4947 822 125 9
> Actual number mix()/final(): 39754 19852 4990 837 91 11 1
>
> Number of buckets: 262144
> Total number of items: 131072
> Expected number with n items: 158998 79499 19874 3312 414 41 3
> Actual number mix(): 159046 79464 19823 3334 430 43 3 1
> Actual number mix()/final(): 158879 79732 19774 3301 406 47 5
>
> Number of buckets: 1048576
> Total number of items: 524288
> Expected number with n items: 635993 317996 79499 13249 1656 165 13
> Actual number mix(): 636275 317580 79514 13357 1661 172 14 3
> Actual number mix()/final(): 635578 318574 79504 13162 1585 159 13 1
>
> Number of buckets: 4194304
> Total number of items: 2097152
> Expected number with n items: 2543973 1271986 317996 52999 6624 662 55 3
> Actual number mix(): 2543769 1272196 318067 53050 6497 672 48 4 1
> Actual number mix()/final(): 2543014 1273485 317812 52703 6589 635 59 7
>
> Number of buckets: 16777216
> Total number of items: 8388608
> Expected number with n items: 10175895 5087947 1271986 211997 26499 2649 220 15
> Actual number mix(): 10174587 5090467 1270909 211836 26513 2679 207 18
> Actual number mix()/final(): 10175142 5089481 1270919 212555 26234 2643 224 15 3
>
> Number of buckets: 67108864
> Total number of items: 33554432
> Expected number with n items: 40703583 20351791 5087947 847991 105998 10599 883 63 3
> Actual number mix(): 40706281 20347746 5088639 848133 106388 10669 946 60 2
> Actual number mix()/final(): 40701438 20354677 5088549 846570 106157 10578 838 55 2
>
> 3) uint32 from 1-1648379
> Number of buckets: 65536
> Total number of items: 32768
> Expected number with n items: 39749 19874 4968 828 103 10
> Actual number mix(): 39758 19844 4993 835 97 9
> Actual number mix()/final(): 39842 19712 5016 849 109 7 1
>
> Number of buckets: 262144
> Total number of items: 131072
> Expected number with n items: 158998 79499 19874 3312 414 41 3
> Actual number mix(): 158973 79523 19900 3283 426 38 1
> Actual number mix()/final(): 159136 79293 19896 3346 421 48 3 1
>
> Number of buckets: 1048576
> Total number of items: 524288
> Expected number with n items: 635993 317996 79499 13249 1656 165 13
> Actual number mix(): 636083 317917 79484 13183 1711 180 16 2
> Actual number mix()/final(): 636183 317772 79438 13305 1685 173 20
>
> Number of buckets: 4194304
> Total number of items: 2097152
> Expected number with n items: 2543973 1271986 317996 52999 6624 662 55 3
> Actual number mix(): 2544325 1271367 318260 52950 6660 683 54 4 1
> Actual number mix()/final(): 2544022 1272059 317778 53060 6646 665 70 4
>
> Number of buckets: 16777216
> Total number of items: 8388608
> Expected number with n items: 10175895 5087947 1271986 211997 26499 2649 220 15
> Actual number mix(): 10176360 5087249 1272083 212010 26655 2629 212 18
> Actual number mix()/final(): 10175315 5088617 1272068 212144 26224 2589 234 23 1 1
>
> Number of buckets: 67108864
> Total number of items: 33554432
> Expected number with n items: 40703583 20351791 5087947 847991 105998 10599 883 63 3
> Actual number mix(): 40704034 20350542 5088760 848309 105721 10520 894 77 7
> Actual number mix()/final(): 40774785 20243786 5098862 866801 111865 11641 1019 100 5
>
> 4) cracklib
> Number of buckets: 65536
> Total number of items: 32767
> Expected number with n items: 39750 19874 4968 828 103 10
> Actual number mix(): 39731 19858 5049 794 92 11 1
> Actual number mix()/final(): 39785 19801 5011 823 106 9 1
>
> Number of buckets: 262144
> Total number of items: 131071
> Expected number with n items: 158998 79498 19874 3312 414 41 3
> Actual number mix(): 159077 79420 19806 3380 409 49 3
> Actual number mix()/final(): 159020 79475 19845 3366 383 54 1
>
> Number of buckets: 1048576
> Total number of items: 524287
> Expected number with n items: 635994 317996 79498 13249 1656 165 13
> Actual number mix(): 635979 318066 79420 13281 1641 163 23 3
> Actual number mix()/final(): 635694 318637 79122 13271 1686 150 13 3
>
> Number of buckets: 4194304
> Total number of items: 1648379
> Expected number with n items: 2831263 1112698 218647 28643 2814 221 14
> Actual number mix(): 2830769 1113458 218633 28396 2786 250 11 1
> Actual number mix()/final(): 2831304 1113004 218002 28843 2926 212 13
>
> Number of buckets: 16777216
> Total number of items: 1648379
> Expected number with n items: 15207226 1494125 73399 2403 59 1
> Actual number mix(): 15206923 1494718 73129 2384 59 3
> Actual number mix()/final(): 15207183 1494267 73250 2452 64
>
> Number of buckets: 67108864
> Total number of items: 1648379
> Expected number with n items: 65480564 1608383 19753 161 1.47989202515749e-08
> Actual number mix(): 65480389 1608727 19593 154 1
> Actual number mix()/final(): 65480455 1608609 19631 168 1
>
> 5) cracklib x 100
> Number of buckets: 65536
> Total number of items: 32767
> Expected number with n items: 39750 19874 4968 828 103 10
> Actual number mix(): 39829 19742 5003 847 99 14 2
> Actual number mix()/final(): 39783 19794 5023 828 97 11
>
> Number of buckets: 262144
> Total number of items: 131071
> Expected number with n items: 158998 79498 19874 3312 414 41 3
> Actual number mix(): 159136 79242 20007 3289 405 62 3
> Actual number mix()/final(): 158961 79546 19905 3270 408 51 3
>
> Number of buckets: 1048576
> Total number of items: 524287
> Expected number with n items: 635994 317996 79498 13249 1656 165 13
> Actual number mix(): 635815 318331 79372 13218 1658 168 12 2
> Actual number mix()/final(): 635986 318152 79309 13231 1687 190 21
>
> Number of buckets: 4194304
> Total number of items: 1648379
> Expected number with n items: 2831263 1112698 218647 28643 2814 221 14
> Actual number mix(): 2830891 1113468 218202 28712 2804 208 18 1
> Actual number mix()/final(): 2831786 1111616 219209 28661 2816 199 16 1
>
> Number of buckets: 16777216
> Total number of items: 1648379
> Expected number with n items: 15207226 1494125 73399 2403 59 1
> Actual number mix(): 15207428 1493777 73492 2459 59 1
> Actual number mix()/final(): 15207777 1493063 73877 2435 63 1
>
> Number of buckets: 67108864
> Total number of items: 1648379
> Expected number with n items: 65480564 1608383 19753 161 1.47989202515749e-08
> Actual number mix(): 65480463 1608595 19635 170 1
> Actual number mix()/final(): 65480650 1608222 19820 171 1
>
> As you can see, all of the probabilities for both the current single mix()
> function and the split mix()/final() functions match the theoretical results
> as calculated by the poisson formula. Here is a nice reference that I found
> online describing the testing procedure that I am using:
>
> http://rabbit.eng.miami.edu/class/een318/poisson.pdf
>
> I can find no situations where the current version of hash_any() performs
> statistically better than the version resulting from my split mix()/final()
> function patch from 11/2008. I would be happy to share the test harness with
> anyone who wishes to test further. Please let me know if there is anything
> I can do. I still recommend applying the split mixing function patch that
> I submitted for the 11/2008 commitfest and I think these tests should allay
> Tom's concerns about the randomness of the new hash function.
>
> Regards,
> Ken Marshall
>
Dear Hackers,

I have updated the patch posted by Jeff Davis on January 9th
to include the micro-patch above as well as updated the polymorphism
regressions tests. This applies cleanly to the latest CVS pull.

Regards,
Ken

Attachment Content-Type Size
new_hashpatch.diff text/plain 11.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2009-01-27 16:21:19 Re: 8.4 release planning
Previous Message Gregory Stark 2009-01-27 16:16:22 Re: More FOR UPDATE/FOR SHARE problems