Skip site navigation (1) Skip section navigation (2)

Re: [PATCHES] updated hash functions for postgresql v1

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org,pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] updated hash functions for postgresql v1
Date: 2009-01-27 16:20:49
Message-ID: 20090127162048.GC1961@it.is.rice.edu (view raw or flat)
Thread:
Lists: pgsql-hackers
On Sun, Jan 25, 2009 at 10:27:03PM -0600, Kenneth Marshall wrote:
> On Sat, Jan 10, 2009 at 01:36:25PM -0500, Tom Lane wrote:
> > Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> > > I ran 5 times on both old and new code, eliminating the top and bottom
> > > and taking the average of the remaining 3, and I got a 6.9% performance
> > > improvement with the new code.
> > 
> > The question that has been carefully evaded throughout the discussion
> > of this patch is whether the randomness of the hash result is decreased,
> > and if so what is that likely to cost us in performance of the various
> > hash-dependent algorithms.  I would still like to have an answer to that
> > before we make a change to gain marginal performance improvement in
> > the hash function itself (which is already shown to be barely measurable
> > in the total context of a hash-dependent operation...)
> > 
> > 			regards, tom lane
> > 
> 
> Dear Hackers and Reviewers,
> 
> In response to Tom's questioning the randomness of the hash_any
> when the mixing functions are split into two, the first used when
> sequentially processing the input and the second for the final
> mix, I have generated a more detailed analysis of the two hash_any
> functions.
> 
> First, one change to the 11/2008 patch, the keylen is added to a, b and
> c initially so we do not need to add it later on. The is the
> micro-diff:
> -----------------------------
> 
> --- hashfunc.c_TWOMIX   2009-01-22 14:07:34.000000000 -0600
> +++ hashfunc.c_TWOMIX2  2009-01-22 14:17:32.000000000 -0600
> @@ -336,7 +336,6 @@
>  
>                 /* handle the last 11 bytes */
>                 k = (const unsigned char *) ka;
> -               c += keylen;
>  #ifdef WORDS_BIGENDIAN
>                 switch (len)
>                 {
> @@ -439,7 +438,6 @@
>                 }
>  
>                 /* handle the last 11 bytes */
> -               c += keylen;
>  #ifdef WORDS_BIGENDIAN
>                 switch (len)                    /* all the case statements fall through */
>  
> -----------------------------
> 
> The remainder of this document will use the runs from my initial results
> broken out using various power-of-two bucket sizes to simulate our actual
> use in PostgreSQL as the number of items hashed increases and we use more
> and more bits of our hash to identify the appropriate bucket. I have run
> each test twice, once with our current hash_any() with the single mix()
> function and then a second time using my patch from the November commitfest
> plus the patch above to produce a new hash_any() with two separate mixing
> functions mix() and final(). For each run I have generated a sequence of
> unique inputs, up to the number of buckets, hashed them with the hash
> functions (both old and new), then I calculate the expected number of
> collision p(n) using the poisson formula for each number of buckets,
> where the number of buckets are 2**16, 2**18, 2**20, 2**22, 2**24, and
> 2**26. For my initial run, I used a string consisting of the letter 'a'
> followed by the integer representation of the numbers from 0 to the
> (number of buckets - 1):
> 
> 1) "a"uint32 ((i.e. a00001,a0002...)
> Number of buckets: 65536
> Total number of items: 65536
> Expected number with n items: 24109 24109 12054 4018 1004 200 33 4
> Actual number mix():          24044 24172 12078 4036 980 186 30 10
> Actual number mix()/final():  24027 24232 12060 3972 1001 207 31 5 1
> 
> Number of buckets: 262144
> Total number of items: 262144
> Expected number with n items: 96437 96437 48218 16072 4018 803 133 19 2
> Actual number mix():          96224 96730 48240 15951 4094 744 143 17 1
> Actual number mix()/final():  96335 96646 48071 16128 4018 801 122 21 2
> 
> Number of buckets: 1048576
> Total number of items: 1048576
> Expected number with n items: 385749 385749 192874 64291 16072 3214 535 76 9
> Actual number mix():          385716 385596 193243 64115 16053 3285 478 77 12 1
> Actual number mix()/final():  385955 385016 193789 63768 16259 3190 511 79 8 1
> 
> Number of buckets: 4194304
> Total number of items: 4194304
> Expected number with n items: 1542998 1542998 771499 257166 64291 12858 2143 306 38
> Actual number mix():          1542536 1543489 771351 257777 63830 12847 2123 326 19 5 1
> Actual number mix()/final():  1541828 1544429 772072 256178 64579 12774 2129 288 22 5
> 
> Number of buckets: 16777216
> Total number of items: 16777216
> Expected number with n items: 6171992 6171992 3085996 1028665 257166 51433 8572 1224 153
> Actual number mix():          6170866 6174079 3085912 1027140 257808 51385 8638 1219 146 23
> Actual number mix()/final():  6172058 6171991 3086279 1027916 257535 51465 8554 1243 149 23 3
> 
> Number of buckets: 67108864
> Total number of items: 67108864
> Expected number with n items: 24687971 24687971 12343985 4114661 1028665 205733 34288 4898 612
> Actual number mix():          24686110 24690897 12344232 4113515 1028232 205682 34546 4942 629 72 7
> Actual number mix()/final():  24708515 24669248 12333034 4114796 1034256 208424 34888 5023 598 77 5
> 
> Here is a second run with number of items = (number of buckets)/2:
> Number of buckets: 65536
> Total number of items: 32768
> Expected number with n items: 39749 19874 4968 828 103 10
> Actual number mix():          39704 19978 4898 842 103 10 1
> Actual number mix()/final():  39715 19922 4991 779 119 9 1
> 
> Number of buckets: 262144
> Total number of items: 131072
> Expected number with n items: 158998 79499 19874 3312 414 41 3
> Actual number mix():          158753 79972 19703 3216 458 38 4
> Actual number mix()/final():  158869 79688 19853 3303 393 33 3 2
> 
> Number of buckets: 1048576
> Total number of items: 524288
> Expected number with n items: 635993 317996 79499 13249 1656 165 13
> Actual number mix():          636118 317839 79388 13435 1628 152 16
> Actual number mix()/final():  636102 317824 79527 13267 1683 161 12
> 
> Number of buckets: 4194304
> Total number of items: 2097152
> Expected number with n items: 2543973 1271986 317996 52999 6624 662 55 3
> Actual number mix():          2544529 1270639 318904 53005 6516 645 61 5
> Actual number mix()/final():  2544014 1271483 318720 52849 6559 630 47 2
> 
> Number of buckets: 16777216
> Total number of items: 8388608
> Expected number with n items: 10175895 5087947 1271986 211997 26499 2649 220 15
> Actual number mix():          10174997 5089736 1271355 211528 26679 2683 220 17 1
> Actual number mix()/final():  10176567 5087207 1271789 212014 26684 2703 235 16 1
> 
> Number of buckets: 67108864
> Total number of items: 33554432
> Expected number with n items: 40703583 20351791 5087947 847991 105998 10599 883 63 3
> Actual number mix():          40703033 20353345 5086252 848860 105885 10536 891 59 3
> Actual number mix()/final():  40770874 20250828 5096890 865448 111835 11890 1013 76 10
> 
> 2) uint32uint32 (i.e. uint64)
> Number of buckets: 65536
> Total number of items: 32768
> Expected number with n items: 39749 19874 4968 828 103 10
> Actual number mix():          39770 19863 4947 822 125 9
> Actual number mix()/final():  39754 19852 4990 837 91 11 1
> 
> Number of buckets: 262144
> Total number of items: 131072
> Expected number with n items: 158998 79499 19874 3312 414 41 3
> Actual number mix():          159046 79464 19823 3334 430 43 3 1
> Actual number mix()/final():  158879 79732 19774 3301 406 47 5
> 
> Number of buckets: 1048576
> Total number of items: 524288
> Expected number with n items: 635993 317996 79499 13249 1656 165 13
> Actual number mix():          636275 317580 79514 13357 1661 172 14 3
> Actual number mix()/final():  635578 318574 79504 13162 1585 159 13 1
> 
> Number of buckets: 4194304
> Total number of items: 2097152
> Expected number with n items: 2543973 1271986 317996 52999 6624 662 55 3
> Actual number mix():          2543769 1272196 318067 53050 6497 672 48 4 1
> Actual number mix()/final():  2543014 1273485 317812 52703 6589 635 59 7
> 
> Number of buckets: 16777216
> Total number of items: 8388608
> Expected number with n items: 10175895 5087947 1271986 211997 26499 2649 220 15
> Actual number mix():          10174587 5090467 1270909 211836 26513 2679 207 18
> Actual number mix()/final():  10175142 5089481 1270919 212555 26234 2643 224 15 3
> 
> Number of buckets: 67108864
> Total number of items: 33554432
> Expected number with n items: 40703583 20351791 5087947 847991 105998 10599 883 63 3
> Actual number mix():          40706281 20347746 5088639 848133 106388 10669 946 60 2
> Actual number mix()/final():  40701438 20354677 5088549 846570 106157 10578 838 55 2
> 
> 3) uint32 from 1-1648379
> Number of buckets: 65536
> Total number of items: 32768
> Expected number with n items: 39749 19874 4968 828 103 10
> Actual number mix():          39758 19844 4993 835 97 9
> Actual number mix()/final():  39842 19712 5016 849 109 7 1
> 
> Number of buckets: 262144
> Total number of items: 131072
> Expected number with n items: 158998 79499 19874 3312 414 41 3
> Actual number mix():          158973 79523 19900 3283 426 38 1
> Actual number mix()/final():  159136 79293 19896 3346 421 48 3 1
> 
> Number of buckets: 1048576
> Total number of items: 524288
> Expected number with n items: 635993 317996 79499 13249 1656 165 13
> Actual number mix():          636083 317917 79484 13183 1711 180 16 2
> Actual number mix()/final():  636183 317772 79438 13305 1685 173 20
> 
> Number of buckets: 4194304
> Total number of items: 2097152
> Expected number with n items: 2543973 1271986 317996 52999 6624 662 55 3
> Actual number mix():          2544325 1271367 318260 52950 6660 683 54 4 1
> Actual number mix()/final():  2544022 1272059 317778 53060 6646 665 70 4
> 
> Number of buckets: 16777216
> Total number of items: 8388608
> Expected number with n items: 10175895 5087947 1271986 211997 26499 2649 220 15
> Actual number mix():          10176360 5087249 1272083 212010 26655 2629 212 18
> Actual number mix()/final():  10175315 5088617 1272068 212144 26224 2589 234 23 1 1
> 
> Number of buckets: 67108864
> Total number of items: 33554432
> Expected number with n items: 40703583 20351791 5087947 847991 105998 10599 883 63 3
> Actual number mix():          40704034 20350542 5088760 848309 105721 10520 894 77 7
> Actual number mix()/final():  40774785 20243786 5098862 866801 111865 11641 1019 100 5
> 
> 4) cracklib
> Number of buckets: 65536
> Total number of items: 32767
> Expected number with n items: 39750 19874 4968 828 103 10
> Actual number mix():          39731 19858 5049 794 92 11 1
> Actual number mix()/final():  39785 19801 5011 823 106 9 1
> 
> Number of buckets: 262144
> Total number of items: 131071
> Expected number with n items: 158998 79498 19874 3312 414 41 3
> Actual number mix():          159077 79420 19806 3380 409 49 3
> Actual number mix()/final():  159020 79475 19845 3366 383 54 1
> 
> Number of buckets: 1048576
> Total number of items: 524287
> Expected number with n items: 635994 317996 79498 13249 1656 165 13
> Actual number mix():          635979 318066 79420 13281 1641 163 23 3
> Actual number mix()/final():  635694 318637 79122 13271 1686 150 13 3
> 
> Number of buckets: 4194304
> Total number of items: 1648379
> Expected number with n items: 2831263 1112698 218647 28643 2814 221 14
> Actual number mix():          2830769 1113458 218633 28396 2786 250 11 1
> Actual number mix()/final():  2831304 1113004 218002 28843 2926 212 13
> 
> Number of buckets: 16777216
> Total number of items: 1648379
> Expected number with n items: 15207226 1494125 73399 2403 59 1
> Actual number mix():          15206923 1494718 73129 2384 59 3
> Actual number mix()/final():  15207183 1494267 73250 2452 64
> 
> Number of buckets: 67108864
> Total number of items: 1648379
> Expected number with n items: 65480564 1608383 19753 161 1.47989202515749e-08
> Actual number mix():          65480389 1608727 19593 154 1
> Actual number mix()/final():  65480455 1608609 19631 168 1
> 
> 5) cracklib x 100
> Number of buckets: 65536
> Total number of items: 32767
> Expected number with n items: 39750 19874 4968 828 103 10
> Actual number mix():          39829 19742 5003 847 99 14 2
> Actual number mix()/final():  39783 19794 5023 828 97 11
> 
> Number of buckets: 262144
> Total number of items: 131071
> Expected number with n items: 158998 79498 19874 3312 414 41 3
> Actual number mix():          159136 79242 20007 3289 405 62 3
> Actual number mix()/final():  158961 79546 19905 3270 408 51 3
> 
> Number of buckets: 1048576
> Total number of items: 524287
> Expected number with n items: 635994 317996 79498 13249 1656 165 13
> Actual number mix():          635815 318331 79372 13218 1658 168 12 2
> Actual number mix()/final():  635986 318152 79309 13231 1687 190 21
> 
> Number of buckets: 4194304
> Total number of items: 1648379
> Expected number with n items: 2831263 1112698 218647 28643 2814 221 14
> Actual number mix():          2830891 1113468 218202 28712 2804 208 18 1
> Actual number mix()/final():  2831786 1111616 219209 28661 2816 199 16 1
> 
> Number of buckets: 16777216
> Total number of items: 1648379
> Expected number with n items: 15207226 1494125 73399 2403 59 1
> Actual number mix():          15207428 1493777 73492 2459 59 1
> Actual number mix()/final():  15207777 1493063 73877 2435 63 1
> 
> Number of buckets: 67108864
> Total number of items: 1648379
> Expected number with n items: 65480564 1608383 19753 161 1.47989202515749e-08
> Actual number mix():          65480463 1608595 19635 170 1
> Actual number mix()/final():  65480650 1608222 19820 171 1
> 
> As you can see, all of the probabilities for both the current single mix()
> function and the split mix()/final() functions match the theoretical results
> as calculated by the poisson formula. Here is a nice reference that I found
> online describing the testing procedure that I am using:
> 
> http://rabbit.eng.miami.edu/class/een318/poisson.pdf
> 
> I can find no situations where the current version of hash_any() performs
> statistically better than the version resulting from my split mix()/final()
> function patch from 11/2008. I would be happy to share the test harness with
> anyone who wishes to test further. Please let me know if there is anything
> I can do. I still recommend applying the split mixing function patch that
> I submitted for the 11/2008 commitfest and I think these tests should allay
> Tom's concerns about the randomness of the new hash function.
> 
> Regards,
> Ken Marshall
> 
Dear Hackers,

I have updated the patch posted by Jeff Davis on January 9th
to include the micro-patch above as well as updated the polymorphism
regressions tests. This applies cleanly to the latest CVS pull.

Regards,
Ken

In response to

Responses

pgsql-hackers by date

Next:From: Simon RiggsDate: 2009-01-27 16:21:19
Subject: Re: 8.4 release planning
Previous:From: Gregory StarkDate: 2009-01-27 16:16:22
Subject: Re: More FOR UPDATE/FOR SHARE problems

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group