Re: Hash support for arrays

From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash support for arrays
Date: 2010-11-03 09:24:16
Message-ID: AANLkTik+zqRBEYzy1X5p_6DnGA=V9BeMS=+HCg_hoSLE@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2010/11/2 Kenneth Marshall <ktm(at)rice(dot)edu>:

> Given that our hash implimentation mixes the input data well (It does.
> I tested it.) then a simple rotate-and-xor method is all that should
> be needed to maintain all of the needed information. The original
> hash function has done the heavy lifting in this case.

Even with the perfect hash function for the elements, certain
combinations of elements could still lead to massive collisions. E.g.,
if repeated values are typical in the input data we are talking about,
then the rotate-and-xor method would still lead to collisions between
any array of the same values of certain lengths, regardless of the
value. In Tom's implementation, as he mentioned before, those
problematical lengths would be multiples of 32 (e.g., an array of 32
1s would collide with an array of 32 2s would collide with an array of
32 3s, etc).

Nicolas

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2010-11-03 09:34:30 Re: pgsql: Bootstrap WAL to begin at segment logid=0 logseg=1 (000000010000
Previous Message Marc Cousin 2010-11-03 08:03:21 Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up