Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Todd A(dot) Cook" <tcook(at)blackducksoftware(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
Date: 2017-12-10 22:09:42
Message-ID: 7de4b2d6-b119-7d15-04d8-4c0fcb29cbc3@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


On 12/06/2017 06:19 PM, Andres Freund wrote:
> On 2017-12-06 12:14:24 -0500, Tom Lane wrote:
>> Checking out hashint8() on random data shows no such obvious fault.
>> You've apparently got a data set that exposes a weakness in hashint8,
>> but it's not very clear what that is.
>
> It's intentionally designed to cause problems afaict:
>
> http://archives.postgresql.org/message-id/861b9f1f-cdc0-bc49-2595-80bc39c37dc3%40blackducksoftware.com
>
>
>> In any case, the hashtable code needs to not fall over in the
>> presence of a lot of collisions, regardless of the exact reason
>> for there being a lot.
>
> Yes, we need to be more resilient about it. Working on a patch.
>

I've been playing with this a bit today (sorry for trespassing on your
turf a bit, but curiosity killed the cat), particularly with the tree
basic ideas mentioned in this thread. Let me share some numbers.

1) randomized hashing, i.e. using the extended hash functions and a
random seed

2) universal hashing [1], i.e. passing the hash through ((a*h+b) mod p)
formula, where h is "our" hash, and (a,b) are random and p is a prime.
This can't possibly fix the collisions reported by Todd, because the
result will remain the same. But it should fix "my" chains of values by
spreading them a bit (and the a,b,p values are generated on each grow,
so in the unlikely case where we get a chain, it should disappear after
a growth)

[1] https://en.wikipedia.org/wiki/Universal_hashing

3) disabling growth of the hash table when the fillfactor would drop too
much (e.g. below 0.2)

Very rough patches for these three approaches are attached.

Now, some very rough numbers comparing the impact of those approaches
(and their combinations). The following two tables present timings for
for a couple of datasets - five cases without any collisions (allowing
us to quantify impact on regular datasets), the two datasets generated
by me (one without collisions, one with collisions), and the two issues
reported by Todd.

RH - randomized hashing (seed + extended)
UH - universal hashing
ST - stop growing

master RH UH ST RH UH ST
-------------------------------------------------------
Ok/1 542 540 552 537 0% 2% -1%
Ok/2 232 221 242 237 -5% 4% 2%
Ok/3 161 167 180 161 4% 12% 0%
Ok/4 276 270 285 274 -2% 3% -1%
Ok/5 174 179 193 174 3% 11% 0%
Tomas/1 543 555 551 533 2% 1% -2%
Tomas/2 OOM 2322 3251 -
Todd/1 OOM OOM 63 63
Todd/2 OOM OOM OOM 72

This is mostly as expected, although I'm surprised that the universal
hashing seems to fix Todd's first reproducer. The "stop growing"
approach technically fixes "my" dataset, but it takes so much time I
haven't timed it.

In terms of slowdown (compared to master), I'd say RH and ST are mostly
neutral. Universal hashing has significant impact in some cases, but I
suppose that might be fixed by using a slightly different scheme
(instead of using simple modulo arithmetic).

Now, let's see on combinations of the approaches (broken into two
tables, so that it doesn't get mangled by mail clients):

master RH+UH RH+ST UH+ST RH+UH+ST
Ok/1 542 553 545 549 558
Ok/2 232 241 234 241 250
Ok/3 161 180 170 180 187
Ok/4 276 274 270 285 290
Ok/5 174 187 182 193 198
Tomas/1 543 571 561 554 570
Tomas/2 - 2380 2340 3130 2380
Todd/1 - 61 65 63 64
Todd/2 - - 90 91 93

RH+UH RH+ST UH+ST RH+UH+ST
Ok/1 2% 1% 1% 3%
Ok/2 4% 1% 4% 8%
Ok/3 12% 6% 12% 16%
Ok/4 -1% -2% 3% 5%
Ok/5 7% 5% 11% 14%
Tomas/1 5% 3% 2% 5%

It seems only the "stop growth" has to be part of the solution, as it's
included in any combination fixing all cases shared in this thread. I'd
say (randomized hashing + stop growing) seems like the best option.

FWIW I do agree the data sets shared in this thread are pretty extreme
and it doesn't make much sense to slow the regular cases. I'll be
perfectly happy if we stop the OOM, making those cases fast is a bonus.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-use-random-seed-for-hash-table.patch text/x-patch 5.5 KB
0002-universal-hashing-to-compute-initial-bucket.patch text/x-patch 2.0 KB
0003-disable-unbounded-hashtable-growth.patch text/x-patch 2.1 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Devrim Gündüz 2017-12-10 22:13:37 Re: BUG #14957: service initdb fails (initdbcmd run in background)
Previous Message benthorner 2017-12-10 16:53:01 BUG #14957: service initdb fails (initdbcmd run in background)