Quick Links

Re: Explanation for bug #13908: hash joins are badly broken

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Explanation for bug #13908: hash joins are badly broken
Date:	2016-02-06 09:03:14
Message-ID:	56B5B6D2.1010900@2ndquadrant.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 02/06/2016 02:34 AM, Tom Lane wrote:
> I have sussed what's happening in bug #13908. Basically, commit
> 45f6240a8fa9d355 ("Pack tuples in a hash join batch densely, to save
> memory") broke things for the case where a hash join is using a skew
> table. The reason is that that commit only changed the storage of
> tuples going into the main hash table; tuples going into the skew
> table are still allocated with a palloc apiece, without being put
> into the "chunk" storage. Now, if we're loading the hash table and we
> find that we've exceeded the storage allowed for skew tuples,
> ExecHashRemoveNextSkewBucket wants to push some skew tuples back into
> the main hash table; and it believes that linking such tuples into
> the appropriate main hashbucket chain is sufficient for that. Which
> it was before the aforesaid commit, and still is in simple cases.
> However, if we later try to do ExecHashIncreaseNumBatches, that
> function contains new code that assumes that it can find all tuples
> in the main hashtable by scanning the "chunk" storage directly. Thus,
> the pushed-back tuples are not scanned and are neither re-entered
> into the hash table nor dumped into a batch file. So they won't get
> joined.

Damn, that's an embarrassing oversight :-/

I believe the attached patch should fix this by actually copying the
tuples into the densely allocated chunks. Haven't tested it though, will
do in a few hours.

> It looks like ExecHashIncreaseNumBuckets, if it were to run after
> some executions of ExecHashRemoveNextSkewBucket, would break things
> in the same way. That's not what's happening in this particular test
> case, though.

ExecHashIncreaseNumBuckets assumes all the tuples can be reached by
simply walking the chunks (from dense_alloc). So if removing skew bucket
only updates pointers in buckets, that gets broken. But I don't think
that's a bug in ExecHashIncreaseNumBuckets and should be resolved by
fixing ExecHashRemoveNextSkewBucket.

> I'm of the opinion that this is a stop-ship bug for 9.5.1. Barring
> somebody producing a fix over the weekend, I will deal with it by
> reverting the aforementioned commit.

I think it's not quite possible to revert just the one commit as the
other hashjoin improvements in 9.5 built on top of that. So the revert
would either be quite invasive (requiring more code changes than the
fix), or we'd have to revert all the hashjoin goodies.

FWIW I'm willing to put some time into fixing this over the weekend.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
hashjoin-skew-fix.patch	text/x-patch	950 bytes

In response to

Explanation for bug #13908: hash joins are badly broken at 2016-02-06 01:34:13 from Tom Lane

Responses

Re: Explanation for bug #13908: hash joins are badly broken at 2016-02-06 17:47:56 from Tom Lane
Re: Explanation for bug #13908: hash joins are badly broken at 2016-02-06 21:18:43 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Michael Paquier	2016-02-06 12:16:26	Re: silent data loss with ext4 / all current versions
Previous Message	Stéphane Schildknecht	2016-02-06 08:44:41	Re: [Reveiw] Idle In Transaction Session Timeout, revived