Re: bad estimation together with large work_mem generates terrible slow hash joins

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bad estimation together with large work_mem generates terrible slow hash joins
Date: 2014-09-12 16:49:15
Message-ID: CA+TgmoZbF_oE5aK_x9DKP5U5ZzfU3+bc3C10brO4HauJCm4h-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 12, 2014 at 8:28 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Sep 11, 2014 at 5:57 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>> Attached is the patch split as suggested:
>>
>> (a) hashjoin-nbuckets-v14a-size.patch
>>
>> * NTUP_PER_BUCKET=1
>> * counting buckets towards work_mem
>> * changes in ExecChooseHashTableSize (due to the other changes)
>
> OK, I'm going to work on this one now. I have some ideas about how to
> simplify this a bit and will post an update in a few hours once I see
> how that pans out.

Here's what I came up with. I believe this has the same semantics as
your patch for less code, but please check, because I might be wrong.
The main simplifications I made were:

(1) In master, we decide whether or not to batch based on the size of
the data, without knowing the number of buckets. If we want to
account for the bucket overhead, that doesn't work. But in your
patch, we basically computed the number of buckets we'd need for the
single-batch case, then decide whether to do a single batch, and if
yes, computed the same values over again with the same formula phrased
differently. I revised that so that the single-batch case is
calculated only once. If everything fits, we're all set, and don't
need to recalculate anything.

(2) In your patch, once we decide we need to batch, you set nbatch as
now, and then check whether the computed value is still adequate after
subtracting buckets_byte from hash_table_bytes. I revised that so
that we make the initial batch estimate of dbatch by dividing
inner_rel_bytes by hash_table_bytes - bucket_bytes rather than
hash_table_bytes. I think that avoids the need to maybe increase
nbatch again afterwards.

I'm comfortable with this version if you are, but (maybe as a
follow-on commit) I think we could make this even a bit smarter. If
inner_rel_bytes + bucket_bytes > hash_table_bytes but inner_rel_bytes
+ bucket_bytes / 4 < hash_table_bytes, then reduce the number of
buckets by 2x or 4x to make it fit. That would provide a bit of
additional safety margin against cases where this patch might
unluckily lose.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
hashjoin-lf1-rmh.patch text/x-patch 4.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2014-09-12 16:52:11 Re: jsonb format is pessimal for toast compression
Previous Message Pavel Stehule 2014-09-12 16:40:09 Re: expanded mode is still broken