Re: Avoiding OOM in a hash join with many duplicate inner keys

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, David Hinkle <hinkle(at)cipafilter(dot)com>
Subject: Re: Avoiding OOM in a hash join with many duplicate inner keys
Date: 2017-02-16 20:25:54
Message-ID: CA+Tgmobptvurm0Kdx3OWN-ogo-0_V2_30cvyiBpyP7RG9f128g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 16, 2017 at 2:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Feb 16, 2017 at 2:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> This might be overly aggressive, because it will pretty much shut off
>>> any attempt to use hash joining on a large inner relation unless we
>>> have statistics for it (and those stats are favorable). But having
>>> seen this example, I think we need to be worried.
>
>> I do think that's worrying, but on the other hand it seems like this
>> solution could disable many hash joins that would actually be fine. I
>> don't think the largest ndistinct estimates we ever generate are very
>> large, and therefore this seems highly prone to worry even when
>> worrying isn't really justified.
>
> I initially thought about driving the shutoff strictly from the estimate
> of the MCV frequency, without involving the more general ndistinct
> computation that estimate_hash_bucketsize does. I'm not sure how much
> that would do for your concern, but at least the MCV frequency doesn't
> involve quite as much extrapolation as ndistinct.

Hmm, so we could do something like: if the estimated frequency of the
least-common MCV is enough to make one bucket overflow work_mem, then
don't use a hash join? That would still be prone to some error (in
both directions, really) but it seems less likely to spit out
completely stupid results than relying on ndistinct, which never gets
very big even in a 10TB table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-02-16 20:31:07 Re: Parallel bitmap heap scan
Previous Message Tom Lane 2017-02-16 19:38:19 Re: Avoiding OOM in a hash join with many duplicate inner keys