Re: accounting for memory used for BufFile during hash joins

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: accounting for memory used for BufFile during hash joins
Date: 2019-05-07 14:42:36
Message-ID: 27778.1557240156@sss.pgh.pa.us
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> On Mon, May 06, 2019 at 11:18:28PM -0400, Tom Lane wrote:
>> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>>> Do we actually check how many duplicates are there during planning?

>> Certainly that's part of the planner's cost estimates ... but it's
>> only as good as the planner's statistical knowledge.

> I'm looking at the code, and the only place where I see code dealing with
> MCVs (probably the best place for info about duplicate values) is
> estimate_hash_bucketsize in final_cost_hashjoin.

What I'm thinking of is this bit in final_cost_hashjoin:

    /*
     * If the bucket holding the inner MCV would exceed work_mem, we don't
     * want to hash unless there is really no other alternative, so apply
     * disable_cost. (The executor normally copes with excessive memory usage
     * by splitting batches, but obviously it cannot separate equal values
     * that way, so it will be unable to drive the batch size below work_mem
     * when this is true.)
     */
    if (relation_byte_size(clamp_row_est(inner_path_rows * innermcvfreq),
                           inner_path->pathtarget->width) >
        (work_mem * 1024L))
        startup_cost += disable_cost;

It's certainly likely that that logic needs improvement in view of this
discussion --- I was just pushing back on the claim that we weren't
considering the issue at all.

regards, tom lane
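
Editorial sketch (not part of the original message): the check quoted above can be modeled in isolation roughly as follows. This is a simplified stand-in under stated assumptions, not the planner's code. The `sketch_*` names and the 24-byte per-tuple overhead are hypothetical; the real relation_byte_size and clamp_row_est live in the PostgreSQL planner and differ in detail.

```c
#include <stdbool.h>

/* ASSUMPTION: flat per-tuple header overhead in bytes, for illustration only. */
#define SKETCH_TUPLE_OVERHEAD 24

/* Round a row estimate to the nearest integer and keep it at least 1. */
static double
sketch_clamp_rows(double nrows)
{
    if (nrows <= 1.0)
        return 1.0;
    return (double) (long long) (nrows + 0.5);
}

/* Approximate bytes needed to hold `tuples` rows of `width` payload bytes. */
static double
sketch_bytes(double tuples, int width)
{
    return tuples * ((double) width + SKETCH_TUPLE_OVERHEAD);
}

/*
 * True when the rows sharing the most common inner join value would not fit
 * in work_mem (given in kilobytes). Splitting into more batches cannot help
 * in that case, because equal values always land in the same batch.
 */
static bool
sketch_mcv_bucket_exceeds_work_mem(double inner_rows, double mcv_freq,
                                   int width, int work_mem_kb)
{
    double mcv_rows = sketch_clamp_rows(inner_rows * mcv_freq);

    return sketch_bytes(mcv_rows, width) > (double) work_mem_kb * 1024.0;
}
```

For instance, with 1,000,000 inner rows, an MCV frequency of 0.1, a width of 100 bytes and work_mem of 4096 kB, the MCV rows alone need about 12.4 MB under these assumptions, so the check fires; at a frequency of 0.001 they need about 124 kB and it does not.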
