Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bryce Cutt <pandasuit(at)gmail(dot)com>
Cc: Joshua Tolley <eggyknap(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
Date: 2009-03-06 18:57:53
Message-ID: 2146.1236365873@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Bryce Cutt <pandasuit(at)gmail(dot)com> writes:
> Here is the new patch.
> Our experiments show no noticeable performance issue when using the
> patch for cases where the optimization is not used because the number
> of extra statements executed when the optimization is disabled is
> insignificant.

> We have updated the patch to remove a couple of if statements, but
> this is really minor. The biggest change was to MultiExecHash that
> avoids an if check per tuple by duplicating the hashing loop.

I think you missed the point of the performance questions. It wasn't
about avoiding extra simple if-tests in the per-tuple loops; a few of
those are certainly not going to add measurable cost given how complex
the code is already. (I really don't think you should be duplicating
hunks of code to avoid adding such tests.) Rather, the concern was that
if we are dedicating a fraction of available work_mem to this purpose,
that reduces the overall efficiency of the regular non-IM code path,
principally by forcing the creation of more batches than would otherwise
be needed. It's not clear whether the savings for IM tuples always
exceeds this additional cost.

After looking over the code a bit, there are two points that
particularly concern me in this connection:

* The IM hashtable is only needed during the first-batch processing;
once we've completed the first pass over the outer relation there is
no longer any need for it, unless I'm misunderstanding things
completely. Therefore it really only competes for space with the
regular first batch. However the damage to nbatches will already have
been done; in effect, we can expect that each subsequent batch will
probably only use (100 - IM_WORK_MEM_PERCENT)% of work_mem. The patch
seems to try to deal with this by keeping IM_WORK_MEM_PERCENT negligibly
small, but surely that's mostly equivalent to fighting with one hand
tied behind your back. I wonder if it'd be better to dedicate all of
work_mem to the MCV hash values during the first pass, rather than
allowing them to compete with the first regular batch.

* The IM hashtable creates an additional reason why nbatch might
increase during the initial scan of the inner relation; in fact, since
it's an effect not modeled in the initial choice of nbatch, it's
probably going to be a major reason for that to happen. Increasing
nbatch on the fly isn't good because it results in extra I/O for tuples
that were previously assigned to what is now the wrong batch. Again,
the only answer the patch has for this is to try not to use enough
of work_mem for it to make a difference. Seems like instead the initial
nbatch estimate needs to account for that.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2009-03-06 19:14:04 libxml incompatibility
Previous Message Guillaume Smet 2009-03-06 17:58:58 Re: small parallel restore optimization