Re: Potential Join Performance Issue

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Potential Join Performance Issue
Date: 2008-09-10 01:46:57
Message-ID: 12937.1221011217@sss.pgh.pa.us
Lists: pgsql-hackers

"Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca> writes:
> Our research group has been using the PostgreSQL code base to test new
> join algorithms. During testing, we noticed that the planner is not
> pushing down projections to the outer relation in a hash join. Although
> this makes sense for in-memory (1 batch) joins, for joins larger than
> memory (such as for TPC-H DSS), this causes the system to perform
> significantly more disk I/Os when reading/writing batches of the outer
> relation.

Hm. The proposed patch seems a bit brute-force, since it loses the
benefit of the physical-tlist optimization even if the relations are
certainly too small to require batching.

> A more complicated modification alternative is to add a state variable
> to allow the planner to know how many batches the hash join expects and
> only push down the projection if it is greater than one. However,
> pushing the projection on the outer relation is almost always the best
> choice as it eliminates unneeded attributes for operators above the hash
> join in the plan and will be robust in the case of poor estimates.

Nonetheless, I'm inclined to do it that way. The "robust in the case of
poor estimates" argument doesn't convince me, because the incremental
cost isn't *that* large if we get it wrong; and the other argument is
just bogus because we don't do physical tlists at or above joins anyhow.

regards, tom lane
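
Editor's illustration (not part of the original message): the batch-aware approach discussed above could be sketched roughly as follows. This is a simplified standalone model, not PostgreSQL source. The function names, the byte-based sizing, and the spill-cost formula are all assumptions made for illustration. It assumes the batch count is doubled until one batch of the inner relation fits in memory, and that every batch except the first is written to disk once and read back once.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: estimate how many batches a hash join needs.
 * Doubles the batch count until one batch of the inner relation fits
 * in the available memory (assumed model, sizes in bytes).
 */
int estimate_num_batches(double inner_bytes, double work_mem_bytes)
{
    int nbatch = 1;

    assert(work_mem_bytes > 0.0);   /* guard against a non-terminating loop */

    while (inner_bytes / nbatch > work_mem_bytes)
        nbatch *= 2;

    return nbatch;
}

/*
 * Hypothetical decision rule: project the outer relation down to the
 * needed columns only when the join is expected to spill to disk.
 * With a single batch, the outer tuples are never written out, so the
 * cheaper physical tlist can be kept.
 */
bool should_project_outer(int nbatch)
{
    return nbatch > 1;
}

/*
 * Rough estimate of the extra bytes of batch-file I/O caused by NOT
 * projecting the outer relation. Assumes a fraction (nbatch - 1) / nbatch
 * of the outer tuples is spilled, each written once and read once.
 */
double extra_spill_bytes(double outer_rows, double full_width,
                         double needed_width, int nbatch)
{
    double spilled_fraction;

    if (nbatch <= 1)
        return 0.0;             /* in-memory join: nothing is spilled */

    spilled_fraction = (double) (nbatch - 1) / nbatch;
    return 2.0 * outer_rows * spilled_fraction * (full_width - needed_width);
}
```

Under these assumptions, a 10 MB inner relation with 4 MB of memory yields 4 batches, so the outer relation would be projected. A 1 MB inner relation yields 1 batch, so the physical tlist would be kept.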
