Re: The testing of multi-batch hash joins with skewed data sets patch

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David Rowley" <dgrowley(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pandasuit(at)gmail(dot)com, ramon(dot)lawrence(at)ubc(dot)ca
Subject: Re: The testing of multi-batch hash joins with skewed data sets patch
Date: 2009-02-10 22:36:26
Message-ID: 25766.1234305386@sss.pgh.pa.us
Lists: pgsql-hackers

"David Rowley" <dgrowley(at)gmail(dot)com> writes:
> Currently I'm unsure of the best way to ensure that the hash join goes
> into more than one batch, apart from just making the dataset very large.

Make work_mem very small?
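
For illustration, something along these lines (the fact/dim tables are
placeholders; a sketch for generating them follows below):

    -- nudge the planner toward a hash join
    SET enable_mergejoin = off;
    SET enable_nestloop = off;
    -- 64kB is the smallest allowed work_mem setting
    SET work_mem = '64kB';
    EXPLAIN ANALYZE
    SELECT count(*) FROM fact f JOIN dim d ON f.dim_id = d.id;
    -- depending on server version, the Hash node in the output reports
    -- the batch count, something like:
    --   Buckets: 1024  Batches: 2048  Memory Usage: ...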

But really there are two different performance regimes here, one where
the hash data is large enough to spill to disk and one where it isn't.
Reducing work_mem will cause data to spill into kernel disk cache, but
if the total problem fits in RAM then very possibly that data won't ever
really go to disk. So I suspect such a test case will act more like the
small-data case than the big-data case. You probably need more data than
will fit in RAM to be sure you're testing the big-data case.
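
As a sketch (names and sizes are made up; scale the row counts until the
fact table is clearly bigger than physical RAM), a skewed data set could
be generated like this:

    -- inner/dimension table: 1 million distinct keys
    CREATE TABLE dim AS
      SELECT g AS id, md5(g::text) AS pad
      FROM generate_series(1, 1000000) g;

    -- outer/fact table: multiplying two uniform randoms skews the key
    -- distribution heavily toward the low end, so a few dim keys dominate
    CREATE TABLE fact AS
      SELECT (random() * random() * 999999)::int + 1 AS dim_id,
             md5(random()::text) AS pad
      FROM generate_series(1, 100000000) g;

    ANALYZE dim;
    ANALYZE fact;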

Regardless, I'd like to see some performance results from both regimes.
It's also important to be sure there is not a penalty for single-batch
cases.
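
One way to cover both, assuming the tables above (the work_mem figures
are guesses; pick them well below and well above the inner hash table's
size):

    \timing on

    -- multi-batch regime: the hash of dim cannot fit in work_mem
    SET work_mem = '1MB';
    SELECT count(*) FROM fact f JOIN dim d ON f.dim_id = d.id;

    -- single-batch regime: the whole hash table fits in memory
    SET work_mem = '1GB';
    SELECT count(*) FROM fact f JOIN dim d ON f.dim_id = d.id;

Running the same query at both settings, before and after the patch,
shows whether the skew optimization helps the multi-batch case without
slowing down the single-batch one.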

regards, tom lane
