Re: Hash Join performance

From: Vamsidhar Thummala <vamsi(at)cs(dot)duke(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hash Join performance
Date: 2009-03-14 02:10:43
Message-ID: e0e3da5e0903131910q53a387eei66300aa2f7481d9f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Mar 13, 2009 at 7:08 PM, Tom Lane wrote:

> Vamsidhar Thummala writes:
> > I am wondering why are we subtracting the entire Seq Scan time of
> Lineitem
> > from the total time to calculate the HashJoin time.
>
> Well, if you're trying to identify the speed of the join itself and not
> how long it takes to provide the input for it, that seems like a
> sensible calculation to make.

I am still not clear on this. I am thinking the output is produced in a
pipelined fashion i.e., as soon as the record of outer child is read
(sequentially here) and if HashJoin finds a match by probing the inner hash
table (in memory), we have an output record. Please correct if I am wrong
here.

>
>
> > Here is another plan I have for the same TPC-H 18 query with different
> > configuration parameters (shared_buffers set to 400MB, just for
> experimental
> > purposes) and HashJoin seems to take longer time (at least 155.58s based
> on
> > above calculation):
>
> Yeah, that seems to work out to about 25us per row instead of 3us, which
> is a lot slower. Maybe the hash got split up into multiple batches ...
> what have you got work_mem set to? Try turning on log_temp_files and
> see if it records any temp files as getting created.

Unfortunately, I am working with Postgres 8.2 which doesn't have
log_temp_files. The work_mem is still at 1MB (all other parameters were kept
constant apart from shared_buffers w.r.t previous configuration). The hash
is build on 57 records (~20kb, customer row length is 179 bytes and orders
row length is 104 bytes) produced by inner subplan and so I will be
surprised if multiple batches are created.

Thank you.

Regards,
-Vamsi

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2009-03-14 02:37:37 Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4
Previous Message Tom Lane 2009-03-14 02:06:16 Re: 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4