Re: Potential Join Performance Issue

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Michael Henderson <mikecubed(at)gmail(dot)com>
Subject: Re: Potential Join Performance Issue
Date: 2009-01-08 01:42:46
Message-ID: 200901080142.n081gkk19548@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Has this been completed? TODO item?

---------------------------------------------------------------------------

Lawrence, Ramon wrote:
> > From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> > I was intending to do it the other way, actually. An extra field in
> > HashPath hardly costs anything. The other reason for it is that there
> > are other possible uses for knowing whether a hash will be
> multi-batch.
> > (For example, if we were prepared to tell the executor that it *must*
> > keep the hash to one batch, we could assume that the sort order of the
> > left input is preserved. I haven't looked into the risks/benefits of
> > that too much, but it's been in the back of the mind for a long time.)
>
> Having the number of batches in HashPath could be potentially useful for
> a variety of reasons. For our research, we have added an nbatch
> variable in both HashPath and HashJoin. Having it in HashJoin is useful
> as we modified EXPLAIN to output the number of batches. There are costs
> in putting an nbatch variable in HashPath as the system may set this
> variable potentially hundreds/thousands of times during costing and does
> not (currently) use it until you convert the chosen HashPath to a plan.
>
> > I'd be more inclined to deal with the issue by trying to establish a
> > "safety margin" in the estimate of whether the hash will go
> multi-batch.
> > IOW we should disuse_physical_tlist if the hash is estimated to be
> close
> > to but still within one batch.
>
> Our experiments with large TPC-H 1GB joins show that it is almost always
> better to not use physical_tlists if the number of batches is > 1.
> There is a noticeable (approximately 5-15%) improvement when using
> physical_tlists for in-memory joins. For batches of size 2, it
> sometimes can go either way depending how many attributes are projected
> out of the outer relation. Using physical_tlists may be better even for
> batches of size 2 if most of the attributes of the outer relation are
> kept. For a larger number of batches, the extra I/O cost significantly
> dominates over the physical_tlist optimization. Performance of
> multi-batch joins may improve 50% or more by disabling the optimization.
>
> It is possible to create a "safety margin" by having
> ExecChooseHashTableSize() return the value
> inner_rel_bytes/hash_table_bytes which represents the fraction of the
> memory available that the inner relation is expected to consume. You
> can then make decisions based on that. However, this is only as good
> as the inner relation size estimate and especially for large queries,
> the estimate may be quite inaccurate. A more robust solution could
> examine the "width" of the path and the "width" of the relation combined
> with the number of batches to see if projecting early would be worth it.
> It may be best to keep it simple and just use number of batches > 1 as a
> criteria and instead focus on examining issues with inaccurate join size
> estimates.
>
> --
> Dr. Ramon Lawrence
> Assistant Professor, Department of Computer Science, University of
> British Columbia Okanagan
> E-mail: ramon(dot)lawrence(at)ubc(dot)ca
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2009-01-08 01:50:20 Re: text search patch status update?
Previous Message KaiGai Kohei 2009-01-08 01:26:19 Re: Updates of SE-PostgreSQL 8.4devel patches (r1389)