| From: | Bruce Momjian <bruce(at)momjian(dot)us> | 
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> | 
| Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: add_path optimization | 
| Date: | 2010-02-27 03:07:00 | 
| Message-ID: | 201002270307.o1R370d14100@momjian.us | 
| Lists: | pgsql-hackers | 
Did this ever get applied/resolved?
---------------------------------------------------------------------------
Robert Haas wrote:
> I've been doing some benchmarking and profiling on the PostgreSQL
> query analyzer, and it seems that (at least for the sorts of queries
> that I typically run) the dominant cost is add_path().  I've been able
> to find two optimizations that seem to help significantly:
> 
> 1. add_path() often calls compare_fuzzy_path_costs() twice on the same
> pair of paths, and when the paths compare equal on one criterion, some
> comparisons are duplicated.  I've refactored this function to return
> the results of both calculations without repeating any floating-point
> arithmetic.
> 
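A minimal sketch of the idea in (1): instead of calling a one-criterion comparison function twice, compute both the startup-cost and total-cost comparisons in a single pass so no floating-point work is repeated. The names, types, and the 1% fuzz factor below are assumptions for illustration, not the actual patch code.

```c
#include <assert.h>

/* Fuzz factor assumed here; paths within 1% are treated as costing
 * the same, mirroring the planner's fuzzy comparison. */
#define FUZZ_FACTOR 1.01

/* Simplified stand-in for the planner's Path node. */
typedef struct Path
{
    double startup_cost;
    double total_cost;
} Path;

/* Fuzzy three-way comparison: +1 if a is fuzzily greater than b,
 * -1 if fuzzily less, 0 if fuzzily equal. */
static int
fuzzy_cmp(double a, double b)
{
    if (a > b * FUZZ_FACTOR)
        return +1;
    if (b > a * FUZZ_FACTOR)
        return -1;
    return 0;
}

/* Return both comparison results through out-parameters in one call,
 * so add_path never recomputes either criterion for the same pair. */
static void
compare_path_costs_fuzzily(const Path *p1, const Path *p2,
                           int *startup_cmp, int *total_cmp)
{
    *startup_cmp = fuzzy_cmp(p1->startup_cost, p2->startup_cost);
    *total_cmp = fuzzy_cmp(p1->total_cost, p2->total_cost);
}
```

With both results in hand, the caller can decide dominance (one path fuzzily no worse on both criteria) with a single comparison call per pair.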
> 2. match_unsorted_outer() adds as many as 5 nested loop joins at a
> time with the same set of pathkeys.  In my tests, it tended to be ~3 -
> cheapest inner, cheapest inner materialized, and cheapest inner index.
>  Since these all have the same pathkeys, clearly only the one with the
> cheapest total cost is in the running for cheapest total cost for that
> set of pathkeys, and likewise for startup cost (and the two may be the
> same).  Yet we compare all of them against the whole pathlist, one
> after the other, including (for the most part) the rather expensive
> pathkey comparison.  I've added a function add_similar_paths() and
> refactored match_unsorted_outer() to use it.
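The observation in (2) can be sketched as a pre-filter: among candidate paths that share identical pathkeys, only the one cheapest by total cost and the one cheapest by startup cost can possibly survive, so at most two of them need the full (and pathkey-comparison-heavy) add_path pass against the pathlist. The function and type names below are assumptions for illustration, not the actual add_similar_paths() code.

```c
#include <assert.h>

/* Simplified stand-in for the planner's Path node. */
typedef struct Path
{
    double startup_cost;
    double total_cost;
} Path;

/* Among npaths candidates with identical pathkeys, find the cheapest
 * by total cost and the cheapest by startup cost (possibly the same
 * path).  Only these contenders need the expensive full comparison
 * against the existing pathlist; the rest are dominated outright. */
static void
pick_contenders(Path **paths, int npaths,
                Path **cheapest_total, Path **cheapest_startup)
{
    int i;

    *cheapest_total = paths[0];
    *cheapest_startup = paths[0];
    for (i = 1; i < npaths; i++)
    {
        if (paths[i]->total_cost < (*cheapest_total)->total_cost)
            *cheapest_total = paths[i];
        if (paths[i]->startup_cost < (*cheapest_startup)->startup_cost)
            *cheapest_startup = paths[i];
    }
}
```

This is a single O(n) scan over the candidates, replacing up to five full add_path calls (each of which walks the whole pathlist) with at most two.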
> 
> On a couple of complex (and proprietary) queries with 12+ joins each,
> I measure a planning time improvement of 8-12% with the attached patch
> applied.  It would be interesting to try to replicate this on a
> publicly available data set, but I don't know of a good one to use.
> Suggestions welcome - results of performance testing on your own
> favorite big queries even more welcome.  Simple test harness also
> attached.  I took the approach of dropping caches, starting the
> server, and then running this 5 times each on several queries,
> dropping top and bottom results.
> 
> ...Robert
[ Attachment, skipping... ]
[ Attachment, skipping... ]
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
-- 
  Bruce Momjian  <bruce(at)momjian(dot)us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +