
Re: add_path optimization

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: add_path optimization
Date: 2010-02-27 03:07:00
Message-ID: 201002270307.o1R370d14100@momjian.us
Lists: pgsql-hackers
Did this ever get applied/resolved?

---------------------------------------------------------------------------

Robert Haas wrote:
> I've been doing some benchmarking and profiling on the PostgreSQL
> query planner, and it seems that (at least for the sorts of queries
> that I typically run) the dominant cost is add_path().  I've been able
> to find two optimizations that seem to help significantly:
> 
> 1. add_path() often calls compare_fuzzy_path_costs() twice on the same
> pair of paths, and when the paths compare equal on one criterion, some
> comparisons are duplicated.  I've refactored this function to return
> the results of both calculations without repeating any floating-point
> arithmetic.
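
The duplicated-comparison fix can be sketched as follows. This is a simplified illustration, not PostgreSQL's actual code: the struct, function names, and the 1% fuzz factor are assumptions. The point is that one pass computes both the startup-cost and total-cost comparisons, so a caller never repeats the floating-point work:

```c
#include <assert.h>

/* Hypothetical, simplified path-cost struct; the field names mirror
 * PostgreSQL's Path, but this sketch is illustrative only. */
typedef struct
{
    double startup_cost;
    double total_cost;
} PathCosts;

/* Costs within 1% are treated as equal (assumed fuzz factor). */
#define FUZZ 1.01

static int
fuzzy_cmp(double a, double b)
{
    if (a > b * FUZZ)
        return +1;
    if (b > a * FUZZ)
        return -1;
    return 0;
}

/* Return both comparison results in one call, so that when the caller
 * needs both criteria it does not re-run any floating-point compares. */
static void
compare_both_costs(const PathCosts *p1, const PathCosts *p2,
                   int *startup_cmp, int *total_cmp)
{
    *startup_cmp = fuzzy_cmp(p1->startup_cost, p2->startup_cost);
    *total_cmp = fuzzy_cmp(p1->total_cost, p2->total_cost);
}
```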
> 
> 2. match_unsorted_outer() adds as many as 5 nested loop joins at a
> time with the same set of pathkeys.  In my tests, it tended to be
> about three: cheapest inner, cheapest inner materialized, and cheapest
> inner indexscan.  Since these all have the same pathkeys, clearly only
> cheapest total cost is in the running for cheapest total cost for that
> set of pathkeys, and likewise for startup cost (and the two may be the
> same).  Yet we compare all of them against the whole pathlist, one
> after the other, including (for the most part) the rather expensive
> pathkey comparison.  I've added a function add_similar_paths() and
> refactored match_unsorted_outer() to use it.
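
The pre-filtering idea behind this refactoring can be sketched as below. Names here are hypothetical, not PostgreSQL's API: among candidates that all share one set of pathkeys, only the cheapest-startup and cheapest-total ones can possibly survive, so a cheap local O(n) scan finds them before paying for any full-pathlist comparisons:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative candidate-path struct (an assumption, not the real Path). */
typedef struct
{
    double startup_cost;
    double total_cost;
} CandPath;

/* Given candidates that all share the same pathkeys, return the indexes
 * of the cheapest-startup and cheapest-total candidates.  Only these two
 * (which may be the same path) need to be compared against the full
 * pathlist; the rest are dominated and can be discarded immediately. */
static void
pick_survivors(const CandPath *cands, size_t n,
               size_t *best_startup, size_t *best_total)
{
    size_t bs = 0;
    size_t bt = 0;

    for (size_t i = 1; i < n; i++)
    {
        if (cands[i].startup_cost < cands[bs].startup_cost)
            bs = i;
        if (cands[i].total_cost < cands[bt].total_cost)
            bt = i;
    }
    *best_startup = bs;
    *best_total = bt;
}
```

With up to five same-pathkey nestloops per call, this turns five expensive add_path() probes (each including pathkey comparisons against the whole pathlist) into at most two.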
> 
> On a couple of complex (and proprietary) queries with 12+ joins each,
> I measure a planning time improvement of 8-12% with the attached patch
> applied.  It would be interesting to try to replicate this on a
> publicly available data set, but I don't know of a good one to use.
> Suggestions welcome - results of performance testing on your own
> favorite big queries even more welcome.  Simple test harness also
> attached.  I took the approach of dropping caches, starting the
> server, and then running this 5 times each on several queries,
> dropping top and bottom results.
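
The trimming step of that harness can be sketched as follows. The timing values below are stand-in numbers purely for illustration; in the real harness they would come from timing the query five times. The best and worst runs are dropped and the middle three averaged:

```shell
# Hypothetical trimmed-mean step: sort 5 run timings (ms), drop the
# fastest and slowest, and average the remaining three.
timings="231.4 198.7 205.2 210.9 489.0"

trimmed_mean=$(printf '%s\n' $timings | sort -n | sed '1d;$d' \
    | awk '{ s += $1; n++ } END { printf "%.1f", s / n }')

echo "$trimmed_mean"
```

Dropping the extremes guards against one cold-cache or noisy run skewing the comparison between patched and unpatched planning times.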
> 
> ...Robert

[ Attachment, skipping... ]

[ Attachment, skipping... ]


-- 
  Bruce Momjian  <bruce(at)momjian(dot)us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +
