Re: add_path optimization

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: add_path optimization
Date: 2010-02-27 03:07:00
Message-ID: 201002270307.o1R370d14100@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Did this ever get applied/resolved?

---------------------------------------------------------------------------

Robert Haas wrote:
> I've been doing some benchmarking and profiling on the PostgreSQL
> query analyzer, and it seems that (at least for the sorts of queries
> that I typically run) the dominant cost is add_path(). I've been able
> to find two optimizations that seem to help significantly:
>
> 1. add_path() often calls compare_fuzzy_path_costs() twice on the same
> pair of paths, and when the paths compare equal on one criterion, some
> comparisons are duplicated. I've refactored this function to return
> the results of both calculations without repeating any floating-point
> arithmetic.
>
> 2. match_unsorted_outer() adds as many as 5 nested loop joins at a
> time with the same set of pathkeys. In my tests, it tended to be ~3 -
> cheapest inner, cheapest inner materialized, and cheapest inner index.
> Since these all have the same pathkeys, clearly only the one with the
> cheapest total cost is in the running for cheapest total cost for that
> set of pathkeys, and likewise for startup cost (and the two may be the
> same). Yet we compare all of them against the whole pathlist, one
> after the other, including (for the most part) the rather expensive
> pathkey comparison. I've added a function add_similar_paths() and
> refactored match_unsorted_outer() to use it.
>
> On a couple of complex (and proprietary) queries with 12+ joins each,
> I measure a planning time improvement of 8-12% with the attached patch
> applied. It would be interesting to try to replicate this on a
> publicly available data set, but I don't know of a good one to use.
> Suggestions welcome - results of performance testing on your own
> favorite big queries even more welcome. Simple test harness also
> attached. I took the approach of dropping caches, starting the
> server, and then running this 5 times each on several queries,
> dropping top and bottom results.
>
> ...Robert

[ Attachment, skipping... ]

[ Attachment, skipping... ]

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2010-02-27 03:09:26 Re: Testing of parallel restore with current snapshot
Previous Message Tom Lane 2010-02-27 03:00:09 Re: Was partitioned table row estimation fixed in 9.0?