I've been doing some benchmarking and profiling on the PostgreSQL
query analyzer, and it seems that (at least for the sorts of queries
that I typically run) the dominant cost is add_path(). I've been able
to find two optimizations that seem to help significantly:
1. add_path() often calls compare_fuzzy_path_costs() twice on the same
pair of paths, and when the paths compare equal on one criterion, some
comparisons are duplicated. I've refactored this function to return
the results of both calculations at once, without repeating any
floating-point comparisons.
2. match_unsorted_outer() adds as many as 5 nested loop joins at a
time with the same set of pathkeys. In my tests, it tended to be about
three: cheapest inner, cheapest inner materialized, and cheapest inner
indexscan.
Since these all have the same pathkeys, clearly only the one with the
cheapest total cost is in the running for cheapest total cost for that
set of pathkeys, and likewise for startup cost (and the two may be the
same). Yet we compare all of them against the whole pathlist, one
after the other, including (for the most part) the rather expensive
pathkey comparison. I've added a function add_similar_paths() and
refactored match_unsorted_outer() to use it.
On a couple of complex (and proprietary) queries with 12+ joins each,
I measure a planning time improvement of 8-12% with the attached patch
applied. It would be interesting to try to replicate this on a
publicly available data set, but I don't know of a good one to use.
Suggestions welcome - results of performance testing on your own
favorite big queries even more welcome. Simple test harness also
attached. I took the approach of dropping caches, starting the
server, and then running this 5 times each on several queries,
dropping top and bottom results.