Re: benchmarking the query planner

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: benchmarking the query planner
Date: 2008-12-12 02:12:50
Message-ID: 15327.1229047970@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
>> It might be best to stop when the frequency drops below some threshold,
>> rather than taking a fixed number of entries.

> OK, I'll bite. How do we decide where to put the cutoff? If we make
> it too high, it will have a negative effect on join selectivity
> estimates; if it's too low, it won't really address the problem we're
> trying to fix. I randomly propose p = 0.001, which should limit
> eqjoinsel() to about a million equality tests in the worst case. In
> the synthetic example we were just benchmarking, that causes the
> entire MCV array to be tossed out the window (which feels about
> right).

Yeah. One idle thought I had was that maybe the cutoff needs to
consider both probabilities: if the high-frequency MCVs on one side
chance to match to not-so-high-frequency MCVs on the other, you
would like to know about that. As long as we keep the lists in
frequency order, it seems easy to implement this: for each MCV
examined by the outer loop, you run the inner loop until the product of
the outer and current inner frequency drops below whatever your
threshold is. This doesn't immediately suggest what the threshold
ought to be, but it seems like it ought to be possible to determine
that given a desired maximum error in the overall estimate. I'm also
not very clear on what the "total frequency" computations (matchfreq2
and unmatchfreq2 in the current code) ought to look like if we are using
a variable subset of the inner list.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2008-12-12 02:23:33 Re: benchmarking the query planner
Previous Message KaiGai Kohei 2008-12-12 02:04:00 Re: Updates of SE-PostgreSQL 8.4devel patches (r1268)