Re: hashjoin chosen over 1000x faster plan

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: hashjoin chosen over 1000x faster plan
Date: 2007-10-10 20:32:57
Message-ID: 23650.1192048377@sss.pgh.pa.us
Lists: pgsql-performance

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> The point I'm trying to make is that at planning time the
> pg_statistic row for this "Charge"."reopHistSeqNo" column showed
> stanullfrac as 0.989; it doesn't seem to have taken this into account
> when making its guess about how many rows would be joined when it was
> compared to the primary key column of the "CaseHist" table.
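[The per-column null fraction the planner works from is visible in the pg_stats view; a minimal sketch for checking it, assuming the table and column names quoted above:]

    -- Null fraction and distinct-value estimate recorded for the column
    -- (names taken from the quoted message; adjust for your own schema).
    SELECT null_frac, n_distinct
    FROM pg_stats
    WHERE tablename = 'Charge'
      AND attname = 'reopHistSeqNo';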

It certainly does take nulls into account, but the estimate of resulting
rows was still nonzero; and even if it were zero, I'd be very hesitant
to make it choose a plan that is fast only if there were exactly zero
such rows and is slow otherwise. Most of the complaints we've had about
issues of this sort involve the opposite problem, i.e., the planner is
choosing a plan that works well for few rows but falls down because
reality involves many rows. "Fast-for-few-rows" plans are usually a lot
more brittle than the alternatives in terms of the penalty you pay for
too many rows, and so putting a thumb on the scales to push it towards a
"fast" corner case sounds pretty unsafe to me.

As Simon notes, the only technically sound way to handle this would
involve run-time plan changeover, which is something we're not nearly
ready to tackle.

regards, tom lane
