Re: Avoiding bad prepared-statement plans.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 22:40:06
Message-ID: 603c8f071002261440k15ac3cddjc0a617ceb33fd3e8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 12:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Feb 26, 2010 at 11:27 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Also, I think there is a lot of confusion here over two different
>>> issues: generic plan versus parameter-specific plan, and bad planner
>>> estimates leading to a wrong plan choice.  While the latter is certainly
>>> an issue sometimes, there is no reason to believe that it affects
>>> prepared statements worse than non-prepared ones.  So I think that
>>> designing a fix for prepared statements on the assumption that you can't
>>> trust the planner's estimates is solving the wrong problem.
>
>> The two issues seem intimately connected to me.  Of course, estimates
>> can be inaccurate for many reasons, but ONE of those reasons is that
>> the planner is optimizing for an unknown parameter value which has
>> very different distribution from the actually supplied value.
>
> Right, but if the parameter is unknown then its distribution is also
> unknown.

I must be losing my mind. The distribution of the parameter will be
known at execution time because a specific value will be supplied. Of
course, it's not known at prepare time.

> In any case that's just nitpicking, because the solution is
> to create a custom plan for the specific value supplied.  Or are you
> suggesting that we should create a way for users to say "here is the
> expected distribution of this parameter", and then try to fold that into
> the planner estimates? I think the uptake on that would be about nil;
> not to mention that it fails to fix several of the most pressing
> problems, such as LIKE indexing and partition selection, where all the
> distribution information in the world doesn't help.  You need the exact
> value.

No, I'm not suggesting that.

Basically, what I really want here is some kind of keyword or other
syntax that I can stick into a PL/pgsql query that requests a replan
on every execution. It's easy to identify the cases where this is
needed: they are precisely the cases where my function runs for a
small eternity. I realize that you can use EXECUTE for this, but as
you pointed out upthread, that's not without its share of pain.
Exposing the same functionality via other interfaces (e.g. PQprepare,
other PLs) would be nice too, but for me personally, PL/pgsql is the
one that keeps biting me in the rear end.

One particularly ugly misadventure in this area had me write a
function which updated a bunch of data in user tables either for one
particular widget, if a widget id was passed in as an argument, or for
all widgets, if NULL was passed in. I did this by writing ...some big
hairy query... WHERE ... various conditions ... AND (id = $1 OR $1 IS
NULL). The point was that sometimes the caller knew that only a
particular widget needed to have its stuff recomputed, and sometimes
it didn't know for sure what might need updating so it just wanted to
update everything. Of course it turned out that this didn't really
work: the same plan was used in both cases, so the version with an id
specified took just as long to run as the generic version.

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2010-02-26 22:52:50 Was partitioned table row estimation fixed in 9.0?
Previous Message Bruce Momjian 2010-02-26 22:20:37 Re: Hot Standby query cancellation and Streaming Replication integration