Re: Shouldn't we have a way to avoid "risky" plans?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Shouldn't we have a way to avoid "risky" plans?
Date: 2011-04-19 14:29:13
Message-ID: BANLkTimcUqk4UQEQGpocGsRoQTk-S+R3mw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Mar 25, 2011 at 10:24 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>> If the planner starts operating on the basis of worst case rather than
>>> expected-case performance, the complaints will be far more numerous than
>>> they are today.
>
>> Yeah, I don't think that's the way to go.  The other thought I had was
>> to accumulate a "risk" stat the same as we accumulate a "cost" stat.
>
>> However, I'm thinking that I'm overengineering what seems to be a fairly
>> isolated problem, in that we might simply need to adjust the costing on
>> this kind of a plan.
>
> mergejoinscansel doesn't currently try to fix up the histogram bounds by
> consulting indexes.  At the time I was afraid of the costs of doing
> that, and I still am; but it would be a way to address this issue.

Apparently, this is a pain point for the MySQL query planner - not so
much for merge joins, which I don't think are supported in any of the
major forks anyway - but the planner's desire to go estimate things by
probing the indexes. IIRC, the MariaDB guys are looking into adding
persistent statistics to address this problem. That doesn't
necessarily mean that we shouldn't do this, but it probably does mean
that we should be awfully careful about it.

Another thought is that we might want to consider reducing
autovacuum_analyze_scale_factor. The root of the original problem
seems to be that the table had some data churn but not enough to cause
an ANALYZE. Now, if the data churn is random, auto-analyzing after
10% churn might be reasonable, but a lot of data churn is non-random,
and ANALYZE is fairly cheap. I'm just shooting in the dark here; I
might be all wet. I think part of the problem is that the AV launcher
isn't very smart about looking at the overall picture. It'd be nice,
for example, to be able to be more aggressive when the system is quiet
and to be a bit more careful when the system is saturated, but it's a
bit tricky to think about how to make that work, or exactly what the
heuristics should be.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Greg Smith 2011-04-19 14:30:48 Re: How to configure a read-only database server?
Previous Message Robert Haas 2011-04-19 14:22:55 Re: Shouldn't we have a way to avoid "risky" plans?