From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-09-30 17:28:02
Message-ID: CAMkU=1wCMLWZexqVUPXWFeKmPPUDMv6GEEFUVDQnb7gfW9Wamw@mail.gmail.com
Lists: pgsql-hackers pgsql-performance
On Mon, Sep 29, 2014 at 2:54 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 09/26/2014 01:06 AM, Simon Riggs wrote:
> > On 23 September 2014 00:56, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> >
> >> We've hashed that out a bit, but frankly I think it's much more
> >> profitable to pursue fixing the actual problem than providing a
> >> workaround like "risk", such as:
> >>
> >> a) fixing n_distinct estimation
> >> b) estimating stacked quals using better math (i.e. not assuming total
> >> randomness)
> >> c) developing some kind of correlation stats
> >>
> >> Otherwise we would be just providing users with another knob there's no
> >> rational way to set.
> >
> > I believe this is a serious issue for PostgreSQL users and one that
> > needs to be addressed.
> >
> > n_distinct can be fixed manually, so that is less of an issue.
>
> It's an issue for the 99.8% of our users who don't know what n_distinct
> is, let alone how to calculate it. Also, changing it requires an
> exclusive lock on the table. Of course, you and I have been over this
> issue before.
>
If 99.6% of our users don't have a problem with n_distinct in their system,
that would mean that only 50% of the people with the problem don't know how
to solve it. And those people can usually get excellent free help on the
internet.
But if the problem is not with n_distinct, but rather with most_common_freqs
(which I encounter more often than problems with n_distinct), all I can do
is shrug and say "yeah, I know about that problem. Either crank up the
statistics target as high as it will go, or it sucks to be you."
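For anyone following along, "cranking up the statistics target" looks like this (table and column names here are made up for illustration; the per-column setting overrides `default_statistics_target`, and 10000 is the maximum):

```sql
-- Enlarge the most_common_vals / most_common_freqs arrays for one column,
-- then re-sample so the new target takes effect:
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 10000;
ANALYZE orders;
```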
>
> One thing I'm wondering is why our estimator creates n_distinct as a
> % so seldom. Really, any time n_distinct is over 10K we should be
> estimating a % instead. Now, estimating that % has its own issues, but
> it does seem like a peculiar quirk of our stats model.
>
> Anyway, in the particular case I posted fixing n_distinct to realistic
> numbers (%) fixed the query plan.
>
But wouldn't fixing the absolute number also have fixed the plan? If you
are going to set a number manually and then nail it in place so that
analyze stops changing it, then I can certainly see how the fractional
method is desirable. But if the goal is not to do that but have the
correct value estimated in the first place, I don't really see much benefit
from converting the estimate into a fraction and then back again.
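For reference, both forms of the manual override discussed here use the same storage parameter (table and column names below are hypothetical; a positive value is an absolute count, a negative value is a fraction of the row count, so it scales as the table grows):

```sql
-- Absolute override: planner assumes 500000 distinct values.
ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = 500000);

-- Fractional override: planner assumes 5% of rows are distinct.
ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = -0.05);

-- Either way, the override only takes effect at the next ANALYZE.
ANALYZE orders;
```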
> >
> > The problem, as I see it, is different. We assume that if there are
> > 100 distinct values and you use LIMIT 1 that you would only need to
> > scan 1% of rows. We assume that the data is arranged in the table in a
> > very homogeneous layout. When data is not, and it seldom is, we get
> > problems.
> >
> > Simply put, assuming that LIMIT will reduce the size of all scans is
> > just way wrong. I've seen many plans where increasing the LIMIT
> > dramatically improves the plan.
> >
> > If we can at least agree it is a problem, we can try to move forwards.
>
I don't think anyone doubts there is a problem (many more than one of
them), there is just disagreement about the priority and what can be done
about it.
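Simon's point about the uniform-layout assumption is easy to see with a toy simulation (this is not PostgreSQL code, and the row counts are made-up numbers): the planner expects an abort-early scan to read about N * LIMIT / MATCHES rows, which holds when matches are scattered but fails badly when they are clustered at one end of the table.

```python
import random

random.seed(42)  # deterministic layout for the scattered case

N = 100_000      # total rows in the table
MATCHES = 1_000  # rows satisfying the WHERE clause
LIMIT = 10

def rows_scanned_until_limit(table):
    """Scan rows in physical order; return how many rows were read
    before LIMIT matching rows were found."""
    hits = 0
    for scanned, is_match in enumerate(table, start=1):
        if is_match:
            hits += 1
            if hits == LIMIT:
                return scanned
    return len(table)

# Planner-style assumption: matches spread uniformly, so an
# abort-early plan expects to scan about N * LIMIT / MATCHES rows.
expected = N * LIMIT / MATCHES  # 1000.0

# Uniform layout: matching rows scattered at random.
uniform = [False] * N
for i in random.sample(range(N), MATCHES):
    uniform[i] = True

# Clustered layout: every matching row at the physical end of the
# table (e.g. the most recently inserted rows).
clustered = [False] * (N - MATCHES) + [True] * MATCHES

print(expected)
print(rows_scanned_until_limit(uniform))    # in the neighborhood of 1000
print(rows_scanned_until_limit(clustered))  # 99010: nearly the whole table
```

The clustered case is exactly the "John retired 15 years ago" shape: the estimate is off by two orders of magnitude even though n_distinct and the frequency stats are all perfectly accurate.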
>
> That is certainly another problem. Does correlation stat figure in the
> LIMIT calculation at all, currently? That's what correlation stat is
> for, no?
>
I don't think correlation is up to the task as a complete solution,
although it might help a little. There is no way a simple correlation can
encode that John retired 15 years ago and hasn't logged on since, while
Johannes was hired yesterday and never logged on before then.
Cheers,
Jeff