From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-10-02 08:19:30
Message-ID: CA+U5nMKqqPNpSr2CQ2V0=9bjWVv-OA28o3BoxYP=7YVfyswdzw@mail.gmail.com
Lists: pgsql-hackers pgsql-performance
On 1 October 2014 19:56, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 09/30/2014 04:01 PM, Simon Riggs wrote:
>> On 30 September 2014 18:28, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>
>>>> Anyway, in the particular case I posted fixing n_distinct to realistic
>>>> numbers (%) fixed the query plan.
>>>
>>>
>>> But wouldn't fixing the absolute number also have fixed the plan?
>>
>> There are two causes of this issue.
>>
>> 1. Poor estimates of n_distinct. Fixable by user.
>>
>> 2. Poor assumption of homogeneous distribution. No way for user to
>> fix. Insufficient stats detail to be able to solve in current planner.
>>
>> I see (2) as the main source of issues, since as we observe, (1) is fixable.
>
> I disagree that (1) is not worth fixing just because we've provided
> users with an API to override the stats. It would unquestionably be
> better for us to have a better n_distinct estimate in the first place.
> Further, this is an easier problem to solve, and fixing n_distinct
> estimates would fix a large minority of currently pathological queries.
> It's like saying "hey, we don't need to fix the leak in your radiator,
> we've given you a funnel in the dashboard you can pour water into."
Having read papers on it, I believe the problem is intractable. Coding
is not the issue. To anyone: please prove me wrong, in detail, with
references so it can be coded.
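To make the difficulty concrete, here is an illustrative Python sketch (not ANALYZE's code; the Duj1 formula below is the Haas-Stokes estimator that ANALYZE's n_distinct estimate is based on, and the data is invented). With a column mixing a few heavy hitters and many singletons, a 1% sample cannot tell a true singleton from an under-sampled common value, so the estimator hedges downward and collapses:

```python
import random

# Illustrative sketch only: a skewed column with 100 heavy hitters
# (900 rows each) plus 10,000 singleton values.
random.seed(42)
n_rows = 100_000
column = [v for v in range(100) for _ in range(900)]      # 90,000 rows
column += list(range(1000, 1000 + n_rows - len(column)))  # 10,000 singletons
random.shuffle(column)
true_ndistinct = len(set(column))                         # 10,100

# ANALYZE-style small random sample (1% of the rows).
sample = random.sample(column, n_rows // 100)
n = len(sample)
counts = {}
for v in sample:
    counts[v] = counts.get(v, 0) + 1
d = len(counts)                                  # distinct values in sample
f1 = sum(1 for c in counts.values() if c == 1)   # values seen exactly once

# Duj1 estimator (Haas & Stokes): n*d / (n - f1 + f1*n/N).
duj1 = n * d / (n - f1 + f1 * n / n_rows)

print(f"true={true_ndistinct} sample_d={d} f1={f1} estimate={duj1:.0f}")
```

Here the estimate comes out more than an order of magnitude below the truth, which is exactly the kind of error that flips a plan and is why overriding the statistic by hand (ALTER TABLE ... ALTER COLUMN ... SET (n_distinct = ...)) is the workaround being discussed.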
> I do agree that (2) is worth fixing *as well*. In a first
> approximation, one possibility (as Tom suggests) would be to come up
> with a mathematical model for a selectivity estimate which was somewhere
> *between* homogenous distribution and the worst case. While that
> wouldn't solve a lot of cases, it would be a start towards having a
> better model.
This may have a reasonable solution, but I don't know it. Even a more
accurate mathematical model will still not avoid the main problem: it
is a guess, not certain knowledge, and the risk will still remain.
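To put numbers on that risk, here is a small Python sketch (illustrative formulas and figures, not planner code) of how many rows an abort-early LIMIT scan reads under the homogeneous assumption versus the worst case, with a naive linear blend standing in for an "in-between" model:

```python
# Illustrative sketch: rows an abort-early (LIMIT k) scan must read.
def rows_scanned_uniform(total_rows, matching_rows, k):
    # Homogeneous assumption: matches are spread evenly, so we expect
    # to read about k / selectivity rows before collecting k matches.
    selectivity = matching_rows / total_rows
    return min(total_rows, k / selectivity)

def rows_scanned_worst(total_rows, matching_rows, k):
    # Worst case: every match is clustered at the very end of the scan.
    return total_rows - matching_rows + k

def rows_scanned_blended(total_rows, matching_rows, k, risk=0.5):
    # Naive stand-in for a model "between" the two assumptions.
    u = rows_scanned_uniform(total_rows, matching_rows, k)
    w = rows_scanned_worst(total_rows, matching_rows, k)
    return (1 - risk) * u + risk * w

N, M, k = 10_000_000, 1_000, 10
print(rows_scanned_uniform(N, M, k))   # looks cheap under uniformity
print(rows_scanned_worst(N, M, k))     # nearly a full-table scan
print(rows_scanned_blended(N, M, k))
```

The two assumptions differ by a factor of about 100 on these (invented) numbers, which is the gap any blended model has to position itself within; whatever weight is chosen, the result is still a guess about the physical distribution.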
>>> I don't think correlation is up to the task as a complete solution, although
>>> it might help a little. There is no way a simple correlation can encode
>>> that John retired 15 years ago and hasn't logged on since, while Johannes
>>> was hired yesterday and never logged on before then.
>>
>> Ah, OK, essentially the same example.
>>
>> Which is why I ruled out correlation stats based approaches and
>> suggested a risk-weighted cost approach.
>
> By "risk-weighted" you mean just adjusting cost estimates based on what
> the worst case cost looks like, correct? That seemed to be your
> proposal from an earlier post. If so, we're in violent agreement here.
I proposed a clear path for this earlier in the thread and received no
comments as yet. Please look at that.
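One possible reading of the risk-weighted idea can be sketched in a few lines of Python (plan names and costs are invented for illustration; this is not a proposal for planner internals): rank plans not by expected cost alone but by expected cost plus some fraction of the spread up to the worst case.

```python
# Illustrative sketch of risk-weighted plan choice. Costs are invented.
plans = {
    "abort_early_index_scan": {"expected": 100_000, "worst": 9_999_010},
    "sort_plus_limit":        {"expected": 300_000, "worst": 400_000},
}

def score(plan, risk_weight=0.0):
    # risk_weight = 0 reproduces today's expected-cost ranking;
    # higher values penalize plans with a large downside.
    c = plans[plan]
    return c["expected"] + risk_weight * (c["worst"] - c["expected"])

best_naive = min(plans, key=lambda p: score(p, risk_weight=0.0))
best_risky = min(plans, key=lambda p: score(p, risk_weight=0.1))
print(best_naive)  # the fragile abort-early plan wins on expected cost
print(best_risky)  # the robust plan wins once downside is priced in
```

Even a modest weight flips the choice to the robust plan here, which is the point of the approach: it does not need better distribution statistics, only an estimate of how bad the downside can get.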
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services