Re: Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices

From: Dave Nicponski <dave(at)seamlessdocs(dot)com>
To: Gunther <raj(at)gusw(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices
Date: 2017-11-03 15:56:45
Message-ID: CACuOpDbn4ztuGxmckim8QwavxmpOuF6X+iR-HWu4F7PawEEP2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thank you Gunther for bringing this up. It's been bothering me quite a bit
over time as well.

Forgive the naive question, but does the query planner's cost estimator
only track a single estimate of cost that gets accumulated and compared
across plan variants? Or is it keeping a range or probabilistic
distribution? I'm suspecting the former, but i bet either of the latter
would fix this rapidly.

The cases that frustrate me are where NL is chosen over something like HJ,
where if the query planner is slightly wrong on the lower side, then NL
would certainly beat HJ (but by relatively small amounts), but a slight
error on the higher side mean that the NL gets punished tremendously, do to
the big-o penalty difference it's paying over the HJ approach. Having the
planner with some notion of the distribution might help it make a better
assessment of the potential consequences for being slightly off in its
estimates. If it notices that being off on a plan involving a NL sends the
distribution off into hours instead of seconds, it could potentially avoid
it even if it might be slightly faster in the mean.

<fantasy> If i ever find time, maybe i'll try to play around with this idea
and see how it performs... </fantasy>

-dave-

On Fri, Nov 3, 2017 at 11:13 AM, Gunther <raj(at)gusw(dot)net> wrote:

> On 11/3/2017 10:55, legrand legrand wrote:
>
>> To limit NL usage, wouldn't a modified set of Planner Cost Constants
>> https://www.postgresql.org/docs/current/static/runtime-config-query.html
>> <https://www.postgresql.org/docs/current/static/runtime-config-query.html
>> >
>>
>> seq_page_cost
>> random_page_cost
>> cpu_tuple_cost
>> cpu_index_tuple_cost
>> cpu_operator_cost
>>
>> be more hash join freindly (as Oracle' optimizer_index_cost_adj )?
>>
>> I twiddled with some of these and could nudge it toward a Sort Merge
> instead NL. But it's hit or miss.
>
> May be there should be a tool which you can run periodically which will
> test out the installation to see how IO, CPU, and memory performs. Or,
> again, these statistics should be collected during normal operation so that
> nobody needs to guess them or test them in complex procedures. As the
> system runs, it should sample the seq_page_cost and random_page_cost
> (noticing that it has a SSD or HDD) and it should see how much disk read is
> from cache and how much goes out to disk. Why isn't the executor of queries
> the best person to ask for these cost constants?
>
> regards,
> -Gunther
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--

Dave Nicponski

Chief Technology Officer

917.696.3081

|

dave(at)seamlessdocs(dot)com

30 Vandam Street. 2nd Floor. NYC
855.77.SEAMLESS | SeamlessGov.com <http://seamlessgov.com>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message 刘瑞 2017-11-04 15:20:42 Unnecessary DISTINCT while primary key in SQL
Previous Message Gunther 2017-11-03 15:13:35 Re: Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices