Re: Unwanted expression simplification in PG12b2

From: Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Unwanted expression simplification in PG12b2
Date: 2019-09-22 11:47:17
Message-ID: CAC8Q8tK6RD7cHEsyEeqgKmiA4cWd3h8dKWv078=9As-FOaGirw@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Fri, Sep 20, 2019 at 11:14 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Jul 17, 2019 at 5:20 PM Darafei "Komяpa" Praliaskouski
> <me(at)komzpa(dot)net> wrote:
> > Indeed, it seems I failed to minimize my example.
> >
> > Here is the actual one, on 90GB table with 16M rows:
> > https://gist.github.com/Komzpa/8d5b9008ad60f9ccc62423c256e78b4c
> >
> > I can share the table on request if needed, but hope that plan may be
> > enough.
>
> What I taught the planner to do here had to do with making the costing
> more accurate for cases like this. It now figures out that if it's
> going to stick a Gather in at that point, computing the expressions
> below the Gather rather than above the Gather makes a difference to
> the cost of that plan vs. other plans. However, it still doesn't
> consider any more paths than it did before; it just costs them more
> accurately. In your first example, I believe that the planner should
> be able to consider both GroupAggregate -> Gather Merge -> Sort ->
> Parallel Seq Scan and GroupAggregate -> Sort -> Gather -> Parallel Seq
> Scan, but I think it's got a fixed idea about which fields should be
> fed into the Sort. In particular, I believe it thinks that sorting
> more data is so undesirable that it doesn't want to carry any
> unnecessary baggage through the Sort for any reason. To solve this
> problem, I think it would need to cost the second plan with projection
> done both before the Sort and after the Sort and decide which one was
> cheaper.
>
> This class of problem is somewhat annoying in that the extra planner
> cycles and complexity to deal with getting this right would be useless
> for many queries, but at the same time, there are a few cases where it
> can win big. I don't know what to do about that.
>

A heuristic that I believe would help my case (and I can hardly imagine how
it could break others) is that, in the presence of a Gather, all function
calls that are parallel safe should be pushed down below it.
In a perfect future this query shouldn't even need the subquery, which I
extracted only for the sake of the OFFSET 0 demo. This could probably be
done as a single loop that, when a Gather is present, tries to push down
the whole inner part of the nested function calls that is parallel safe.
If we go as far as starting more workers, it really makes sense to load
them with actual work rather than leave them waiting on the master process.
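
To make the idea concrete, here is a rough sketch of that push-down pass
as a toy model in Python. It is not planner code: the Expr class, the
parallel_safe flag on each node, and the var1, var2 placeholder names are
all assumptions made up for illustration, and a recursive walk stands in
for the loop. It collects every maximal parallel-safe subtree so it could
be computed below the Gather, and leaves only the remainder above it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """Toy expression node: a column reference (no args) or a function call."""
    name: str
    parallel_safe: bool = True
    args: List["Expr"] = field(default_factory=list)


def fully_safe(expr: Expr) -> bool:
    """True if this node and everything beneath it is parallel safe."""
    return expr.parallel_safe and all(fully_safe(a) for a in expr.args)


def split_at_gather(expr: Expr, pushed: List[Expr]) -> Expr:
    """Return the part of expr that must stay above the Gather.

    Each maximal parallel-safe function call is appended to `pushed`
    (work for the workers, below the Gather) and replaced by a
    hypothetical placeholder column named var1, var2, ...
    """
    if fully_safe(expr):
        if expr.args:
            # A parallel-safe function call: evaluate it in the workers.
            pushed.append(expr)
            return Expr(f"var{len(pushed)}")
        # A plain column reference is already available as-is.
        return expr
    # This node is unsafe itself or has an unsafe descendant, so it stays
    # above the Gather; only its parallel-safe arguments are pushed down.
    return Expr(
        expr.name,
        expr.parallel_safe,
        [split_at_gather(a, pushed) for a in expr.args],
    )
```

For a made-up call such as unsafe_fn(safe_outer(safe_inner(geom))), the
pass would hand safe_outer(safe_inner(geom)) to the workers and keep only
unsafe_fn(var1) above the Gather. Whether pushing down is always a win is
a costing question that this sketch does not attempt to answer.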

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
