Re: How to retain lesser paths at add_path()?

From: Kohei KaiGai <kaigai(at)heterodb(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <riguo(at)pivotal(dot)io>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How to retain lesser paths at add_path()?
Date: 2021-03-06 09:50:13
Message-ID: CAOP8fzYBtdS5rLf8z+PdeEpkxP7k9XiQJ9GVOQd2gxRUhSn5sg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2020年11月6日(金) 0:40 Dmitry Dolgov <9erthalion6(at)gmail(dot)com>:
>
> > On Tue, Jan 14, 2020 at 12:46:02AM +0900, Kohei KaiGai wrote:
> > The v2 patch is attached.
> >
> > This adds two dedicated lists on the RelOptInfo to preserve lesser paths
> > if extension required to retain the path-node to be removed in usual manner.
> > These lesser paths are kept in the separated list, so it never expand the length
> > of pathlist and partial_pathlist. That was the arguable point in the discussion
> > at the last October.
> >
> > The new hook is called just before the path-node removal operation, and
> > gives extension a chance for extra decision.
> > If extension considers the path-node to be removed can be used in the upper
> > path construction stage, they can return 'true' as a signal to preserve this
> > lesser path-node.
> > In case when same kind of path-node already exists in the preserved_pathlist
> > and the supplied lesser path-node is cheaper than the old one, extension can
> > remove the worse one arbitrarily to keep the length of preserved_pathlist.
> > (E.g, PG-Strom may need one GpuJoin path-node either pathlist or preserved-
> > pathlist for further opportunity of combined usage with GpuPreAgg path-node.
> > It just needs "the best GpuJoin path-node" somewhere, not two or more.)
> >
> > Because PostgreSQL core has no information which preserved path-node can
> > be removed, extensions that uses path_removal_decision_hook() has responsibility
> > to keep the length of preserved_(partial_)pathlist reasonable.
>
> Hi,
>
> Thanks for the patch! I had a quick look at it and have a few questions:
>
Sorry for the very late response. It's my oversight.

> * What would be the exact point/hook at which an extension can use
> preserved pathlists? I guess it's important, since I can imagine it's
> important for one of the issues mentioned in the thread about such an
> extension have to re-do significant part of the calculations from
> add_path.
>
set_join_pathlist_hook and create_upper_paths_hook

For example, even if GpuPreAgg may be able to generate cheaper path
with GpuJoin result, make_one_rel() may drop GpuJoin results due to
its own cost estimation. In this case, if lesser GpuJoin path would be
preserved somewhere, the extension invoked by create_upper_paths_hook
can make GpuPreAgg path with GpuJoin sub-path; that can reduce
data transfer between CPU and GPU.

> * Do you have any benchmark results with some extension using this
> hook? The idea with another pathlist of "discarded" paths sounds like
> a lightweight solution, and indeed I've performed few tests with two
> workloads (simple queries, queries with joins of 10 tables) and the
> difference between master and patched versions is rather small (no
> stable difference for the former, couple of percent for the latter).
> But it's of course with an empty hook, so it would be good to see
> other benchmarks as well.
>
Not yet. And, an empty hook will not affect so much.

Even if the extension uses the hook, it shall be more lightweight than
its own alternative implementation. In case of PG-Strom, it also saves
Gpu-related paths in its own hash-table, then we look at the hash-table
also to find out the opportunity to merge multiple GPU invocations into
single invocation.

> * Does it make sense to something similar with add_path_precheck,
> which also in some situations excluding paths?
>
This logic allows to skip the paths creation that will obviously have
expensive cost. Its decision is based on the cost estimation.

The path_removal_decision_hook also gives extensions a chance to
preserve pre-built paths that can be used later even if cost is not best.
This decision is not only based on the cost. In my expectations, it allows
to preserve the best path in the gpu related ones.

> * This part sounds dangerous for me:
>
> > Because PostgreSQL core has no information which preserved path-node can
> > be removed, extensions that uses path_removal_decision_hook() has responsibility
> > to keep the length of preserved_(partial_)pathlist reasonable.
>
> since an extension can keep limited number of paths in the list, but
> then the same hook could be reused by another extension which will
> also try to limit such paths, but together they'll explode.
>
If Path node has a flag to indicate whether it is referenced by any other
upper node, we can simplify the check whether it is safe to release.
In the current implementation, the lesser paths except for IndexPath are
released at the end of add_path.

On the other hand, I'm uncertain whether the pfree(new_path) at the tail
of add_path makes sense on the modern hardware, because they allow to
recycle just small amount of memory, then entire memory consumption
by the optimizer shall be released by MemoryContext mechanism.
If add_path does not release path-node, the above portion is much simpler.

Best regards,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai(at)heterodb(dot)com>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-03-06 10:19:05 Re: Parallel INSERT (INTO ... SELECT ...)
Previous Message 杨逸存 2021-03-06 09:01:55 Inquiries about PostgreSQL's system catalog development——from a student developer of Nanjing University