Re: Asymmetric partition-wise JOIN

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Aleksander Alekseev <afiskon(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, KaiGai Kohei <kaigai(at)heterodb(dot)com>, "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>, Белялов Дамир Наилевич <d(dot)belyalov(at)postgrespro(dot)ru>
Subject: Re: Asymmetric partition-wise JOIN
Date: 2023-10-15 05:40:36
Message-ID: 5c0e38e3-7ab5-4b10-a1bb-70ca69771ff0@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 15/10/2023 07:18, Alexander Korotkov wrote:
> Hi Alexander,
> Hi Andrey,
>
> Thank you for your work on this subject.
>
> On Mon, Jan 17, 2022 at 1:42 PM Alexander Pyhalov
> <a(dot)pyhalov(at)postgrespro(dot)ru> wrote:
>> The patch does not longer apply cleanly, so I rebased it. Attaching
>> rebased version.
>
> Not surprising that the patch doesn't apply after 1.5 years since the
> last message. Could you please rebase it?
>
> I read the thread and the patch. The patch improves the joining of
> partitioned tables with non-partitioned relations. Let's denote
> non-partitioned relation as A, partitions as P1 ... PN. The patch
> allows to Append(Join(A, P1), ... Join(A, PN) instead of Join(A,
> Append(P1, ... PN). That could be cheaper because it's generally
> cheaper to join small pieces rather than do one big join. The
> drawback is the need to scan A multiple times. But is this really
> necessary and acceptable? Let's consider multiple options.
>
> 1) A is non-table. For instance, A is a function scan. In this case,
> doing multiple scans of A is not just expensive, but could lead to
> unexpected side effects. When the user includes a function once in
> the FROM clause, she expects this function to be evaluated once. I
> propose that we should materialize a scan of non-table relations. So,
> materialized representation will be scanned multiple times, but the
> source only scanned once. That would be similar to CTE.
> 2) A is the table to be scanned with the parametrized path in the
> inner part of the nested loop join. In this case, there is no big
> scan of A and nothing to materialize.
> 3) A is the table to be used in merge join or outer part of nested
> loop join. In this case, it would be nice to consider materialize.
> It's not always good to materialize, because materialization has its
> additional costs. I think that could be a cost-based decision.
> 4) A is used in the hash join. Could we re-use the hashed
> representation of A between multiple joins? I read upthread it was
> proposed to share a hashed table between multiple background workers
> via shared memory. But the first step would be to just share it
> between multiple join nodes within the same process.
>
> As we consider joining with each partition individually, there could
> be chosen different join methods. As I get, the current patch
> considers joining with each of the partitions as a separate isolated
> optimization task. However, if we share resources between the
> multiple joins, then rises a need for some global optimization. For
> instance, a join type could be expensive when applied to an individual
> partition, but cheap when applied to all the partitions thanks to
> saving the common work.
>
> My idea is to consider generated common resources (such as
> materialized scans) as a property of the path. For instance, if the
> nested loop join is cheaper than the hash join, but the hash join
> generates a common hash map of table A, we don't drop hash join
> immediately from the consideration and leave it to see how it could
> help join other partitions. What do you think?

Thanks for such detailed feedback!
The rationale for this patch was to give the optimizer additional ways
to push down more joins into foreign servers. And, because of
asynchronous append, the benefit of that optimization was obvious.
Unfortunately, we hadn't found other applications for this feature,
which was why this patch was postponed in the core.
You have brought new ideas about applying this idea locally. Moreover,
the main issue of the patch was massive memory consumption in the case
of many joins and partitions - because of reparameterization. But now,
postponing the reparameterization proposed in the thread [1] resolves
that problem and gives some insights into the reparameterization
technique of some fields, like lateral references.
Hence, I think we can restart this work.
The first thing here (after rebase, of course) is to figure out and
implement in the cost model cases of effectiveness when asymmetric join
would give significant performance.

[1] Oversight in reparameterize_path_by_child leading to executor crash
https://www.postgresql.org/message-id/flat/CAMbWs496%2BN%3DUAjOc%3DrcD3P7B6oJe4rZw08e_TZRUsWbPxZW3Tw%40mail.gmail.com

--
regards,
Andrey Lepikhov
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2023-10-15 06:00:00 Re: A failure in 031_recovery_conflict.pl on Debian/s390x
Previous Message Alexander Korotkov 2023-10-15 00:57:18 Re: [HACKERS] make async slave to wait for lsn to be replayed