Re: Asymmetric partition-wise JOIN

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Cc: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Aleksander Alekseev <afiskon(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, KaiGai Kohei <kaigai(at)heterodb(dot)com>
Subject: Re: Asymmetric partition-wise JOIN
Date: 2023-10-15 00:18:43
Message-ID: CAPpHfdtm2_xM1WP1=Jyj9vRQ5jNu-kSqaoazvUp_s_7Cq0906A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Alexander,
Hi Andrey,

Thank you for your work on this subject.

On Mon, Jan 17, 2022 at 1:42 PM Alexander Pyhalov
<a(dot)pyhalov(at)postgrespro(dot)ru> wrote:
> The patch does not longer apply cleanly, so I rebased it. Attaching
> rebased version.

Not surprising that the patch doesn't apply after 1.5 years since the
last message. Could you please rebase it?

I read the thread and the patch. The patch improves the joining of
partitioned tables with non-partitioned relations. Let's denote
non-partitioned relation as A, partitions as P1 ... PN. The patch
allows to Append(Join(A, P1), ... Join(A, PN) instead of Join(A,
Append(P1, ... PN). That could be cheaper because it's generally
cheaper to join small pieces rather than do one big join. The
drawback is the need to scan A multiple times. But is this really
necessary and acceptable? Let's consider multiple options.

1) A is non-table. For instance, A is a function scan. In this case,
doing multiple scans of A is not just expensive, but could lead to
unexpected side effects. When the user includes a function once in
the FROM clause, she expects this function to be evaluated once. I
propose that we should materialize a scan of non-table relations. So,
materialized representation will be scanned multiple times, but the
source only scanned once. That would be similar to CTE.
2) A is the table to be scanned with the parametrized path in the
inner part of the nested loop join. In this case, there is no big
scan of A and nothing to materialize.
3) A is the table to be used in merge join or outer part of nested
loop join. In this case, it would be nice to consider materialize.
It's not always good to materialize, because materialization has its
additional costs. I think that could be a cost-based decision.
4) A is used in the hash join. Could we re-use the hashed
representation of A between multiple joins? I read upthread it was
proposed to share a hashed table between multiple background workers
via shared memory. But the first step would be to just share it
between multiple join nodes within the same process.

As we consider joining with each partition individually, there could
be chosen different join methods. As I get, the current patch
considers joining with each of the partitions as a separate isolated
optimization task. However, if we share resources between the
multiple joins, then rises a need for some global optimization. For
instance, a join type could be expensive when applied to an individual
partition, but cheap when applied to all the partitions thanks to
saving the common work.

My idea is to consider generated common resources (such as
materialized scans) as a property of the path. For instance, if the
nested loop join is cheaper than the hash join, but the hash join
generates a common hash map of table A, we don't drop hash join
immediately from the consideration and leave it to see how it could
help join other partitions. What do you think?

------
Regards,
Alexander Korotkov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2023-10-15 00:57:18 Re: [HACKERS] make async slave to wait for lsn to be replayed
Previous Message Erik Wienhold 2023-10-14 23:51:05 Re: Patch: Improve Boolean Predicate JSON Path Docs